#musicbrainz

      • Techman
        I was looking into ListenBrainz as I was considering switching from Last.fm. Then I took a look at what a database dump looked like out of curiosity...I am very surprised that the listen data is not anonymized. Do not feel great about that at all, really.
      • Perhaps I had a false impression, but I had a belief that the actual tracks listened to would not be associated with individual users, at least in a public dump. I do think that releasing a listens database could be useful for statistical analysis on recordings but having users attached to that is not really necessary.
      • >By signing into ListenBrainz, you grant the MetaBrainz Foundation permission to include your listening history in data dumps we make publicly available under the CC0 license. None of your private information from your user profile will be included in these data dumps.
      • At least for me, I did not get the impression that this public dump of listening data would have my profile attached to it. This is because, you know, often times data that is made available for public research is anonymized so that said data cannot be traced back to users. At least easily.
      • In this case, I am not sure what the value is for including user information in the dump aside from spooking users and onlookers.
      • crism joined the channel
      • binzy joined the channel
      • aerozol[m] joined the channel
      • aerozol[m]
        Techman: What do you mean by anonymized? ListenBrainz is an open database. And without attaching/grouping the listens to some sort of user entity the data is useless.
      • Basically, if you don’t want your listens linked together, I would not recommend using ListenBrainz or last.fm. That’s a decision a lot of MetaBrainz contributors make - we are generally pretty privacy conscious
      • If you have ideas for something along the lines of replacing explicit user names (or something else?) with random strings, you could open a ticket for that. But AFAIK it would be easy to resolve it to a user on the site, with the same stats. But take my input with a grain of salt, I am a layperson when it comes to the data dumps :)
      • Techman
        aerozol[m]: anonymized as in the data was generated by users but the link between the user and the data is not in the output.
      • It is not foolproof as this is a public site but do you kinda get what I mean?
      • Why would the data be useless if it could not be grouped to a user? I feel like you could still gain useful insights without a link. Clients used, popularity of songs, etc.
      • I do not mind my listening data being available for research but I would not want my username and ID to be linked to it in a public dump of all data. I feel like people should be using a profile page if they care about listens from a specific person. The way it is now, it feels creepy.
      • aerozol[m]
        How would you make the data meaningful, for instance be able to say that x individual users have listened to an artist, without linking all of a user's listens together?
      • Or generate recommendations?
      • You could calculate total listens of a song or artist, sure. But we'd have to remove user profile pages
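The distinction drawn in this exchange can be sketched roughly as follows. This is a minimal illustration only: the row layout and the names `listens`, `total_listens`, and `distinct_listeners` are assumptions made for the example, not the actual ListenBrainz dump schema. It shows that total listen counts need no user information, while "x individual users listened to an artist" needs some stable per-user key, which does not have to be a username.

```python
from collections import Counter

# Hypothetical listen rows: (user_key, artist). Field names are illustrative,
# not the actual ListenBrainz dump schema.
listens = [
    ("u1", "Artist A"),
    ("u1", "Artist A"),
    ("u2", "Artist A"),
    ("u2", "Artist B"),
    ("u3", "Artist B"),
]

# Total listens per artist: the user column is ignored entirely.
total_listens = Counter(artist for _, artist in listens)

# Distinct listeners per artist: requires a stable per-user key so that
# repeated listens by the same user are counted once.
listeners = {}
for user_key, artist in listens:
    listeners.setdefault(artist, set()).add(user_key)
distinct_listeners = {artist: len(users) for artist, users in listeners.items()}
```

Here `user_key` could be any opaque identifier; the grouping only needs it to be consistent across a user's listens.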
      • rbatty joined the channel
      • Techman
        The way the data is now, I can build a profile for every person on the site whether they know it or not, without requiring me to do anything. I am sure that I am not the only one who may not have a firm grasp on what is going on here.
      • aerozol[m]
        Building a profile for every person on the site is the point of ListenBrainz. You are making a profile of all of your listens on an open source site. It's what last.fm does as well
      • (they're not open source, but they have an API and anyone can grab your data)
      • Techman
        I am not a data expert when it comes to anonymization so I will defer for a real solution but I am sure there is a way to have some uniqueness in the data without easily tracing it back to a particular person
      • aerozol[m]
        I guess the general idea is that if you don't attach identifiable information to your username you are 'anonymous' in terms of linking your account to your real life person
      • Techman
        I am treating having someone's username (e.g. looking at someone's profile page) different from public data dumps. If someone looks at my Last.fm page, then I would expect them to know my specific history. However I would not expect it to be in a public dump for research as there should be no need for my account to be identifiable in that.
      • aerozol[m]
        I don't really understand what you mean - public dump, public page, what's the difference? FYI I can plug your last.fm username into lots of places to scrape out interesting data
      • I guess I'm not really disagreeing with your sentiment... Just that I don't see how ListenBrainz (even more so than other sites, given we are open source) can work around it
      • Maybe someone else can think of a middle ground, but I can't 😔
      • rbatty has quit
      • If there's anywhere you think we can clarify the language re. what will be public, that's something I can make a ticket for btw
      • Techman
        IRC is perhaps not the best place to articulate thoughts but maybe I can try to make what I am thinking as clear as possible. Or make a forum post.
      • aerozol[m]
        Forum is always good for more discussion and input 👍
      • Then tickets if something actionable comes out of it!
      • Techman
        I guess I will take it from the top. I was originally going to migrate to ListenBrainz as I generally like open source stuff and I think Last.fm charging for reports is kind of bogus, but then I stopped because I checked out the database dumps and realized that identifiable info for a user is in the dump. I do not think that public data dumps should be traceable to users, at least directly.
      • For research purposes (often what this kind of data is made available for), there is no need to really link data to user accounts. There should be a way to anonymize the users in the output. I consider this different from visiting someone's profile page because the intent is different. Listening data for one particular user is very specific vs the public data dump which currently includes everyone, fully traceable.
      • If the data were to be anonymized, then the people who grab the data dump can do research while not being able to trace it back to individuals unless they then looked up a person to connect it to that output.
      • The way the dump is now, everyone who has ever contributed a listen is exposed...even if they would rather only be found organically through other users. As someone looking at the data, I do not think I should have everyone's identifiable listening history. It feels creepy and unnecessary. Anonymous user IDs or some generated substitute could take the place of the usernames and user IDs and the data could be pretty much as useful. If I wanted to know a specific person's history, I can always look them up.
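One way the "generated substitute" idea could look is sketched below. It assumes a keyed hash (HMAC-SHA256 from Python's standard library) with a secret kept by whoever publishes the dump. The function `pseudonymize`, the `rows` layout, and the 32-character truncation are hypothetical choices for illustration; nothing here reflects how ListenBrainz actually builds its dumps.

```python
import hashlib
import hmac
import secrets


def pseudonymize(user_name: str, secret_key: bytes) -> str:
    """Return a stable opaque token for user_name under secret_key."""
    digest = hmac.new(secret_key, user_name.encode("utf-8"), hashlib.sha256)
    # Truncation is an arbitrary choice to keep tokens short in this sketch.
    return digest.hexdigest()[:32]


# Assumption: the key stays private to the dump publisher and is never released.
secret_key = secrets.token_bytes(32)

# Hypothetical rows: (user_name, track).
rows = [("alice", "Track 1"), ("bob", "Track 2"), ("alice", "Track 3")]

# The published rows carry the token instead of the username, so listens by
# the same user still group together.
dump_rows = [(pseudonymize(user, secret_key), track) for user, track in rows]
```

The token hides the username itself, but each user's listens remain grouped under one token, so this is a substitution of identifiers rather than a removal of the link between listens.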
      • Island_ has quit
      • binzy has quit
      • binzy joined the channel
      • binzy has quit
      • binzy joined the channel
      • ApeKattQuest
        oh! I think I get where aerozol's and Techman's miscommunication is
      • basically aerozol thought that user data will be removed completely, but the idea was to just put in something that's not a username
      • (a sentiment I kinda get tbh)
      • outsidecontext has quit
      • outsidecontext joined the channel
      • G0d joined the channel
      • aerozol[m]
        Not quite, I just think that replacing a username with a random number or something is an option, but doesn’t anonymise the data at all. It might just give users a false impression and be even worse, tbh. But we could, I guess. Could just remove usernames and replace them with random strings, like MBIDs (but I don’t see how this is any more anonymous, since listens are still ‘grouped’)
      • theracermaster has quit
      • SigHunter has quit
      • SigHunter joined the channel
      • rbatty joined the channel
      • ApeKattQuest
        aerozol[m]: I think it means that if someone wants to find a user's username they'd have to make like an effort for it, rather than just having it there, maybe?
      • idk
      • I don't super care, but I can also kinda see the point too
      • atj
        providing a false sense of anonymity is worse than not providing it at all
      • SigHunter joined the channel
      • Techman
        If the listen data is disassociated from usernames in the public dump, the only way to trace it back to someone would be to...look up that specific user's history.
      • nobiz joined the channel
      • Perhaps a better way to describe what I am saying is pseudonymization, as opposed to strict anonymization.
      • ApeKattQuest
        I mean I think that's fine, as long as it's specified that you are not actually *anonymised*, as in "if someone tries they can look up just about any user by comparing listen data to the listen website", but yeah, that'd have to be an active, not passive, thing
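The "active" lookup described here can be illustrated with a small sketch. It assumes an attacker has a pseudonymous dump plus one user's publicly visible listens and simply picks the pseudonym with the largest overlap. The data, the tokens, and the helper `best_match` are all hypothetical, and real matching would have to cope with far larger and noisier data.

```python
# Hypothetical data: pseudonymous dump rows and one public profile's listens,
# each as (track, unix_timestamp). None of this is the real dump format.
dump = {
    "a91f": {("Track 1", 1000), ("Track 3", 1300), ("Track 9", 2000)},
    "07c2": {("Track 2", 1100), ("Track 3", 1400)},
}
public_profile = {("Track 1", 1000), ("Track 3", 1300), ("Track 9", 2000)}


def best_match(profile, dump):
    """Return the pseudonym whose listens overlap most with profile."""
    return max(dump, key=lambda token: len(dump[token] & profile))


# The attacker links the public profile to a pseudonym by set overlap.
match = best_match(public_profile, dump)
```

Because listens stay grouped per pseudonym, a handful of timestamped listens from a public page can be enough to single out one token, which is the concern raised about a false sense of anonymity.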
      • Sciencentistguy
        do we have a way to encode "<artist> did mastering on track X of <release>" with the current relationship system?
      • ArtGravity has quit
      • yvanzo
        Hi Sciencentistguy, mastering should rather be added to the release, see the description of deprecated https://musicbrainz.org/relationship/30adb2d7-d...
      • phunyguy has quit
      • phunyguy joined the channel
      • Island_ joined the channel
      • Island_ has quit
      • Island_ joined the channel
      • atj
        Sciencentistguy: it sucks because sometimes there are legitimate reasons to add a mastering credit to a recording. however the underlying reason for the deprecation is that changes in mastering don't require separate recordings...so you can see the issue.