I was looking into ListenBrainz as I was considering switching from Last.fm. Then I took a look at what a database dump looked like out of curiosity...I am very surprised that the listen data is not anonymized. Do not feel great about that at all, really.
2024-02-28 05916, 2024
Techman
Perhaps I had a false impression, but I had a belief that the actual tracks listened to would not be associated with individual users, at least in a public dump. I do think that releasing a listens database could be useful for statistical analysis on recordings but having users attached to that is not really necessary.
2024-02-28 05936, 2024
Techman
>By signing into ListenBrainz, you grant the MetaBrainz Foundation permission to include your listening history in data dumps we make publicly available under the CC0 license. None of your private information from your user profile will be included in these data dumps.
2024-02-28 05902, 2024
Techman
At least for me, I did not get the impression that this public dump of listening data would have my profile attached to it. This is because, you know, often times data that is made available for public research is anonymized so that said data cannot be traced back to users. At least easily.
2024-02-28 05937, 2024
Techman
In this case, I am not sure what the value is for including user information in the dump aside from spooking users and onlookers.
2024-02-28 05906, 2024
crism joined the channel
2024-02-28 05910, 2024
binzy joined the channel
2024-02-28 05944, 2024
aerozol[m] joined the channel
2024-02-28 05945, 2024
aerozol[m]
Techman: What do you mean by anonymized? ListenBrainz is an open database. And without attaching/grouping the listens to some sort of user entity the data is useless.
2024-02-28 05926, 2024
aerozol[m]
Basically, if you don’t want your listens linked together, I would not recommend using ListenBrainz or last.fm. That’s a decision a lot of MetaBrainz contributors make - we are generally pretty privacy-conscious
2024-02-28 05908, 2024
aerozol[m]
If you have ideas for something along the lines of replacing explicit user names (or something else?) with random strings, you could open a ticket for that. But AFAIK it would be easy to resolve it to a user on the site, with the same stats. But take my input with a grain of salt, I am a layperson when it comes to the data dumps :)
2024-02-28 05953, 2024
Techman
aerozol[m]: anonymized as in the data was generated by users but the link between the user and the data is not in the output.
2024-02-28 05903, 2024
Techman
It is not foolproof, as this is a public site, but do you kinda get what I mean?
2024-02-28 05938, 2024
Techman
Why would the data be useless if it could not be grouped to a user? I feel like you could still gain useful insights without a link. Clients used, popularity of songs, etc.
2024-02-28 05904, 2024
Techman
I do not mind my listening data being available for research but I would not want my username and ID to be linked to it in a public dump of all data. I feel like people should be using a profile page if they care about listens from a specific person. The way it is now, it feels creepy.
2024-02-28 05959, 2024
aerozol[m]
How would you make the data meaningful, for instance be able to say that x individual users have listened to an artist, without linking all of a users listens together?
2024-02-28 05904, 2024
aerozol[m]
Or generate recommendations?
2024-02-28 05922, 2024
aerozol[m]
You could calculate total listens of a song or artist, sure. But we'd have to remove user profile pages
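The distinction aerozol is drawing can be made concrete: total plays per artist need no user information, but "x individual users have listened to an artist" needs some per-user key to deduplicate. A minimal sketch, assuming a hypothetical row format of `(user_key, artist)` rather than the real dump schema; the `user_key` could be a username or an opaque pseudonym, and the counting works the same either way.

```python
from collections import Counter, defaultdict

# Hypothetical listen rows: (user_key, artist). Not the actual dump schema.
LISTENS = [
    ("u1", "Artist A"),
    ("u1", "Artist A"),
    ("u2", "Artist A"),
    ("u2", "Artist B"),
]


def total_plays(listens):
    """Plays per artist: computable with no user information at all."""
    return Counter(artist for _, artist in listens)


def distinct_listeners(listens):
    """Listeners per artist: needs *some* per-user key to deduplicate plays."""
    users_by_artist = defaultdict(set)
    for user_key, artist in listens:
        users_by_artist[artist].add(user_key)
    return {artist: len(users) for artist, users in users_by_artist.items()}


print(dict(total_plays(LISTENS)))   # Artist A: 3 plays, Artist B: 1 play
print(distinct_listeners(LISTENS))  # Artist A: 2 listeners, Artist B: 1 listener
```

Note that the second function does not care whether the key is a real username; that is the gap between the two positions in this thread.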
2024-02-28 05934, 2024
rbatty joined the channel
2024-02-28 05900, 2024
Techman
The way the data is now, I can build a profile for every person on the site whether they know it or not, without requiring me to do anything. I am sure that I am not the only one who may not have a firm grasp on what is going on here.
2024-02-28 05959, 2024
aerozol[m]
Building a profile for every person on the site is the point of ListenBrainz. You are making a profile of all of your listens on an open source site. It's what last.fm does as well
2024-02-28 05928, 2024
aerozol[m]
(they're not open source, but they have an API and anyone can grab your data)
2024-02-28 05950, 2024
Techman
I am not a data expert when it comes to anonymization, so I will defer to others on a real solution, but I am sure there is a way to keep some uniqueness in the data without it being easily traced back to a particular person
2024-02-28 05904, 2024
aerozol[m]
I guess the general idea is that if you don't attach identifiable information to your username you are 'anonymous' in terms of linking your account to your team life person
2024-02-28 05916, 2024
aerozol[m]
*real life person
2024-02-28 05900, 2024
Techman
I am treating having someone's username (e.g. looking at someone's profile page) different from public data dumps. If someone looks at my Last.fm page, then I would expect them to know my specific history. However I would not expect it to be in a public dump for research as there should be no need for my account to be identifiable in that.
2024-02-28 05932, 2024
aerozol[m]
I don't really understand what you mean - public dump, public page, what's the difference? FYI I can plug your last.fm username into lots of places to scrape out interesting data
2024-02-28 05934, 2024
aerozol[m]
I guess I'm not really disagreeing with your sentiment... Just that I don't see how ListenBrainz (even more so than other sites, given we are open source) can work around it
2024-02-28 05955, 2024
aerozol[m]
Maybe someone else can think of a middle ground, but I can't 😔
2024-02-28 05932, 2024
rbatty has quit
2024-02-28 05911, 2024
aerozol[m]
If there's anywhere you think we can clarify the language re. what will be public, that's something I can make a ticket for btw
2024-02-28 05948, 2024
Techman
IRC is perhaps not the best place to articulate thoughts but maybe I can try to make what I am thinking as clear as possible. Or make a forum post.
2024-02-28 05912, 2024
aerozol[m]
Forum is always good for more discussion and input 👍
2024-02-28 05902, 2024
aerozol[m]
Then tickets if something actionable comes out of it!
2024-02-28 05932, 2024
Techman
I guess I will take it from the top. I was originally going to migrate to ListenBrainz as I generally like open source stuff and I think Last.fm charging for reports is kind of bogus, but then I stopped because I checked out the database dumps and realized that identifiable info for a user is in the dump. I do not think that public data dumps should be traceable to users, at least directly.
2024-02-28 05934, 2024
Techman
For research purposes (often what this kind of data is made available for), there is no need to really link data to user accounts. There should be a way to anonymize the users in the output. I consider this different from visiting someone's profile page because the intent is different. Listening data for one particular user is very specific vs the public data dump which currently includes
2024-02-28 05934, 2024
Techman
everyone, fully traceable.
2024-02-28 05929, 2024
Techman
If the data were to be anonymized, then the people who grab the data dump can do research while not being able to trace it back to individuals unless they then looked up a person to connect it to that output.
2024-02-28 05927, 2024
Techman
The way the dump is now, everyone who has ever contributed a listen is exposed...even if they would rather only be found organically through other users. As someone looking at the data, I do not think I should have everyone's identifiable listening history. It feels creepy and unnecessary. Anonymous user IDs or some generated substitute could take the place of the usernames and user IDs and
2024-02-28 05927, 2024
Techman
the data could be pretty much as useful. If I wanted to know a specific person's history, I can always look them up.
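Techman's proposal (substitute generated IDs for usernames and user IDs) could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the field names `user_name`, `user_id`, `recording` and `listened_at` are hypothetical, not the actual ListenBrainz dump format, and the function `pseudonymize` is invented for this example. Each user gets a random token that is consistent within one dump; the name-to-token mapping is discarded, so tokens are not reused across dumps.

```python
import secrets


def pseudonymize(listens):
    """Replace user_name/user_id with a random per-dump token (hypothetical schema).

    Listens by the same user share a token within this dump, so per-user
    grouping survives; the name -> token mapping is thrown away afterwards.
    """
    tokens = {}
    out = []
    for row in listens:
        name = row["user_name"]
        if name not in tokens:
            tokens[name] = secrets.token_hex(8)  # 64 random bits per user
        new_row = {k: v for k, v in row.items() if k not in ("user_name", "user_id")}
        new_row["user_token"] = tokens[name]
        out.append(new_row)
    return out


DUMP = [
    {"user_name": "alice", "user_id": 1, "recording": "r1", "listened_at": 1700000000},
    {"user_name": "alice", "user_id": 1, "recording": "r2", "listened_at": 1700000300},
    {"user_name": "bob", "user_id": 2, "recording": "r1", "listened_at": 1700000600},
]

for row in pseudonymize(DUMP):
    print(row)
```

This is pseudonymization rather than anonymization: the listens are still grouped per user, which is exactly the property aerozol questions.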
2024-02-28 05959, 2024
Island_ has quit
2024-02-28 05958, 2024
binzy has quit
2024-02-28 05916, 2024
binzy joined the channel
2024-02-28 05920, 2024
binzy has quit
2024-02-28 05952, 2024
binzy joined the channel
2024-02-28 05924, 2024
ApeKattQuest
oh! I think I get where aerozol's and Techman's miscommunication is
2024-02-28 05924, 2024
ApeKattQuest
basically aerozol thought that user data would be removed completely, but the idea was just to put in something that's not a username
2024-02-28 05930, 2024
ApeKattQuest
(a sentiment I kinda get tby)
2024-02-28 05932, 2024
ApeKattQuest
tbh*
2024-02-28 05924, 2024
outsidecontext has quit
2024-02-28 05935, 2024
outsidecontext joined the channel
2024-02-28 05933, 2024
G0d joined the channel
2024-02-28 05909, 2024
aerozol[m]
Not quite, I just think that replacing a username with a random number or something is an option, but doesn’t anonymise the data at all. It might just give users a false impression and be even worse, tbh. But we could, I guess. Could just remove usernames and replace them with random strings, like MBIDs (but I don’t see how this is any more anonymous, since listens are still ‘grouped’)
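Aerozol's concern (and atj's point that a false sense of anonymity is worse than none) is that grouped pseudonymous listens can be matched back to public profiles. A minimal sketch of such a linkage, assuming hypothetical data: each pseudonymous token maps to the set of recordings it listened to, and each public profile's recordings are scraped from the site. It scores candidates by Jaccard similarity of the sets; a real attempt could also use listen timestamps, which would likely be even more distinctive. The function names are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(pseudonymous_groups, public_profiles):
    """For each pseudonymous token, return (username, score) of the public
    profile whose recordings overlap most. Assumes public_profiles is non-empty."""
    matches = {}
    for token, recs in pseudonymous_groups.items():
        matches[token] = max(
            ((name, jaccard(recs, prof)) for name, prof in public_profiles.items()),
            key=lambda pair: pair[1],
        )
    return matches


# Hypothetical data: tokens from a pseudonymized dump vs. public profile pages.
GROUPS = {"t-7f3a": {"r1", "r2", "r3"}, "t-91c0": {"r4", "r5"}}
PROFILES = {"alice": {"r1", "r2", "r3", "r9"}, "bob": {"r4", "r5"}}

print(best_match(GROUPS, PROFILES))
```

Here the first token overlaps three of alice's four recordings (score 0.75) and the second matches bob exactly (score 1.0), so swapping names for random strings does not stop someone who actively looks for a match.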
2024-02-28 05900, 2024
theracermaster has quit
2024-02-28 05900, 2024
SigHunter has quit
2024-02-28 05909, 2024
SigHunter joined the channel
2024-02-28 05922, 2024
rbatty joined the channel
2024-02-28 05937, 2024
ApeKattQuest
aerozol[m]: I think it means that if someone wants to find a user's username, they'd have to make, like, an effort for it, rather than just having it there, maybe?
2024-02-28 05938, 2024
ApeKattQuest
idk
2024-02-28 05949, 2024
ApeKattQuest
i don't super care, but i can also kinda see the point too
2024-02-28 05907, 2024
slydacyfa has quit
2024-02-28 05948, 2024
zer0bitz has quit
2024-02-28 05909, 2024
zer0bitz joined the channel
2024-02-28 05948, 2024
trolley has quit
2024-02-28 05913, 2024
trolley joined the channel
2024-02-28 05913, 2024
trolley has quit
2024-02-28 05940, 2024
binzy has quit
2024-02-28 05918, 2024
trolley joined the channel
2024-02-28 05909, 2024
aerozol[m] has quit
2024-02-28 05930, 2024
chris8 joined the channel
2024-02-28 05928, 2024
MeatPupp3t has quit
2024-02-28 05957, 2024
MeatPupp3t joined the channel
2024-02-28 05920, 2024
SigHunter has quit
2024-02-28 05945, 2024
atj
providing a false sense of anonymity is worse than not providing it at all
2024-02-28 05954, 2024
SigHunter joined the channel
2024-02-28 05951, 2024
Techman
If the listen data is disassociated from usernames in the public dump, the only way to trace it back to someone would be to...look up that specific user's history.
2024-02-28 05919, 2024
nobiz joined the channel
2024-02-28 05931, 2024
Techman
Perhaps a better way to describe what I am saying is pseudonymization, as opposed to strict anonymization.
2024-02-28 05904, 2024
nobiz has quit
2024-02-28 05904, 2024
nobiz joined the channel
2024-02-28 05902, 2024
nobiz has quit
2024-02-28 05918, 2024
nobiz joined the channel
2024-02-28 05905, 2024
nobiz has quit
2024-02-28 05905, 2024
nobiz joined the channel
2024-02-28 05925, 2024
nobiz has quit
2024-02-28 05942, 2024
nobiz joined the channel
2024-02-28 05945, 2024
nobiz has quit
2024-02-28 05901, 2024
nobiz joined the channel
2024-02-28 05936, 2024
ApeKattQuest
I mean I think that's fine, as long as it's specified that you are not actually *anonymised*, as in "if someone tries, they can look up just about any user by comparing the listen data to the website", but yea, that'd have to be an active, not passive, thing
2024-02-28 05933, 2024
nobiz has quit
2024-02-28 05933, 2024
nobiz joined the channel
2024-02-28 05950, 2024
nobiz has quit
2024-02-28 05907, 2024
nobiz joined the channel
2024-02-28 05928, 2024
rbatty has quit
2024-02-28 05958, 2024
carbolymer has quit
2024-02-28 05907, 2024
minimal joined the channel
2024-02-28 05914, 2024
Sciencentistguy
do we have a way to encode "<artist> did mastering on track X of <release>" with the current relationship system?
Sciencentistguy: it sucks because sometimes there are legitimate reasons to add a mastering credit to a recording. however the underlying reason for the deprecation is that changes in mastering don't require separate recordings...so you can see the issue.