lucifer: can you help me fetch listens of some previous weeks since the db seems to be a little small to generate some of the stats
2025-06-30 18157, 2025
pite has quit
2025-06-30 18151, 2025
nbin has quit
2025-06-30 18101, 2025
nbin joined the channel
2025-06-30 18133, 2025
lucifer[m]
holycow23: you can get the last 30 days' dumps by running `request_import_incremental` on wolf.
2025-06-30 18128, 2025
lucifer[m]
since you have at least one existing dump, i think it should work and import all incremental dumps available and not present in your installation.
2025-06-30 18101, 2025
lucifer[m]
rayyan_seliya123: okay, i'll check that error.
2025-06-30 18112, 2025
lucifer[m]
suvid: just take a look at `requirements.txt` and add it there. you'll need to run `./develop.sh build` after that so that it is available in the container.
2025-06-30 18152, 2025
lucifer[m]
even if you are saving extracted files to disk, you should take precautions to avoid running into an infinite loop.
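The chat does not say where the infinite-loop risk comes from. Assuming it is an uploaded archive that contains further archives (a zip quine or zip bomb), a minimal sketch of a defensive extraction step could look like this. The function name and both limits are hypothetical placeholders, not values from the importer.

```python
import zipfile
from pathlib import Path


def safe_extract_json(zip_path, dest_dir, max_members=1000,
                      max_total_bytes=500 * 1024 * 1024):
    """Extract only the .json members of an uploaded export into dest_dir.

    Nested archives are skipped instead of being extracted recursively, and
    both the number of entries and the bytes actually decompressed are capped.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    total = 0
    with zipfile.ZipFile(zip_path) as zf:
        members = zf.infolist()
        if len(members) > max_members:
            raise ValueError("too many entries in archive")
        for info in members:
            # Skip directories and anything that is not JSON (nested .zip files included).
            if info.is_dir() or not info.filename.lower().endswith(".json"):
                continue
            # Keep only the base name so "../" components cannot escape dest.
            target = dest / Path(info.filename).name
            with zf.open(info) as src, open(target, "wb") as out:
                while chunk := src.read(64 * 1024):
                    # Count real decompressed bytes; sizes in the zip header can lie.
                    total += len(chunk)
                    if total > max_total_bytes:
                        raise ValueError("archive expands beyond the size limit")
                    out.write(chunk)
            extracted.append(target)
    return extracted
```

Because nested archives are never opened, extraction always finishes after one pass over the member list; a partially written file may be left behind when a limit is hit, which a real importer would clean up.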
2025-06-30 18155, 2025
lucifer[m]
rayyan_seliya123: you can start working on integrating Internet Archive into Brainzplayer, you'll need a search API to find tracks in IA but for now just use hardcoded data and assume the endpoint exists. you can work in a new PR/branch.
2025-06-30 18105, 2025
suvid[m]
<lucifer[m]> "suvid: just take a look at `..." <- But what about the version?
2025-06-30 18127, 2025
lucifer[m]
you can specify the version in requirements.txt, most of the dependencies specify it
2025-06-30 18125, 2025
petitminion joined the channel
2025-06-30 18116, 2025
reosarevok[m] joined the channel
2025-06-30 18116, 2025
reosarevok[m]
Jeez. Even eBird has had to set up anubis it seems
2025-06-30 18130, 2025
reosarevok[m]
The huge value to AI scrapers of... bird data?
2025-06-30 18108, 2025
rayyan_seliya123
<lucifer[m]> "rayyan_seliya123: okay, i'll..." <- Sure! Also, can you check what error I'm getting in my last commit to that PR? The ListenBrainz server tests started failing after I pushed that commit!
2025-06-30 18108, 2025
rayyan_seliya123
<lucifer[m]> "rayyan_seliya123: you can..." <- Sure, I will create a new PR and start working on it! I'll need your help if I get stuck!
suvid[m]
lucifer: when I am reading the listens from the files i am using ijson to read each json one by one from the list of json files in spotify listening history
2025-06-30 18105, 2025
suvid[m]
so now should i submit them one by one?
2025-06-30 18105, 2025
suvid[m]
cuz if i club them together then it might become a big list and would consume a lot of memory
2025-06-30 18122, 2025
suvid[m]
* lucifer: when I am reading the listens from the files i am using ijson to read each json one by one from the list of json files in spotify listening history
2025-06-30 18122, 2025
suvid[m]
so now should i submit listens one by one?
2025-06-30 18122, 2025
suvid[m]
cuz if i club them together to submit in a batch, then it might become a big list and would consume a lot of memory
2025-06-30 18155, 2025
lucifer[m]
suvid: chunk them in lists of 100.
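For the batching question above, a minimal sketch of streaming the entries with ijson and grouping them into lists of 100 before submitting, so only one batch is held in memory at a time. `chunked` and `iter_history_entries` are hypothetical helper names, and the batch size simply follows the suggestion in the chat.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int = 100) -> Iterator[List[T]]:
    """Yield lists of at most `size` items without materialising the whole input."""
    it = iter(items)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch


def iter_history_entries(path):
    """Stream the objects of one Spotify history file (a top-level JSON array)."""
    import ijson  # third-party; imported lazily so chunked() works without it

    with open(path, "rb") as f:
        # The prefix "item" selects each element of the top-level array.
        yield from ijson.items(f, "item")
```

Usage would look like `for batch in chunked(iter_history_entries(path), 100): submit(batch)`, where `submit` stands in for whatever the importer uses to send listens. On Python 3.12+, `itertools.batched` does the same job but yields tuples.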
suvid[m]
what is the format of data returned from spotify_cache.track?
2025-06-30 18138, 2025
lucifer[m]
also, add a fallback for API use when data is not found here. in development at the moment, these tables are empty but i'll update sample dumps to create some sample data here.
2025-06-30 18148, 2025
lucifer[m]
you can take a look at the schema i shared above.
2025-06-30 18121, 2025
suvid[m]
i was talking about the data field in the table
2025-06-30 18131, 2025
lucifer[m]
you won't need that.
2025-06-30 18144, 2025
suvid[m]
lucifer[m]: fallback api will be the spotify api itself right?
2025-06-30 18129, 2025
lucifer[m]
query spotify_cache.track with the track identifier and join it to spotify_cache.album for album name, join to spotify_cache.rel_album_artist for album artist and join to spotify_cache.rel_track_artist for track artists.
2025-06-30 18155, 2025
lucifer[m]
you have the duration of song played and timestamp available in the jsonl itself so that should be all that's needed.
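Since the duration played and the timestamp come straight from each history entry, turning one entry into a listen payload might look like the sketch below. The field names (`ts`, `ms_played`, `spotify_track_uri`, `master_metadata_*`) are those of Spotify's extended streaming history files. Three things are assumptions to check against the importer and the ListenBrainz docs: that `ts` marks when playback stopped, the use of `master_metadata_album_artist_name` as a stand-in artist until the cache lookup fills in the real track artists, and the hypothetical `duration_played_ms` key.

```python
from datetime import datetime, timedelta, timezone


def entry_to_listen(entry):
    """Turn one Spotify extended-streaming-history entry into a listen dict.

    Returns None for entries that are not music tracks (e.g. podcast episodes).
    """
    uri = entry.get("spotify_track_uri")
    if not uri or entry.get("master_metadata_track_name") is None:
        return None
    # "ts" is an ISO-8601 UTC string such as "2021-03-14T12:34:56Z".
    ended = datetime.strptime(entry["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    ms_played = int(entry.get("ms_played", 0))
    # Assumption: "ts" is when playback stopped, so step back by ms_played.
    started = ended - timedelta(milliseconds=ms_played)
    track_id = uri.rsplit(":", 1)[-1]  # "spotify:track:<id>" -> "<id>"
    return {
        "listened_at": int(started.timestamp()),
        "track_metadata": {
            # Placeholder artist from the dump; replaced by cache data when available.
            "artist_name": entry.get("master_metadata_album_artist_name"),
            "track_name": entry["master_metadata_track_name"],
            "release_name": entry.get("master_metadata_album_album_name"),
            "additional_info": {
                "spotify_id": f"https://open.spotify.com/track/{track_id}",
                "music_service": "spotify.com",
                "duration_played_ms": ms_played,  # hypothetical key name
            },
        },
    }
```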
2025-06-30 18156, 2025
suvid[m]
spotify_cache.rel_track_artist: so i basically only need to work with this table, right?
2025-06-30 18113, 2025
suvid[m]
will search by track id and get the artist id
2025-06-30 18113, 2025
suvid[m]
then query artist table to get the name
2025-06-30 18126, 2025
suvid[m]
is this correct approach?
2025-06-30 18140, 2025
lucifer[m]
do it in one query but yes.
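A sketch of the single query suggested above, runnable against an in-memory SQLite stand-in. Only the `spotify_cache.track`, `album`, `rel_album_artist` and `rel_track_artist` table names come from the chat; the column names and the `spotify_cache.artist` table used for names are hypothetical, so check them against the real schema. In ListenBrainz itself this would run on PostgreSQL, where a psycopg2 cursor and `%s` placeholders replace SQLite's `?`.

```python
import sqlite3

# Hypothetical columns (track_id, album_id, artist_id, name, position).
TRACK_LOOKUP_SQL = """
SELECT t.name        AS track_name,
       al.name       AS album_name,
       rta.position  AS track_artist_position,
       ta.name       AS track_artist_name,
       raa.position  AS album_artist_position,
       aa.name       AS album_artist_name
  FROM spotify_cache.track t
  JOIN spotify_cache.album al             ON al.album_id = t.album_id
  JOIN spotify_cache.rel_track_artist rta ON rta.track_id = t.track_id
  JOIN spotify_cache.artist ta            ON ta.artist_id = rta.artist_id
  JOIN spotify_cache.rel_album_artist raa ON raa.album_id = al.album_id
  JOIN spotify_cache.artist aa            ON aa.artist_id = raa.artist_id
 WHERE t.track_id = ?
"""


def make_demo_db():
    """In-memory stand-in for the cache, using the hypothetical schema above."""
    conn = sqlite3.connect(":memory:")
    # Attaching as "spotify_cache" makes the schema-qualified names work in SQLite.
    conn.execute("ATTACH DATABASE ':memory:' AS spotify_cache")
    conn.executescript("""
        CREATE TABLE spotify_cache.track (track_id TEXT, name TEXT, album_id TEXT);
        CREATE TABLE spotify_cache.album (album_id TEXT, name TEXT);
        CREATE TABLE spotify_cache.artist (artist_id TEXT, name TEXT);
        CREATE TABLE spotify_cache.rel_track_artist (track_id TEXT, artist_id TEXT, position INTEGER);
        CREATE TABLE spotify_cache.rel_album_artist (album_id TEXT, artist_id TEXT, position INTEGER);
    """)
    return conn


def lookup_track(conn, track_id):
    """Return track/album/artist names for one Spotify track id, or None on a cache miss."""
    rows = conn.execute(TRACK_LOOKUP_SQL, (track_id,)).fetchall()
    if not rows:
        return None  # cache miss: the caller falls back to the Spotify Web API
    # The two artist joins multiply rows (track artists x album artists),
    # so collapse them back into position-ordered lists.
    track_artists, album_artists = {}, {}
    for _, _, ta_pos, ta_name, aa_pos, aa_name in rows:
        track_artists[ta_pos] = ta_name
        album_artists[aa_pos] = aa_name
    return {
        "track_name": rows[0][0],
        "release_name": rows[0][1],
        "artist_names": [track_artists[p] for p in sorted(track_artists)],
        "album_artist_names": [album_artists[p] for p in sorted(album_artists)],
    }
```

Because every join here is an inner join, a track with no album-artist rows comes back as a cache miss; whether to relax that with `LEFT JOIN`s depends on how complete the real cache is.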
2025-06-30 18114, 2025
suvid[m]
also, could you please tell about the fallback api as well?
2025-06-30 18122, 2025
lucifer[m]
i wonder if we should take the album data from the cache too but fine to use the dump data for now i guess.
2025-06-30 18144, 2025
suvid[m]
album artists also contain only 1 artist in the dump data
i think we should just take spotify track id from the dump and use the data we have
2025-06-30 18112, 2025
suvid[m]
and for fallback, use the data provided in dump 🤣
2025-06-30 18116, 2025
suvid[m]
how does that sound lucifer ?
2025-06-30 18129, 2025
lucifer[m]
lucifer[m]: yup that's what i proposed here
2025-06-30 18130, 2025
suvid[m]
* i think we should just take spotify track id from the dump and use the data we have in musicbrainz
2025-06-30 18112, 2025
lucifer[m]
suvid[m]: musicbrainz data is not linked to spotify in all cases so you can't really do that
2025-06-30 18134, 2025
lucifer[m]
the spotify metadata cache on the other hand should have all the data.
2025-06-30 18158, 2025
suvid[m]
i see
2025-06-30 18124, 2025
suvid[m]
<lucifer[m]> "https://developer.spotify.com/..." <- this API isn't implemented in the code right?... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/JiFdkDINHPpkOkWFXPCNpBul>)
lucifer[m]
you just need the client id and client secret for that.
2025-06-30 18106, 2025
suvid[m]
i see
2025-06-30 18106, 2025
suvid[m]
thanks
2025-06-30 18102, 2025
kellnerd[m] joined the channel
2025-06-30 18103, 2025
kellnerd[m]
suvid: In my own Spotify history importer I'm just using master_metadata_album_artist_name as track artist, despite the name.
2025-06-30 18104, 2025
suvid[m]
so is it the artist name only?
2025-06-30 18104, 2025
suvid[m]
can the mapper work behind the scenes to assign correct mbid if i use this only?
2025-06-30 18122, 2025
kellnerd[m]
Spotify track artists are usually the same as the album artist anyway and I'm having trouble finding an example where the master_metadata_album_artist_name is not the correct (primary) track artist.
2025-06-30 18125, 2025
lucifer[m]
i would suggest to use the correct track artist name as we have the data easily available in the spotify metadata cache.
2025-06-30 18153, 2025
suvid[m]
kellnerd[m]: yea, i also didn't find any example till now tho
kellnerd[m]
For such cases it is really beneficial to have a Spotify metadata cache, the History files reduce these tracks to master_metadata_album_artist_name = "MEDUZA" where track vs album artist makes no difference.
2025-06-30 18141, 2025
suvid[m]
lucky for us, we have musicbrainz spotify cache and the spotify web api itself as well :)
2025-06-30 18137, 2025
kellnerd[m]
I was just looking for "Various artists" entries in my Spotify history samples, thinking that this is probably the one case where the distinction between primary track and album artist exists on Spotify... So far I've only found a track which literally has "Various artists" as the track artist 😂
2025-06-30 18113, 2025
kellnerd[m]
So yeah, you're lucky to have access to the cache, my standalone tool has to do this on a best effort basis using just the primary artist (whether it is the album or track artist).
2025-06-30 18111, 2025
kellnerd[m]
Often enough this is still sufficient for LB to map this to the correct recording, I've yet to find a compilation example where this simple approach might fail.
2025-06-30 18122, 2025
lucifer[m]
i think you can query the spotify api optionally for the same data.
2025-06-30 18146, 2025
lucifer[m]
if the user can provide a client id/secret.
2025-06-30 18115, 2025
lucifer[m]
but yeah if LB mapping is working fine then might not be worth the effort.
2025-06-30 18109, 2025
petitminion joined the channel
2025-06-30 18138, 2025
suvid[m]
<lucifer[m]> "if the user can provide a client..." <- this will only be the case if the user has turned on spotify in services right?
2025-06-30 18107, 2025
suvid[m]
<lucifer[m]> "but yeah if LB mapping is..." <- so should i drop the idea of spotify web api for now?
2025-06-30 18108, 2025
lucifer[m]
suvid: the LB server has its own spotify client id/secret. you can use that always.
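For the Web API fallback, a stdlib-only sketch of Spotify's client credentials flow using the server's client id/secret, plus a helper that pulls names out of the returned track object. The endpoints and response fields (`access_token`, `name`, `album`, `artists`) are Spotify's documented ones; the helper names are hypothetical, and error handling, rate limiting and retries are left out.

```python
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"
TRACK_URL = "https://api.spotify.com/v1/tracks/{}"


def token_request(client_id, client_secret):
    """Build the client-credentials token request (no network I/O here)."""
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def fetch_track(track_id, client_id, client_secret):
    """Fetch one track object from the Web API (performs two HTTP requests)."""
    with urllib.request.urlopen(token_request(client_id, client_secret)) as resp:
        token = json.load(resp)["access_token"]
    req = urllib.request.Request(
        TRACK_URL.format(urllib.parse.quote(track_id)),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def metadata_from_api_track(track):
    """Pick the fields the importer needs out of a Spotify track object."""
    return {
        "track_name": track["name"],
        "release_name": track["album"]["name"],
        "artist_names": [a["name"] for a in track["artists"]],
        "album_artist_names": [a["name"] for a in track["album"]["artists"]],
    }
```

In practice the token would be fetched once and reused until its `expires_in` runs out, rather than once per track as `fetch_track` does here.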
2025-06-30 18122, 2025
suvid[m]
yes yes i was planning on doing that only
2025-06-30 18131, 2025
lucifer[m]
<lucifer[m]> "but yeah if LB mapping is..." <- those comments were for harmony, kellnerd's cli tool.
2025-06-30 18139, 2025
suvid[m]
ohh ok
2025-06-30 18145, 2025
suvid[m]
it kinda confused me 😅
2025-06-30 18145, 2025
suvid[m]
sorry
2025-06-30 18154, 2025
lucifer[m]
which exists outside of LB and might not always have access to the client id/secret.
2025-06-30 18158, 2025
kellnerd[m]
My LB tool is elbisaur though, harmony is the MB importer 😁
2025-06-30 18120, 2025
lucifer[m]
ah sorry, yes.
2025-06-30 18122, 2025
__BrainzGit
[musicbrainz-server] mwiencek opened pull request #3587 (master…mbs-14081): MBS-14081: Log out accounts from Discourse after they've been marked as spam https://github.com/metabrainz/musicbrainz-server/…
bitmap[m]
<reosarevok[m]> "Guess the script worked well..." <- the script did remove about 800k accounts, but the other 400k was me flagging spam 😛
2025-06-30 18100, 2025
Sophist-UK joined the channel
2025-06-30 18153, 2025
mamanullah7[m] joined the channel
2025-06-30 18153, 2025
mamanullah7[m]
<mamanullah7[m]> "Hey lucifer: i fixed the..." <- > <@m.amanullah7:matrix.org> Hey lucifer: i fixed the frontend issue it was authentication error! Now i can play songs using funkwhale!!... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/DMoOBxKBcoXebVygbEguXLPH>)
2025-06-30 18159, 2025
lucifer[m]
m.amanullah7: i'll do it today.
2025-06-30 18156, 2025
mamanullah7[m]
lucifer: Thanks
2025-06-30 18134, 2025
reosarevok[m]
<bitmap[m]> "the script did remove about 800k..." <- You had a second one-off you wanted to run to remove old dodgy accounts too, right
2025-06-30 18155, 2025
pite joined the channel
2025-06-30 18100, 2025
bitmap[m]
I did already delete about 4K empty dodgy accounts connected with spammers that had confirmed email addresses (after checking they had no rows in the MeB/LB/CB user tables either)
2025-06-30 18131, 2025
bitmap[m]
but I think we should just modify the script you made to remove the email check
2025-06-30 18106, 2025
bitmap[m]
and have it check the other projects' DBs directly to be safe