lucifer: can you help me fetch listens from some previous weeks? the db seems a little too small to generate some of the stats
pite has quit
nbin has quit
nbin joined the channel
lucifer[m]
holycow23: you can get the last 30 days' dumps by running `request_import_incremental` on wolf.
since you have at least one existing dump, i think it should work and import all incremental dumps available and not present in your installation.
rayyan_seliya123: okay, i'll check that error.
suvid: just take a look at `requirements.txt` and add it there. you'll need to run `./develop.sh build` after that so that it is available in the container.
even if you are saving extracted files to disk, you should take precautions to avoid running into an infinite loop.
rayyan_seliya123: you can start working on integrating Internet Archive into BrainzPlayer. you'll need a search API to find tracks in IA, but for now just use hardcoded data and assume the endpoint exists. you can work in a new PR/branch.
suvid[m]
<lucifer[m]> "suvid: just take a look at `..." <- But what about the version?
lucifer[m]
you can specify the version in requirements.txt, most of the dependencies specify it
petitminion joined the channel
reosarevok[m] joined the channel
reosarevok[m]
Jeez. Even eBird has had to set up anubis it seems
The huge value to AI scrapers of... bird data?
rayyan_seliya123
<lucifer[m]> "rayyan_seliya123: okay, i'll..." <- Sure! Also, can you check what exactly the error is in my last commit to that PR? The ListenBrainz server tests started failing after I pushed that commit.
<lucifer[m]> "rayyan_seliya123: you can..." <- Sure, I will create a new PR and start working on it! I will need your help if I get stuck!
suvid[m]
lucifer: when i am reading the listens from the files, i am using ijson to read each json object one by one from the list of json files in the spotify listening history
so now should i submit listens one by one?
cuz if i club them together to submit in a batch, then it might become a big list and would consume a lot of memory
lucifer[m]
suvid: chunk them into lists of 100.
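A minimal sketch of the chunking suggested here, assuming the listens arrive as a lazy iterator (for example, the objects streamed by ijson). The helper name `chunked` is hypothetical and not part of ListenBrainz; only the batch size of 100 comes from the conversation.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int = 100) -> Iterator[List[T]]:
    """Yield lists of at most `size` items without loading the whole input.

    Hypothetical helper: `items` can be any iterable, including a lazy
    stream of parsed JSON objects.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls only the next `size` items from the stream.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each batch can then be submitted in one call, so at most 100 listens are held in memory at a time, while avoiding one request per listen.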
suvid[m]
what is the format of data returned from `spotify_cache.track`?
lucifer[m]
also, add a fallback for API use when data is not found here. in development at the moment, these tables are empty but i'll update sample dumps to create some sample data here.
lucifer[m]
you can take a look at the schema i shared above.
suvid[m]
i was talking about the data field in the table
lucifer[m]
you won't need that.
suvid[m]
lucifer[m]: fallback api will be the spotify api itself right?
lucifer[m]
query spotify_cache.track with the track identifier and join it to spotify_cache.album for album name, join to spotify_cache.rel_album_artist for album artist and join to spotify_cache.rel_track_artist for track artists.
you have the duration of song played and timestamp available in the jsonl itself so that should be all that's needed.
suvid[m]
spotify_cache.rel_track_artist so i basically need to work with this table only right?
will search by track id and get the artist id
then query artist table to get the name
is this correct approach?
lucifer[m]
do it in one query but yes.
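A rough sketch of what "one query" could look like. The table names come from the conversation, but every column name (`track_id`, `album_id`, `artist_id`, `name`) is an assumption, since the actual schema is not shown here. SQLite stands in for the real database purely so the sketch runs on its own, and `make_demo_db` and `lookup_track` are hypothetical helpers with placeholder data.

```python
import sqlite3
from typing import Dict, List, Optional, Union

SEP = "\x1f"  # unit separator, unlikely to appear inside an artist name

# One statement: track -> album via a join, artist names via subqueries.
# All column names are assumptions, not the real spotify_cache schema.
QUERY = """
SELECT t.name  AS track_name,
       al.name AS album_name,
       (SELECT group_concat(a.name, :sep)
          FROM spotify_cache.rel_track_artist rta
          JOIN spotify_cache.artist a ON a.artist_id = rta.artist_id
         WHERE rta.track_id = t.track_id) AS track_artists,
       (SELECT group_concat(a.name, :sep)
          FROM spotify_cache.rel_album_artist raa
          JOIN spotify_cache.artist a ON a.artist_id = raa.artist_id
         WHERE raa.album_id = al.album_id) AS album_artists
  FROM spotify_cache.track t
  JOIN spotify_cache.album al ON al.album_id = t.album_id
 WHERE t.track_id = :track_id
"""


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in with placeholder rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so schema-qualified names work.
    conn.execute("ATTACH DATABASE ':memory:' AS spotify_cache")
    conn.executescript("""
        CREATE TABLE spotify_cache.album (album_id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE spotify_cache.artist (artist_id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE spotify_cache.track (track_id TEXT PRIMARY KEY, name TEXT, album_id TEXT);
        CREATE TABLE spotify_cache.rel_track_artist (track_id TEXT, artist_id TEXT);
        CREATE TABLE spotify_cache.rel_album_artist (album_id TEXT, artist_id TEXT);
        INSERT INTO spotify_cache.album VALUES ('al1', 'Album');
        INSERT INTO spotify_cache.artist VALUES ('ar1', 'Artist A'), ('ar2', 'Artist B');
        INSERT INTO spotify_cache.track VALUES ('t1', 'Song', 'al1');
        INSERT INTO spotify_cache.rel_track_artist VALUES ('t1', 'ar1'), ('t1', 'ar2');
        INSERT INTO spotify_cache.rel_album_artist VALUES ('al1', 'ar1');
    """)
    return conn


def lookup_track(
    conn: sqlite3.Connection, track_id: str
) -> Optional[Dict[str, Union[str, List[str]]]]:
    """Return cached metadata for a track, or None so the caller can fall back."""
    row = conn.execute(QUERY, {"sep": SEP, "track_id": track_id}).fetchone()
    if row is None:
        return None
    track_name, album_name, track_artists, album_artists = row
    return {
        "track_name": track_name,
        "album_name": album_name,
        # group_concat returns NULL when there are no matching rows.
        "track_artists": track_artists.split(SEP) if track_artists else [],
        "album_artists": album_artists.split(SEP) if album_artists else [],
    }
```

Returning `None` for an unknown track identifier leaves room for the API fallback mentioned earlier. The order of names inside each list is not guaranteed by this query.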
suvid[m]
also, could you please tell about the fallback api as well?
lucifer[m]
i wonder if we should take the album data from the cache too but fine to use the dump data for now i guess.
suvid[m]
album artists also contain only 1 artist in the dump data
lucifer[m]
you just need the client id and client secret for that.
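As an illustration of a fallback that needs only a client id and secret, here is a sketch assuming Spotify's client credentials flow. It only builds the HTTP requests and sends nothing. The function names are hypothetical, and how the ListenBrainz server actually talks to Spotify is not shown in this conversation.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"
TRACK_URL = "https://api.spotify.com/v1/tracks/{track_id}"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        # HTTP Basic auth: base64 of "client_id:client_secret".
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=body, headers=headers, method="POST")


def build_track_request(track_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a track lookup using a bearer token."""
    url = TRACK_URL.format(track_id=urllib.parse.quote(track_id, safe=""))
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})
```

Passing either request to `urllib.request.urlopen` would perform the call. The token response is JSON containing an `access_token` field, which is then used for the track lookup.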
suvid[m]
i see
thanks
kellnerd[m] joined the channel
kellnerd[m]
suvid: In my own Spotify history importer I'm just using master_metadata_album_artist_name as track artist, despite the name.
suvid[m]
so is it the artist name only?
can the mapper work behind the scenes to assign correct mbid if i use this only?
kellnerd[m]
Spotify track artists are usually the same as the album artist anyway and I'm having trouble finding an example where the master_metadata_album_artist_name is not the correct (primary) track artist.
lucifer[m]
i would suggest to use the correct track artist name as we have the data easily available in the spotify metadata cache.
suvid[m]
kellnerd[m]: yea i also didnt find any example till now tho
kellnerd[m]
For such cases it is really beneficial to have a Spotify metadata cache; the history files reduce these tracks to master_metadata_album_artist_name = "MEDUZA", where track vs album artist makes no difference.
suvid[m]
lucky for us, we have musicbrainz spotify cache and the spotify web api itself as well :)
kellnerd[m]
I was just looking for "Various artists" entries in my Spotify history samples, thinking that this is probably the one case where the distinction between primary track and album artist exists on Spotify... So far I've only found a track which literally has "Various artists" as the track artist 😂
So yeah, you're lucky to have access to the cache, my standalone tool has to do this on a best effort basis using just the primary artist (whether it is the album or track artist).
Often enough this is still sufficient for LB to map this to the correct recording, I've yet to find a compilation example where this simple approach might fail.
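The best-effort approach described above could look roughly like the sketch below. It is not code from any existing importer. Only `master_metadata_album_artist_name` is named in the conversation; the other history field names (`ts`, `master_metadata_track_name`, `master_metadata_album_album_name`), the timestamp format, and the listen payload keys are assumptions about the Spotify export and the ListenBrainz submission format.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def history_entry_to_listen(entry: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Convert one history entry to a listen, using the primary artist only.

    Hypothetical helper: returns None when track or artist is missing.
    """
    track = entry.get("master_metadata_track_name")
    # Despite its name, this field is used as the track artist here.
    artist = entry.get("master_metadata_album_artist_name")
    if not track or not artist:
        return None

    # Assumed timestamp format: UTC, e.g. "2024-01-01T00:00:00Z".
    played_at = datetime.strptime(entry["ts"], "%Y-%m-%dT%H:%M:%SZ")
    played_at = played_at.replace(tzinfo=timezone.utc)

    metadata = {"artist_name": artist, "track_name": track}
    album = entry.get("master_metadata_album_album_name")
    if album:
        metadata["release_name"] = album
    return {"listened_at": int(played_at.timestamp()), "track_metadata": metadata}
```

With only names submitted, matching the listen to the correct recording is left to the LB mapping, which is why the approach is best effort.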
lucifer[m]
i think you can query the spotify api optionally for the same data.
if the user can provide a client id/secret.
but yeah if LB mapping is working fine then might not be worth the effort.
petitminion joined the channel
suvid[m]
<lucifer[m]> "if the user can provide a client..." <- this will only be the case if the user has turned on spotify in services right?
<lucifer[m]> "but yeah if LB mapping is..." <- so should i drop the idea of spotify web api for now?
lucifer[m]
suvid: the LB server has its own spotify client id/secret. you can use that always.
suvid[m]
yes yes i was planning on doing that only
lucifer[m]
<lucifer[m]> "but yeah if LB mapping is..." <- those comments were for harmony, kellnerd's cli tool.
suvid[m]
ohh ok
it kinda confused me 😅
sorry
lucifer[m]
which exists outside of LB and might not have access to the client id/secret always.
kellnerd[m]
My LB tool is elbisaur though, harmony is the MB importer 😁