#metabrainz


      • holycow23[m] joined the channel
      • holycow23[m]
        lucifer: can you help me fetch listens of some previous weeks since the db seems to be a little small to generate some of the stats
      • pite has quit
      • nbin has quit
      • nbin joined the channel
      • lucifer[m]
        holycow23: you can get the last 30 days' dumps by running `request_import_incremental` on wolf.
      • since you have at least one existing dump, i think it should work and import all incremental dumps available and not present in your installation.
      • rayyan_seliya123: okay, i'll check that error.
      • suvid: just take a look at `requirements.txt` and add it there. you'll need to run `./develop.sh build` after that so that it is available in the container.
      • even if you are saving extracted files to disk, you should take the precautions to avoid running into an infinite loop.
      • rayyan_seliya123: you can start working on integrating Internet Archive into Brainzplayer, you'll need a search API to find tracks in IA but for now just use hardcoded data and assume the endpoint exists. you can work in a new PR/branch.
      • suvid[m]
        <lucifer[m]> "suvid: just take a look at `..." <- But what about the version?
      • lucifer[m]
        you can specify the version in requirements.txt, most of the dependencies specify it
      • petitminion joined the channel
      • reosarevok[m] joined the channel
      • reosarevok[m]
        Jeez. Even eBird has had to set up anubis it seems
      • The huge value to AI scrapers of... bird data?
      • rayyan_seliya123
        <lucifer[m]> "rayyan_seliya123: okay, i'll..." <- Sure! Also, can you check what error exactly I am getting in my last commit to that PR? I was failing the listenbrainz-server tests after pushing that commit!
      • <lucifer[m]> "rayyan_seliya123: you can..." <- Sure, I will create a new PR and start working on it! Will need your help if I get stuck!
      • petitminion has quit
      • mamanullah7[m] has quit
      • reosarevok[m]
        Our editor numbers are doing a Tesla!
      • Guess the script worked well then
      • Kladky joined the channel
      • nobiz has quit
      • ansh[m] joined the channel
      • ansh[m]
        holycow23: For the PR LB#3308, can you share a sample API payload data, so that i can mock and test?
      • BrainzBot
      • nobiz joined the channel
      • BrainzGit
        [musicbrainz-server] reosarevok opened pull request #3586 (master…flow-274): [WIP] Update Flow to 0.274.1 https://github.com/metabrainz/musicbrainz-serve...
      • petitminion joined the channel
      • [listenbrainz-server] anshg1214 merged pull request #3304 (master…personal-rec-modal-tests): Rewrite tests for PersonalRecommendationsModal https://github.com/metabrainz/listenbrainz-serv...
      • [metabrainz.org] mayhem merged pull request #508 (03metabrainz-notifications…metabrainz-notifications): Add notification/send endpoint. https://github.com/metabrainz/metabrainz.org/pu...
      • suvid[m]
        lucifer: when I am reading the listens from the files, I am using ijson to read each JSON one by one from the list of JSON files in the Spotify listening history
      • so now should I submit listens one by one?
      • cuz if I club them together to submit in a batch, then it might become a big list and would consume a lot of memory
      • lucifer[m]
        [@suvid:matrix.org](https://matrix.to/#/@suvid:matrix.org) chunk them in lists of 100.
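A minimal sketch of the batching lucifer suggests. In the real importer the records would come from ijson's lazy parser (e.g. `ijson.items(f, "item")`); here an ordinary iterable stands in for the stream so the snippet is self-contained:

```python
def chunked(records, size=100):
    """Yield lists of at most `size` records without holding the whole stream."""
    batch = []
    for record in records:  # `records` can be any lazy iterator, e.g. from ijson
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# Usage: submit each batch as one request instead of one listen at a time.
batches = list(chunked(range(250), size=100))  # -> batches of 100, 100 and 50
```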
      • suvid[m]
        and i need to parse listens according to https://listenbrainz.readthedocs.io/en/latest/u... this right?
      • basically i need to extract relevant info for listens from the spotify listen and craft a listen according to the listenbrainz specification?
      • and then submit it right?
      • lucifer[m]
        Yes.
      • You can take a look at the existing Spotify listens importer and reuse code if the format is the same between the API and the downloaded JSON files
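The mapping suvid asks about could look roughly like this. The Spotify-side field names (`ts`, `master_metadata_*`, `spotify_track_uri`) are the ones the extended-history export uses; the output shape follows the ListenBrainz JSON submission spec, but treat the exact keys as a sketch to verify against the docs:

```python
from datetime import datetime

def to_listen(entry):
    """Map one Spotify extended-history record to a ListenBrainz listen dict."""
    # Replace the "Z" suffix so datetime.fromisoformat accepts the timestamp
    # on Python versions where it is strict about the UTC designator.
    listened_at = int(
        datetime.fromisoformat(entry["ts"].replace("Z", "+00:00")).timestamp()
    )
    return {
        "listened_at": listened_at,
        "track_metadata": {
            "track_name": entry["master_metadata_track_name"],
            # the export only carries the album artist (discussed later in this log)
            "artist_name": entry["master_metadata_album_artist_name"],
            "release_name": entry["master_metadata_album_album_name"],
            "additional_info": {"spotify_id": entry["spotify_track_uri"]},
        },
    }

listen = to_listen({
    "ts": "2024-01-01T00:00:00Z",
    "master_metadata_track_name": "Piece Of Your Heart",
    "master_metadata_album_artist_name": "MEDUZA",
    "master_metadata_album_album_name": "Piece Of Your Heart",
    "spotify_track_uri": "spotify:track:example",  # placeholder id
})
```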
      • suvid[m]
        the format seems to be a bit different, i just checked now
      • suvid[m]: the one from the api and one from extended listening history
      • extended history gives only 1 artist name and not all 👀
      • lucifer[m]
        sure, you can use the code and docs just as a reference then.
      • suvid[m]
        Also, after completing the spotify importer, should i create the UI for easier testing or create importers for other services first?
      • lucifer[m]
        UI and testing.
      • aim is to complete and integrate one importer first.
      • suvid[m]
        okay I just realized lucifer
      • spotify extended streaming history does not give the track artist
      • it just gives the album artist
      • but track name and track artist are the 2 minimum required things to submit a listen right?
      • lucifer[m]
        suvid: you can query our spotify metadata cache for the artist details with the track id. if not present there query the spotify api.
      • suvid[m] uploaded an image: (42KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/YhUxlfcErxGCUKWBEMpztXKg/image.png >
      • suvid[m]
        like this is a listen
      • lucifer[m]: spotify metadata cache?
      • where can i find it?
      • petitminion has quit
      • lucifer[m]
      • these tables are defined in timescale db.
      • suvid[m]
        what is the format of data returned from `spotify_cache.track`?
      • lucifer[m]
        also, add a fallback for API use when data is not found here. in development at the moment, these tables are empty but i'll update sample dumps to create some sample data here.
      • lucifer[m]
        you can take a look at the schema i shared above.
      • suvid[m]
        i was talking about the data field in the table
      • lucifer[m]
        you won't need that.
      • suvid[m]
        lucifer[m]: fallback api will be the spotify api itself right?
      • lucifer[m]
        query spotify_cache.track with the track identifier and join it to spotify_cache.album for album name, join to spotify_cache.rel_album_artist for album artist and join to spotify_cache.rel_track_artist for track artists.
      • you have the duration of song played and timestamp available in the jsonl itself so that should be all that's needed.
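The single-query lookup lucifer outlines might look like the following. The table names mirror the `spotify_cache` schema he mentions (flattened to underscores for this self-contained sqlite demo), while the column names and sample data are purely illustrative guesses, not the real schema:

```python
import sqlite3

# Toy stand-in for the spotify_cache tables in the timescale db.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spotify_cache_track (track_id TEXT PRIMARY KEY, name TEXT, album_id TEXT);
    CREATE TABLE spotify_cache_album (album_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE spotify_cache_artist (artist_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE spotify_cache_rel_track_artist (track_id TEXT, artist_id TEXT, position INT);
""")
conn.execute("INSERT INTO spotify_cache_track VALUES ('t1', 'Piece Of Your Heart', 'a1')")
conn.execute("INSERT INTO spotify_cache_album VALUES ('a1', 'Piece Of Your Heart')")
conn.executemany("INSERT INTO spotify_cache_artist VALUES (?, ?)",
                 [("ar1", "MEDUZA"), ("ar2", "Goodboys")])
conn.executemany("INSERT INTO spotify_cache_rel_track_artist VALUES (?, ?, ?)",
                 [("t1", "ar1", 0), ("t1", "ar2", 1)])

# One query from the track id to track name, album name and all track artists.
row = conn.execute("""
    SELECT t.name, al.name, group_concat(ar.name, ', ') AS artists
      FROM spotify_cache_track t
      JOIN spotify_cache_album al ON al.album_id = t.album_id
      JOIN spotify_cache_rel_track_artist rta ON rta.track_id = t.track_id
      JOIN spotify_cache_artist ar ON ar.artist_id = rta.artist_id
     WHERE t.track_id = 't1'
     GROUP BY t.track_id
""").fetchone()
```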
      • suvid[m]
        spotify_cache.rel_track_artist so i basically need to work with this table only right?
      • will search by track id and get the artist id
      • then query artist table to get the name
      • is this correct approach?
      • lucifer[m]
        do it in one query but yes.
      • suvid[m]
        also, could you please tell about the fallback api as well?
      • lucifer[m]
        i wonder if we should take the album data from the cache too but fine to use the dump data for now i guess.
      • suvid[m]
        album artists also contain only 1 artist in the dump data
      • lucifer[m]
      • suvid[m]
        i think we should just take spotify track id from the dump and use the data we have
      • and for fallback, use the data provided in dump 🤣
      • how does that sound lucifer ?
      • lucifer[m]
        lucifer[m]: yup that's what i proposed here
      • suvid[m]
        * i think we should just take spotify track id from the dump and use the data we have in musicbrainz
      • and for fallback, use the data provided in dump 🤣
      • lucifer[m]
        suvid[m]: musicbrainz data is not linked to spotify in all cases so you can't really do that
      • the spotify metadata cache on the other hand should have all the data.
      • suvid[m]
        i see
      • <lucifer[m]> "https://developer.spotify.com/..." <- this API isnt implemented in the code right?... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
      • lucifer[m]
      • you just need the client id and client secret for that.
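For reference, the client-credentials flow lucifer refers to needs only those two values. This sketch builds (but does not send) the token request against Spotify's documented `accounts.spotify.com/api/token` endpoint, with placeholder credentials:

```python
import base64
import urllib.parse
import urllib.request

CLIENT_ID = "your-client-id"          # placeholder
CLIENT_SECRET = "your-client-secret"  # placeholder

def build_token_request(client_id, client_secret):
    """Build the client-credentials token request (Basic auth, form body)."""
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        "https://accounts.spotify.com/api/token",
        data=urllib.parse.urlencode({"grant_type": "client_credentials"}).encode(),
        headers={
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )

req = build_token_request(CLIENT_ID, CLIENT_SECRET)
# urllib.request.urlopen(req) would return a JSON body with an access_token.
```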
      • suvid[m]
        i see
      • thanks
      • kellnerd[m] joined the channel
      • kellnerd[m]
        suvid: In my own Spotify history importer I'm just using master_metadata_album_artist_name as track artist, despite the name.
      • suvid[m]
        so is it the artist name only?
      • can the mapper work behind the scenes to assign correct mbid if i use this only?
      • kellnerd[m]
        Spotify track artists are usually the same as the album artist anyway, and I'm having trouble finding an example where the master_metadata_album_artist_name is not the correct (primary) track artist.
      • lucifer[m]
        i would suggest using the correct track artist name as we have the data easily available in the spotify metadata cache.
      • suvid[m]
        kellnerd[m]: yea i also didnt find any example till now tho
      • kellnerd[m]
        Yeah, use the cached data when it is available.
      • lucifer[m]
        for one example: album artist has two artists and the first track has four artists: https://open.spotify.com/album/0GjnbPeC1Q1rtkjY...
      • kellnerd[m]
        For such cases it is really beneficial to have a Spotify metadata cache, the History files reduce these tracks to master_metadata_album_artist_name = "MEDUZA" where track vs album artist makes no difference.
      • suvid[m]
        lucky for us, we have musicbrainz spotify cache and the spotify web api itself as well :)
      • kellnerd[m]
        I was just looking for "Various artists" entries in my Spotify history samples, thinking that this is probably the one case where the distinction between primary track and album artist exists on Spotify... So far I've only found a track which literally has "Various artists" as the track artist 😂
      • So yeah, you're lucky to have access to the cache, my standalone tool has to do this on a best effort basis using just the primary artist (whether it is the album or track artist).
      • Often enough this is still sufficient for LB to map this to the correct recording, I've yet to find a compilation example where this simple approach might fail.
      • lucifer[m]
        i think you can query the spotify api optionally for the same data.
      • if the user can provide a client id/secret.
      • but yeah if LB mapping is working fine then might not be worth the effort.
      • petitminion joined the channel
      • suvid[m]
        <lucifer[m]> "if the user can provide a client..." <- this will only be the case if the user has turned on spotify in services right?
      • <lucifer[m]> "but yeah if LB mapping is..." <- so should i drop the idea of spotify web api for now?
      • lucifer[m]
        suvid: the LB server has its own spotify client id/secret. you can use that always.
      • suvid[m]
        yes yes i was planning on doing that only
      • lucifer[m]
        <lucifer[m]> "but yeah if LB mapping is..." <- those comments were for harmony, kellnerd's cli tool.
      • suvid[m]
        ohh ok
      • it kinda confused me 😅
      • sorry
      • lucifer[m]
        which exists outside of LB and might not always have access to the client id/secret.
      • kellnerd[m]
        My LB tool is elbisaur though, harmony is the MB importer 😁
      • lucifer[m]
        ah sorry, yes.