#metabrainz

      • Toasty has quit
      • 2023-01-02 00220, 2023

      • serialata joined the channel
      • 2023-01-02 00220, 2023

      • serial-ata joined the channel
      • 2023-01-02 00201, 2023

      • Maxr1998 joined the channel
      • 2023-01-02 00235, 2023

      • Maxr1998_ has quit
      • 2023-01-02 00254, 2023

      • Divyansh joined the channel
      • 2023-01-02 00253, 2023

      • Divyansh has quit
      • 2023-01-02 00211, 2023

      • kaine2 has quit
      • 2023-01-02 00236, 2023

      • kaine2 joined the channel
      • 2023-01-02 00200, 2023

      • BrainzGit
        [listenbrainz-server] 14amCap1712 opened pull request #2316 (03master…missing-left-join): Change inner join to left join in spark listens dump https://github.com/metabrainz/listenbrainz-server…
      • 2023-01-02 00227, 2023

      • lucifer
        jasje: what url are you calling? https://test-api.listenbrainz.org/1/stats/user/ja… shows some data to me.
      • 2023-01-02 00234, 2023

      • lucifer
        vibhoo_24: will do
      • 2023-01-02 00205, 2023

      • serialata has quit
      • 2023-01-02 00205, 2023

      • serial-ata has quit
      • 2023-01-02 00244, 2023

      • akshaaatt
      • 2023-01-02 00207, 2023

      • akshaaatt
        akshaaatt: , ansh , riksucks , lucifer in the picture!
      • 2023-01-02 00230, 2023

      • akshaaatt
        New Year Bang, with the MetaBrainz Gang! 🔥⚡️
      • 2023-01-02 00205, 2023

      • vibhoo_24 joined the channel
      • 2023-01-02 00235, 2023

      • vibhoo_24
        lucifer: for syncing my forked repo should I use git commands or should I directly do it from the github?
      • 2023-01-02 00252, 2023

      • lucifer
        vibhoo_24: either is fine.
      • 2023-01-02 00206, 2023

      • vibhoo_24
        okay done
      • 2023-01-02 00252, 2023

      • vibhoo_24 has quit
      • 2023-01-02 00202, 2023

      • jasje joined the channel
      • 2023-01-02 00254, 2023

      • jasje
        lucifer: i was using the non test version all along :(
      • 2023-01-02 00255, 2023

      • jasje
      • 2023-01-02 00259, 2023

      • jasje
        this one
      • 2023-01-02 00233, 2023

      • jasje has quit
      • 2023-01-02 00258, 2023

      • lucifer
        jasje: that is fine. but you need to add the year 2022 at the end. at least until we make 2022 the default (probably on 4th Jan)
      • 2023-01-02 00215, 2023

      • jasje joined the channel
      • 2023-01-02 00247, 2023

      • alastairp
        hello!
      • 2023-01-02 00205, 2023

      • jasje
        hellu!
      • 2023-01-02 00225, 2023

      • jasje
        lucifer: mb thanks
      • 2023-01-02 00256, 2023

      • lucifer
        hi alastairp !
      • 2023-01-02 00241, 2023

      • jasje
        lucifer: also about the playlist posters or artCovers
      • 2023-01-02 00212, 2023

      • jasje
        what should i use?
      • 2023-01-02 00203, 2023

      • lucifer
      • 2023-01-02 00219, 2023

      • jasje
        and if i wanted to search based on mbids?
      • 2023-01-02 00224, 2023

      • jasje
        lucifer: is there any way??
      • 2023-01-02 00255, 2023

      • lucifer
        jasje: yes, there are a couple of ways.
      • 2023-01-02 00204, 2023

      • lucifer
        which part do you want this for?
      • 2023-01-02 00200, 2023

      • jasje
        In the figma file, second frame first page, top artists of 2022
      • 2023-01-02 00213, 2023

      • jasje
        top albums of 2022**
      • 2023-01-02 00245, 2023

      • lucifer
        jasje: you can construct the url for image this way: `https://archive.org/download/mbid-{caa_release_mbid}/mbid-{caa_release_mbid}-{caa_id}_thumb500.jpg`
      • 2023-01-02 00219, 2023

      • lucifer
        items in the top album list have caa_id and caa_release_mbid fields present and not null if there is a cover art available.
      • 2023-01-02 00230, 2023

      • lucifer
        if the field is missing then no cover art available.
      • 2023-01-02 00224, 2023

      • lucifer
      • 2023-01-02 00240, 2023

      • lucifer
        like in this case only the 3rd item has cover art.
      • 2023-01-02 00247, 2023
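
A minimal sketch of the URL construction lucifer describes, assuming each top-album item is a dict carrying the `caa_id` and `caa_release_mbid` fields. The function name and the sample values in the usage note are hypothetical; only the URL template and the two field names come from the discussion above.

```python
from typing import Optional

# URL template quoted in the chat; the 500px thumbnail variant.
CAA_THUMB_TEMPLATE = (
    "https://archive.org/download/mbid-{caa_release_mbid}/"
    "mbid-{caa_release_mbid}-{caa_id}_thumb500.jpg"
)


def cover_art_url(item: dict) -> Optional[str]:
    """Return the thumbnail URL for a top-albums item, or None.

    Items without cover art have the caa fields missing or null,
    so both cases are treated as "no cover art available".
    """
    caa_id = item.get("caa_id")
    caa_release_mbid = item.get("caa_release_mbid")
    if caa_id is None or caa_release_mbid is None:
        return None
    return CAA_THUMB_TEMPLATE.format(
        caa_release_mbid=caa_release_mbid, caa_id=caa_id
    )
```

Calling `cover_art_url({"caa_id": 123, "caa_release_mbid": "abc"})` with these placeholder values fills both slots of the template, while an item lacking either field yields `None` so the client can fall back to a default image.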

      • jasje
        oh so thats the reason
      • 2023-01-02 00256, 2023

      • jasje
        so caa is basically used for the poster?
      • 2023-01-02 00210, 2023

      • lucifer
        yes, caa is short for cover art archive
      • 2023-01-02 00221, 2023

      • jasje
        that explains a lot
      • 2023-01-02 00225, 2023

      • jasje
        Thank you!
      • 2023-01-02 00211, 2023

      • BrainzGit
        [listenbrainz-server] 14amCap1712 opened pull request #2317 (03master…youtube-error-fix): LB-1168: Improve Youtube rate limit notification https://github.com/metabrainz/listenbrainz-server…
      • 2023-01-02 00240, 2023

      • mayhem
        mooooin!
      • 2023-01-02 00215, 2023

      • lucifer
        happy new year! and some mapping issues to make your day better :) https://musicbrainz.org/recording/e2bb4a3e-9579-4… and https://musicbrainz.org/recording/a0c1dd06-cc24-4…
      • 2023-01-02 00249, 2023

      • mayhem
        joy.
      • 2023-01-02 00236, 2023

      • mayhem
        simple solution: For a team of assassins. Find artists. Make a public example of some. Hope for better.
      • 2023-01-02 00239, 2023

      • mayhem
        easy, no?
      • 2023-01-02 00245, 2023

      • mayhem
        *form a team
      • 2023-01-02 00256, 2023

      • lucifer
        😆
      • 2023-01-02 00225, 2023

      • rudraksh has quit
      • 2023-01-02 00227, 2023

      • Toasty joined the channel
      • 2023-01-02 00225, 2023

      • aerozol
        akshaaatt: wow mb gang!! 🔥
      • 2023-01-02 00203, 2023

      • aerozol
        No meeting tomorrow right?
      • 2023-01-02 00210, 2023

      • akshaaatt
        Yuss aerozol ! ⚡️
      • 2023-01-02 00237, 2023

      • akshaaatt
        Why? Are you planning to join us here , aerozol ?
      • 2023-01-02 00239, 2023

      • aerozol
        Wait, in India or on IRC > <
      • 2023-01-02 00243, 2023

      • aerozol
        One is more likely
      • 2023-01-02 00216, 2023

      • jasje has quit
      • 2023-01-02 00259, 2023

      • Toasty has quit
      • 2023-01-02 00228, 2023

      • lucifer
        aerozol: as the agenda says, next IRC meeting is on 9th
      • 2023-01-02 00255, 2023

      • mayhem
        lucifer: for the top discoveries data, I should run that on gaga, yes?
      • 2023-01-02 00235, 2023

      • lucifer
        mayhem: thats the one that needs track data?
      • 2023-01-02 00203, 2023

      • lucifer
        i mean all the tracks that a user listened to in the year data
      • 2023-01-02 00215, 2023

      • mayhem
        no, the other one.
      • 2023-01-02 00258, 2023

      • lucifer
        i see. currently those playlists are generated together when data comes in from spark.
      • 2023-01-02 00220, 2023

      • lucifer
        i can add another command to directly test for a subset of users
      • 2023-01-02 00238, 2023

      • mayhem
        ah, no. shouldn't be necessary. the data is fetched via the dataset hoster off wolf.
      • 2023-01-02 00253, 2023

      • mayhem
        I'll just run the script to generate the full data and we should be set.
      • 2023-01-02 00222, 2023

      • lucifer
        ah cool
      • 2023-01-02 00200, 2023

      • lucifer
        i think we should wait for 4th to generate the data
      • 2023-01-02 00223, 2023

      • mayhem
        in case more people do imports?
      • 2023-01-02 00234, 2023

      • mayhem
        ok, that's cool. allows me to do stupid finances and the like.
      • 2023-01-02 00237, 2023

      • lucifer
        at least for final playlists. because the latest full dump hasnt yet been imported.
      • 2023-01-02 00240, 2023

      • lucifer
        yes that too
      • 2023-01-02 00207, 2023

      • lucifer
        also, i found a bug in my latest dump changes so opened a PR to fix.
      • 2023-01-02 00241, 2023

      • mayhem
        2316?
      • 2023-01-02 00224, 2023

      • lucifer
        yes
      • 2023-01-02 00232, 2023

      • mayhem
        lgtm
      • 2023-01-02 00253, 2023

      • lucifer
        already generated a dump and currently being imported in spark. size of the dump looks correct.
      • 2023-01-02 00217, 2023

      • BrainzGit
        [listenbrainz-server] 14amCap1712 merged pull request #2316 (03master…missing-left-join): Change inner join to left join in spark listens dump https://github.com/metabrainz/listenbrainz-server…
      • 2023-01-02 00242, 2023

      • BrainzGit
        [musicbrainz-android] 14dependabot[bot] opened pull request #175 (03master…dependabot/gradle/org.jetbrains.kotlin-kotlin-gradle-plugin-1.8.0): Bump kotlin-gradle-plugin from 1.7.10 to 1.8.0 https://github.com/metabrainz/musicbrainz-android…
      • 2023-01-02 00245, 2023

      • BrainzGit
        [musicbrainz-android] 14dependabot[bot] closed pull request #163 (03master…dependabot/gradle/org.jetbrains.kotlin-kotlin-gradle-plugin-1.7.22): Bump kotlin-gradle-plugin from 1.7.10 to 1.7.22 https://github.com/metabrainz/musicbrainz-android…
      • 2023-01-02 00253, 2023

      • BrainzGit
        [musicbrainz-android] 14dependabot[bot] opened pull request #176 (03master…dependabot/gradle/com.squareup.okhttp3-mockwebserver-5.0.0-alpha.11): Bump mockwebserver from 5.0.0-alpha.7 to 5.0.0-alpha.11 https://github.com/metabrainz/musicbrainz-android…
      • 2023-01-02 00229, 2023

      • Toasty joined the channel
      • 2023-01-02 00221, 2023

      • Pratha-Fish
        Hi alastairp, Hope you had a great holiday :)
      • 2023-01-02 00244, 2023

      • alastairp
        hi Pratha-Fish, how are you?
      • 2023-01-02 00255, 2023

      • alastairp
        I had a busy break, but it was fulfilling
      • 2023-01-02 00228, 2023

      • Pratha-Fish
        These breaks never last long enough 🥲
      • 2023-01-02 00256, 2023

      • Pratha-Fish
        Even my college was gonna open back on ~20th Jan, but looks like they changed their mind and started it back again from today itself
      • 2023-01-02 00207, 2023

      • alastairp
        your college is so weird
      • 2023-01-02 00225, 2023

      • Pratha-Fish
        alastairp: You have seen nothing yet 💀
      • 2023-01-02 00234, 2023

      • alastairp
        if mine changed anything from what was agreed 2 years ago, the unions would shut everything down
      • 2023-01-02 00222, 2023

      • Pratha-Fish
        Well, you went to a pretty good college. My college campus itself looks like something out of Far Cry 3... With gangs and stuff
      • 2023-01-02 00245, 2023

      • alastairp
        😬
      • 2023-01-02 00200, 2023

      • Pratha-Fish
        Thankfully some good friends make college life manageable haha
      • 2023-01-02 00233, 2023

      • Pratha-Fish
        But anyway, hopefully I'll be taking more days off ahead, so work shouldn't be much of a problem
      • 2023-01-02 00216, 2023

      • vibhoo_24 joined the channel
      • 2023-01-02 00247, 2023

      • Pratha-Fish
        I was checking this one out, and looks like you had to make a lot of edits in it alastairp https://github.com/Prathamesh-Ghatole/MLHD/commit…
      • 2023-01-02 00232, 2023

      • Pratha-Fish
        Sorry you had to go through all that effort even though I had 4 months to see the work through🥲
      • 2023-01-02 00244, 2023

      • Toasty has quit
      • 2023-01-02 00255, 2023

      • alastairp
        Pratha-Fish: yes, right. loading some more data from musicbrainz, and then being very detailed about how we treated each bit of data and what we do with it in each case
      • 2023-01-02 00224, 2023

      • Pratha-Fish
        Hmm
      • 2023-01-02 00229, 2023

      • alastairp
        Pratha-Fish: you would be surprised... remember that 90% of what I wrote in this change was easy for me because 1) I've done this kind of thing for 15 years, or 2) because you did all of the heavy lifting to answer all of our unknown questions about the data
      • 2023-01-02 00242, 2023

      • alastairp
        I'm still really happy with where we got to
      • 2023-01-02 00235, 2023

      • Pratha-Fish
        Well that was one hell of a learning curve haha. I still don't know how we got to the end in the first place lol
      • 2023-01-02 00245, 2023

      • alastairp
        the key was to work out which bits of data your conversion code had left behind, I tried to carefully discuss this in the comments in the process_df_new function
      • 2023-01-02 00225, 2023

      • Pratha-Fish
        Thanks for the comments, they have been pretty helpful
      • 2023-01-02 00239, 2023

      • Pratha-Fish
        I'll try to leave more along the way too
      • 2023-01-02 00255, 2023

      • Pratha-Fish
        alastairp: So can you give me a brief overview of the new changes? And what steps we need to take ahead
      • 2023-01-02 00239, 2023

      • Pratha-Fish
        ^ Whenever you're free that is
      • 2023-01-02 00210, 2023

      • alastairp
        Pratha-Fish: your code worked well for items which had a recording mbid, and for which the recording mbid was valid
      • 2023-01-02 00227, 2023

      • alastairp
        so for recordings we go through our steps: look up if there is a redirect and replace it if necessary; look up if there is a canonical id and replace it if necessary; and then look up artist and release information
      • 2023-01-02 00217, 2023
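
The three recording steps above can be sketched roughly as follows. This is not the code from the MLHD repository: the function name is hypothetical, and the three lookup tables are assumed to be plain in-memory dicts, whereas the real pipeline presumably reads them from MusicBrainz-derived data.

```python
from typing import Optional


def resolve_recording(
    mbid: str,
    redirects: dict,
    canonical: dict,
    metadata: dict,
) -> Optional[dict]:
    """Resolve one recording MBID through the three lookup steps.

    redirects: old MBID -> MBID it was redirected to (assumed shape)
    canonical: MBID -> canonical MBID (assumed shape)
    metadata:  MBID -> dict of artist and release information (assumed shape)
    """
    # Step 1: look up if there is a redirect and replace the id if necessary.
    mbid = redirects.get(mbid, mbid)
    # Step 2: look up if there is a canonical id and replace it if necessary.
    mbid = canonical.get(mbid, mbid)
    # Step 3: look up artist and release information for the final id.
    info = metadata.get(mbid)
    if info is None:
        # Unknown recording: the caller decides how to handle the row.
        return None
    return {"recording_mbid": mbid, **info}
```

The order matters in this sketch: the canonical lookup is applied to the redirected id, not the original one, which mirrors the order the steps are listed in.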

      • alastairp
        there's one pending item that came up in my testing here where we have what are called "non album tracks" in musicbrainz - that is, a recording with no related album. It turns out that there were quite a few of these, and I think that it's due to bad data in the mlhd. we need to come up with a better way of looking this up
      • 2023-01-02 00203, 2023

      • alastairp
        Then I was looking at the case of "what if there is an artist and release id, but no recording?" - you had already considered this a bit (your keep_missing, turn_blank parameters)
      • 2023-01-02 00246, 2023

      • alastairp
        when I started looking at the data in detail, a few things became clear to me. first, we were talking about having 2 datasets, one with only rows that have all columns (artist, release, recording), and one that has all rows from mlhd (without throwing away bad data)
      • 2023-01-02 00222, 2023

      • alastairp
        I realised that we could actually make a single dataset that contains both of these, by making one set of files with the "all column" data, and another set of files with the same filename in a separate directory containing _only_ the incomplete rows. This means if you need only all column data, you read just 1 file, and if you want all rows, you read 2 files and merge the rows together
      • 2023-01-02 00223, 2023

      • Pratha-Fish
        "non album tracks" that's a new one
      • 2023-01-02 00246, 2023

      • alastairp
        this is great because it means we don't need to make the dataset 2x as big to get all of the necessary data
      • 2023-01-02 00203, 2023
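
A rough sketch of how a consumer might read this layout, assuming tab-separated files with no header row. The directory names `complete` and `incomplete`, the column names, and the choice to re-sort merged rows by timestamp are all assumptions made for illustration; the chat only specifies that both directories use the same filename.

```python
import csv
from pathlib import Path

# Hypothetical column names for the four MLHD columns.
COLUMNS = ["timestamp", "artist_mbid", "release_mbid", "recording_mbid"]


def read_rows(path: Path) -> list:
    """Read one tab-separated file into a list of dicts.

    A missing file yields no rows, so users without incomplete
    rows do not need an empty placeholder file.
    """
    if not path.exists():
        return []
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"))


def load_listens(root: Path, filename: str, include_incomplete: bool = False) -> list:
    """Load the all-column rows, optionally merged with the incomplete rows."""
    rows = read_rows(root / "complete" / filename)
    if include_incomplete:
        # Same filename, separate directory: read the second file and merge.
        rows += read_rows(root / "incomplete" / filename)
        rows.sort(key=lambda row: int(row["timestamp"]))
    return rows
```

With this layout the common case (only fully identified listens) costs a single file read, and nothing is stored twice.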

      • Pratha-Fish
        What do you mean by "column data" here?
      • 2023-01-02 00229, 2023

      • alastairp
        I mean the columns from the mlhd, timestamp, artist id, release id, recording id
      • 2023-01-02 00247, 2023

      • Pratha-Fish
        Ah I see
      • 2023-01-02 00258, 2023

      • alastairp
        so I mean "rows for which we have an artist id, a release id, and a recording id"
      • 2023-01-02 00207, 2023

      • Pratha-Fish
        And while reading this, something just popped up in my mind
      • 2023-01-02 00229, 2023

      • alastairp
        because there are a bunch of rows with only an artist and release id, sometimes there are rows with only a timestamp (we know someone listened to something at this time, but there's no record of what it was)
      • 2023-01-02 00224, 2023

      • Pratha-Fish
        Given that people tend to listen to the same songs again and again, we can just make a dataset of unique rows, and then couple it with another dataset with just the row_ID and a list of timestamps at which the track was listened to
      • 2023-01-02 00208, 2023

      • alastairp
        yes, possibly. there are actually databases that are designed to consider/store data in this alternate format. I'm not sure how much space it would save, but we can definitely try it and find out
      • 2023-01-02 00255, 2023
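
To make the idea concrete, here is a small sketch of that alternate layout, assuming each listen is a `(timestamp, artist_mbid, release_mbid, recording_mbid)` tuple. The function name and the use of the list index as the row id are illustrative choices, and whether this actually saves space on the real data is the open question raised above.

```python
def split_unique(listens):
    """Split listens into unique id rows plus per-row timestamp lists.

    Returns (unique_rows, timestamps), where unique_rows[row_id] is an
    (artist, release, recording) tuple and timestamps[row_id] is the
    list of times at which that combination was listened to.
    """
    row_ids = {}      # (artist, release, recording) -> row_id
    unique_rows = []  # a row's position in this list is its row_id
    timestamps = {}   # row_id -> list of timestamps
    for timestamp, *ids in listens:
        key = tuple(ids)
        row_id = row_ids.get(key)
        if row_id is None:
            # First time this combination is seen: assign the next row id.
            row_id = len(unique_rows)
            row_ids[key] = row_id
            unique_rows.append(key)
        timestamps.setdefault(row_id, []).append(timestamp)
    return unique_rows, timestamps
```

Each distinct id combination is stored once, and repeat listens only add one timestamp each, so the saving grows with how often users replay the same tracks.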

      • Pratha-Fish
        Yes exactly
      • 2023-01-02 00222, 2023

      • Pratha-Fish
        I came across another last.fm dataset too a while ago, and they seemed to distribute the data this way
      • 2023-01-02 00211, 2023

      • alastairp
        the other thing that came up was some of our previous questions about recordings, but in this case applied to releases. So, I did the same process - 1) perform a redirect lookup, 2) find a canonical release id (this is a new dataset that I made only a month ago), 3) find the artist of the release
      • 2023-01-02 00226, 2023

      • alastairp
        this brought up another question about whether the release id is actually the correct field to use - we realised that in probably 99% of cases that people want to use this dataset, they're really just interested in knowing the general concept of "what album did someone listen to", not the specific version/format/year that it was released in
      • 2023-01-02 00245, 2023

      • alastairp
        in this case, the release group id is a better choice. so we need to work out how to add this to the dataset
      • 2023-01-02 00241, 2023

      • Pratha-Fish
        That definitely sounds a lot better
      • 2023-01-02 00242, 2023

      • Pratha-Fish
        Also, with our previous code, we were clearly missing out on some huge chunks of data where the recording MBID wasn't present, but the release MBID or artist MBID was
      • 2023-01-02 00205, 2023

      • Pratha-Fish
        And ig we can also derive artist MBIDs from release MBIDs too
      • 2023-01-02 00231, 2023

      • alastairp
        right, so I addressed that in the 2nd and 3rd third of the process_df_new function
      • 2023-01-02 00201, 2023

      • alastairp
        see that there is a 'if recording', and then if things are successful there is a 'continue', otherwise it falls down to the 'if release' and 'if artist' cases
      • 2023-01-02 00236, 2023
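
The control flow described here ('if recording' then 'continue', otherwise fall through to 'if release' and 'if artist') might look something like the sketch below. It is not the actual `process_df_new` function: the function name, the sets of known ids, the dict-based rows, and the choice to blank unresolved columns with `None` are all assumptions for illustration.

```python
def process_rows(rows, known_recordings, known_releases, known_artists):
    """Sort rows into complete and incomplete lists via a fall-through cascade."""
    complete, incomplete = [], []
    for row in rows:
        # 'if recording': in this sketch a known recording makes the row complete.
        if row.get("recording_mbid") in known_recordings:
            complete.append(dict(row))
            continue
        # 'if release': keep artist and release, blank the unusable recording id.
        if row.get("release_mbid") in known_releases:
            incomplete.append({**row, "recording_mbid": None})
            continue
        # 'if artist': only the artist id can be trusted.
        if row.get("artist_mbid") in known_artists:
            incomplete.append({**row, "recording_mbid": None, "release_mbid": None})
            continue
        # Nothing resolved: keep the timestamp so the listen itself is not lost.
        incomplete.append({
            "timestamp": row["timestamp"],
            "artist_mbid": None,
            "release_mbid": None,
            "recording_mbid": None,
        })
    return complete, incomplete
```

Using `continue` after each successful case keeps the cases ordered from most to least specific, so a row lands in the first bucket that can be filled.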

      • Pratha-Fish
        Great!
      • 2023-01-02 00254, 2023

      • Pratha-Fish
        It makes me wonder, how's the processing time looking as of now?
      • 2023-01-02 00214, 2023

      • alastairp
        I don't really know - I think slower than your original code, but not by much