#metabrainz

      • Pratha-Fish
        Oh wait a sec
      • 2022-09-01 24421, 2022

      • lucifer
        or tell me otherwise how the canonical mbid is found.
      • 2022-09-01 24436, 2022

      • Pratha-Fish
        ```select recording_mbid as old,
        canonical_recording_mbid as new
        from mapping.canonical_recording_redirect```
      • 2022-09-01 24437, 2022

      • Pratha-Fish
        I use the above query for the same
      • 2022-09-01 24402, 2022

      • lucifer
        yeah just replace the table there with the one i mentioned
      • 2022-09-01 24411, 2022

      • Pratha-Fish
        Looks like it still uses the same one after all 🤦‍♂️
      • 2022-09-01 24417, 2022

      • Pratha-Fish
        On it
      • 2022-09-01 24444, 2022

      • Pratha-Fish
        It has the same number of differences as before though (650)
      • 2022-09-01 24423, 2022

      • Pratha-Fish
        I'll retry once without caching
      • 2022-09-01 24410, 2022

      • lucifer
        Pratha-Fish: yes, please. almost sure something is wrong in the script generating this report because some of the mbids in there are not present at all in the new table.
      • 2022-09-01 24441, 2022

      • Pratha-Fish
        lucifer: The results just came in. There's a difference!
      • 2022-09-01 24407, 2022

      • Pratha-Fish
        Please check it out once again
      • 2022-09-01 24408, 2022

      • lucifer
        last entry in that file. see the canonical mbid is not present as canonical mbid at all in the new table.
      • 2022-09-01 24422, 2022

      • Pratha-Fish
        This time there's 677 differences
      • 2022-09-01 24438, 2022

      • lucifer
        hmm so it increased.
      • 2022-09-01 24453, 2022

      • Pratha-Fish
        right
      • 2022-09-01 24422, 2022

      • BrainzGit
        [bookbrainz-site] MonkeyDo merged pull request #814 (master…nameSectionImprovements): feat(BB-432): Show possible duplicates next to the name section https://github.com/metabrainz/bookbrainz-site/pul…
      • 2022-09-01 24445, 2022

      • lucifer
        thanks for the report, Pratha-Fish.
      • 2022-09-01 24459, 2022

      • lucifer
        i'll look into it later, cc: alastairp.
      • 2022-09-01 24408, 2022

      • Pratha-Fish
        lucifer: You're welcome, and super sorry for keeping you hanging
      • 2022-09-01 24427, 2022

      • Pratha-Fish
        alastairp: Hi, I wanted to discuss a few points about the MLHD cleanup process if you're free
      • 2022-09-01 24439, 2022

      • Pratha-Fish
        Basically all of the above ^
      • 2022-09-01 24441, 2022

      • v6lur joined the channel
      • 2022-09-01 24439, 2022

      • CatQuest
        aerozol: I've got feedback on the play icon on the cover art: I actually see why (and I don't dislike the concept of being able to play on the page (with userscripts or links or what))
      • 2022-09-01 24439, 2022

      • CatQuest
        but when I see coverart in mb, my first idea is that clicking it will take me directly to the coverart tab of the release/choose coverart for the RG
      • 2022-09-01 24452, 2022

      • CatQuest
        also, gimme some time and I will talk to you about expand/collapse release groups pls
      • 2022-09-01 24446, 2022

      • CatQuest
        if you can wait until after the 7th that would be best, right now things are way too hectic and my brain isn't on the subject at all, after it i'll have LOTS of time having to take it easy and can sit and talk to you
      • 2022-09-01 24423, 2022

      • CatQuest
        same to Shubh, monkey, reosarevok, anyone else who wants feedback/help/instruments/work whatever
      • 2022-09-01 24420, 2022

      • monkey
        Sure thing, testing can wait :)
      • 2022-09-01 24400, 2022

      • v6lur has quit
      • 2022-09-01 24446, 2022

      • v6lur joined the channel
      • 2022-09-01 24431, 2022

      • BrainzGit
        [bookbrainz-site] MonkeyDo merged pull request #863 (master…dependabot/npm_and_yarn/moment-2.29.4): chore(deps): bump moment from 2.29.2 to 2.29.4 https://github.com/metabrainz/bookbrainz-site/pul…
      • 2022-09-01 24444, 2022

      • BrainzGit
        [bookbrainz-site] MonkeyDo merged pull request #864 (master…dependabot/npm_and_yarn/webpack-cli-4.10.0): chore(deps-dev): bump webpack-cli from 4.9.1 to 4.10.0 https://github.com/metabrainz/bookbrainz-site/pul…
      • 2022-09-01 24404, 2022

      • BrainzGit
        [bookbrainz-site] MonkeyDo merged pull request #866 (master…dependabot/npm_and_yarn/terser-5.14.2): chore(deps): bump terser from 5.10.0 to 5.14.2 https://github.com/metabrainz/bookbrainz-site/pul…
      • 2022-09-01 24443, 2022

      • piwu has quit
      • 2022-09-01 24409, 2022

      • BrainzGit
        [bookbrainz-site] MonkeyDo merged pull request #869 (master…refactor/call2action): BB-682: Use plain links in call-to-action buttons https://github.com/metabrainz/bookbrainz-site/pul…
      • 2022-09-01 24420, 2022

      • alastairp
        hi Pratha-Fish, lucifer. I was out doing errands. let me check history
      • 2022-09-01 24419, 2022

      • piwu joined the channel
      • 2022-09-01 24429, 2022

      • alastairp
        lucifer: interesting that the differences increased! definitely something to look into. let me see if I can find some time tomorrow for that
      • 2022-09-01 24416, 2022

      • alastairp
        Pratha-Fish: regarding rows without recording mbids, what's the number 78.13%? is that the number without mbids, or with?
      • 2022-09-01 24434, 2022

      • Pratha-Fish
        alastairp: Hi, that's the % of rows that have a recording MBID present in them
      • 2022-09-01 24409, 2022

      • Pratha-Fish
        i.e. ~21.87 % rows in MLHD don't have any recording MBID
      • 2022-09-01 24436, 2022

      • alastairp
        again - try to be consistent in how you report these things. you say in the text "rows that don't have an mbid" and then you give a figure for rows that _do_ have mbids
      • 2022-09-01 24458, 2022

      • Pratha-Fish
        Oops
      • 2022-09-01 24458, 2022

      • alastairp
        it makes it easier for us to understand when reading the items
      • 2022-09-01 24405, 2022

      • Pratha-Fish
        Didn't notice that one. I'll fix it rn
      • 2022-09-01 24403, 2022

      • alastairp
        I think we should do 2 things with those - we were thinking of distributing two versions of the dataset, one "as close to the original as possible" and one "as much data as possible". The first, will keep the blank rows. it might be useful for people who want to know exactly when a user listened to music, even though they might not know what
      • 2022-09-01 24415, 2022

      • alastairp
        the 2nd, will have complete metadata (recording, artist(s), release) so that people who want to do something with the data can do so
      • 2022-09-01 24445, 2022

      • alastairp
        so it would be great if your code could have a flag which says "skip missing mbids" or "include rows for missing mbids"
      • 2022-09-01 24413, 2022

      • Pratha-Fish
        Yes definitely
      • 2022-09-01 24431, 2022

      • Pratha-Fish
        We could just completely leave such rows without rec-mbid alone
      • 2022-09-01 24432, 2022

      • Pratha-Fish
        While we're at it, should we move the artist-mbid column to the end of the dataset or just keep it as it is
      • 2022-09-01 24414, 2022

      • alastairp
        do you mean related to our discussion the other day?
      • 2022-09-01 24419, 2022

      • alastairp
        I think it's fine where it is
      • 2022-09-01 24438, 2022

      • Pratha-Fish
        yep. It won't affect the data set much, it might just make the text representation a little prettier.. which really doesn't matter anyway
      • 2022-09-01 24426, 2022

      • Pratha-Fish
        As for the next question, what about completely unknown rec-MBIDs.. i.e. the ones that weren't found in redirects, canonical, or the MB recording table
      • 2022-09-01 24456, 2022

      • alastairp
        turn them into blank rows
      • 2022-09-01 24415, 2022

      • alastairp
        that is, if we want to keep the row, output only the timestamp, if we want the "full data", omit the row entirely
      • 2022-09-01 24441, 2022
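
A minimal sketch of the flag described above, assuming a simplified row of `(timestamp, recording_mbid)` and a set of MBIDs that could be resolved. The names `clean_rows`, `known_mbids`, and `skip_missing_mbids` are hypothetical and not taken from the actual MLHD cleanup code.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

# Hypothetical, simplified row shape: (timestamp, recording_mbid or None).
Row = Tuple[int, Optional[str]]


def clean_rows(
    rows: Iterable[Row],
    known_mbids: Set[str],
    skip_missing_mbids: bool,
) -> Iterator[Row]:
    """Yield cleaned rows.

    A row whose MBID is missing or unknown is either dropped
    (skip_missing_mbids=True, the "full data" variant) or kept as a
    blank row with only its timestamp (the "close to original" variant).
    """
    for timestamp, recording_mbid in rows:
        if recording_mbid and recording_mbid in known_mbids:
            yield (timestamp, recording_mbid)
        elif not skip_missing_mbids:
            yield (timestamp, None)


rows = [(1, "a"), (2, None), (3, "zzz")]  # made-up data
print(list(clean_rows(rows, {"a"}, skip_missing_mbids=False)))
print(list(clean_rows(rows, {"a"}, skip_missing_mbids=True)))
```

With the flag off, all three timestamps survive and two rows are blanked; with the flag on, only the row with a resolvable MBID remains.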

      • Pratha-Fish
        sounds good
      • 2022-09-01 24427, 2022

      • Pratha-Fish
        Also, would this be the final version of MLHD, or are we gonna re-iterate through the data once again
      • 2022-09-01 24402, 2022

      • alastairp
        we should release it when we're happy with the mapping, and that will be our final version
      • 2022-09-01 24439, 2022

      • Pratha-Fish
        Great :)
      • 2022-09-01 24442, 2022

      • alastairp
        re: parquet - yes, this is a good idea. we should have an option to write to either tsv/zstd, or to parquet, maybe if pandas gives us easily other formats we could do that too
      • 2022-09-01 24401, 2022
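
One way the output option could look, as a sketch only. `write_output` and its `fmt` values are hypothetical names. The pandas calls are `DataFrame.to_csv` and `DataFrame.to_parquet`; the zstd branch assumes the `zstandard` package is installed and the parquet branch assumes `pyarrow` is.

```python
import pandas as pd


def write_output(df: pd.DataFrame, path: str, fmt: str = "tsv") -> None:
    """Write df in one of a few formats (hypothetical helper)."""
    if fmt == "tsv":
        df.to_csv(path, sep="\t", index=False)
    elif fmt == "tsv.zst":
        # Needs the optional `zstandard` dependency.
        df.to_csv(path, sep="\t", index=False, compression="zstd")
    elif fmt == "parquet":
        # Needs a parquet engine such as pyarrow.
        df.to_parquet(path, index=False)
    else:
        raise ValueError(f"unsupported format: {fmt}")
```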

      • Pratha-Fish
        Just asking, because this one might take a while to get processed
      • 2022-09-01 24420, 2022

      • alastairp
        absolutely. I hope it'll be a similar time to the last thing we ran
      • 2022-09-01 24429, 2022

      • alastairp
        even if it's 2x as slow, that's only 10 days or so
      • 2022-09-01 24443, 2022

      • Pratha-Fish
        alastairp: Pandas, as well as PyArrow (the high-performance library that I'm currently using for reading and writing files)
      • 2022-09-01 24452, 2022

      • alastairp
        remember last time that I suggested that we can do parallel computation too - if we run it on 8 threads, it could be finished within a few days
      • 2022-09-01 24426, 2022
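
A rough sketch of running the per-file work on 8 workers with the standard library. `process_file` is a hypothetical placeholder for the real cleanup step, and the file names are made up. If the work turns out to be CPU-bound in pure Python, `ProcessPoolExecutor` offers the same `map` interface and may scale better than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def process_file(path: str) -> str:
    # Placeholder for the real per-file cleanup; returns an output path.
    return f"{path}.cleaned"


def process_all(paths: List[str], max_workers: int = 8) -> List[str]:
    """Process every file on a pool of workers; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_file, paths))


print(process_all(["user1.tsv", "user2.tsv"]))
```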

      • Pratha-Fish
        alastairp: yes, I'm hoping for a similar computation time as well, but the checking process seems to be quite slow right now, so it could take a while longer
      • 2022-09-01 24436, 2022

      • alastairp
        re: release mbids... that's another big question, and one that we haven't thought through completely. As a result of computing the canonical recording mbid, we end up also with the "canonical release mbid", so that's a good option
      • 2022-09-01 24406, 2022

      • alastairp
        another option - consider someone is listening to a compilation album. lots of songs by different artists
      • 2022-09-01 24422, 2022

      • alastairp
        we could report each song as being a part of the original album that it was released on
      • 2022-09-01 24404, 2022

      • alastairp
        but we could also see if it would be possible to identify that these songs were actually listened to in order, and as part of an album
      • 2022-09-01 24442, 2022

      • Pratha-Fish
        Sounds interesting
      • 2022-09-01 24401, 2022

      • alastairp
        re: the checking speed: remember that the mbc endpoint is a basic interface over a database table that you also have access to: https://github.com/metabrainz/bono-data-sets/blob…
      • 2022-09-01 24410, 2022

      • alastairp
        do you remember when we discussed this?
      • 2022-09-01 24417, 2022

      • Pratha-Fish
        yes I remember
      • 2022-09-01 24429, 2022

      • Pratha-Fish
        I've implemented the sql query part as well
      • 2022-09-01 24435, 2022

      • alastairp
        we should bring this code inline, in fact we should load it into memory as well
      • 2022-09-01 24441, 2022

      • Pratha-Fish
        So that has been sped up exponentially
      • 2022-09-01 24458, 2022

      • alastairp
        this data can't be more than a few 10s of gb, if that. no problem to store in memory and just do a lookup into a dict
      • 2022-09-01 24403, 2022
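
The in-memory lookup could be sketched like this, assuming the redirect table has been fetched as `(recording_mbid, canonical_recording_mbid)` pairs, as in the query earlier in the log. The function names and the sample MBIDs are made up for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_canonical_lookup(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each recording_mbid to its canonical_recording_mbid."""
    return {old: new for old, new in rows}


def canonical_mbid(lookup: Dict[str, str], recording_mbid: str) -> Optional[str]:
    """Return the canonical MBID, or None when no redirect is stored."""
    return lookup.get(recording_mbid)


# Made-up rows standing in for the result of the SQL query.
rows = [
    ("00000000-0000-0000-0000-000000000001", "00000000-0000-0000-0000-0000000000aa"),
    ("00000000-0000-0000-0000-000000000002", "00000000-0000-0000-0000-0000000000aa"),
]
lookup = build_canonical_lookup(rows)
print(canonical_mbid(lookup, "00000000-0000-0000-0000-000000000001"))
```

Returning `None` for a miss leaves it to the caller to decide whether the MBID is already canonical or unknown.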

      • alastairp
        what else is slow?
      • 2022-09-01 24418, 2022

      • Pratha-Fish
        IK what exactly is bottlenecking the process at this point
      • 2022-09-01 24422, 2022

      • alastairp
        so the mbc lookup in the current code already goes directly to the db?
      • 2022-09-01 24434, 2022

      • alastairp
        ok cool, let's benchmark it then.
      • 2022-09-01 24446, 2022

      • alastairp
        when will you be working next? we could sit down tomorrow afternoon and look at it?
      • 2022-09-01 24449, 2022

      • Pratha-Fish
        I'm using pandas `Series.map()` to apply functions to a whole series / array
      • 2022-09-01 24412, 2022

      • Pratha-Fish
        But the aforementioned function is just a fancy implementation of a non-vectorized for loop, which makes it painfully slow
      • 2022-09-01 24421, 2022

      • Pratha-Fish
        The solution would be to somehow vectorize it
      • 2022-09-01 24446, 2022
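
One possible direction, sketched with made-up data: `Series.map` also accepts a dict, which lets pandas do the lookup internally instead of calling a Python function per element. Whether this is actually faster on the MLHD files is an assumption that would need benchmarking.

```python
import pandas as pd

# Made-up redirect mapping and MBID column.
redirects = {"old-1": "new-1", "old-2": "new-2"}
mbids = pd.Series(["old-1", "already-canonical", "old-2"])

# Per-element Python function: one call per row.
per_row = mbids.map(lambda m: redirects.get(m, m))

# Dict passed straight to map: values without an entry become NaN,
# so fall back to the original value for those rows.
via_dict = mbids.map(redirects).fillna(mbids)

print(via_dict.tolist())
```

Both variants give the same values here; the second avoids the per-row function call.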

      • Pratha-Fish
        re: let's benchmark it, when would you be free
      • 2022-09-01 24411, 2022

      • Pratha-Fish
        I'd be free by 5pm IST tomorrow
      • 2022-09-01 24457, 2022

      • Pratha-Fish
        ^1:30 PM Madrid time
      • 2022-09-01 24443, 2022

      • alastairp
        I won't be free for another ~2 hours after that
      • 2022-09-01 24418, 2022

      • alastairp
        but after that would be fine
      • 2022-09-01 24431, 2022

      • Pratha-Fish
        Sure, works for me too
      • 2022-09-01 24433, 2022

      • Pratha-Fish
        I should be free from the said time to 12:00AM IST (except for some time in b/w for dinner)
      • 2022-09-01 24419, 2022

      • lucifer
        Pratha-Fish: hi! i took a quick look again at the list. i think the report still has some issues. search for `Whirlwind in D Minor` in it and see that "canonical mbid lookup" links to a totally different recording. but if you open the lookup link, the lookup found by it is actually correct and matches the canonical mbid.
      • 2022-09-01 24420, 2022

      • lucifer
        oh yeah, Pratha-Fish, looking at the report, a lot of the canonical mbids are the same as the canonical mbid lookup of the previous row.
      • 2022-09-01 24454, 2022

      • alastairp
        🔍
      • 2022-09-01 24431, 2022

      • style- joined the channel
      • 2022-09-01 24420, 2022

      • tandy1000
        can you use listen imports to submit now playing listens too?
      • 2022-09-01 24443, 2022
