#metabrainz

/

      • BrainzGit
        [critiquebrainz] 14alastair merged pull request #457 (03master…add_avg_rating_total_count): feat: Return average ratings count in API https://github.com/metabrainz/critiquebrainz/pull…
      • 2022-08-16 22835, 2022

      • ansh
        alastairp: the changes for series and author were already there in 452 right?
      • 2022-08-16 22856, 2022

      • alastairp
        yes, I had a look and I saw the commits there
      • 2022-08-16 22820, 2022

      • ansh
        So we did not have to merge 453 and 455
      • 2022-08-16 22833, 2022

      • ansh
        The commits would be repeated
      • 2022-08-16 22855, 2022

      • alastairp
        correct. in fact, I only merged 452, and github automatically closed 453 and 455
      • 2022-08-16 22809, 2022

      • alastairp
        because it saw that their commits also existed in `master`
      • 2022-08-16 22846, 2022

      • ansh
        Oh I thought you merged all three
      • 2022-08-16 22854, 2022

      • alastairp
        nope, magic
      • 2022-08-16 22824, 2022

      • yvanzo
        al-bondigas: It should not require any change to files provided by git. If you made any changes, revert these.
      • 2022-08-16 22838, 2022

      • alastairp
        ansh: did you move 456 back to draft? and 458 is pending further work?
      • 2022-08-16 22828, 2022

      • ansh
        For 456, I'll quickly rebase and make it for all the entity types
      • 2022-08-16 22824, 2022

      • ansh
        For 458, The error message is not getting translated
      • 2022-08-16 22806, 2022

      • alastairp
        that sounds like a bug
      • 2022-08-16 22810, 2022

      • ansh
        Since it is an API, we don't have an option to translate it
      • 2022-08-16 22818, 2022

      • ansh
        We are not providing any language to it
      • 2022-08-16 22820, 2022

      • alastairp
        ahhh I thought it was a user message
      • 2022-08-16 22833, 2022

      • alastairp
        my mistake. no problem about the translation, then
      • 2022-08-16 22801, 2022

      • alastairp
        how about the issue of encountering a server error if we forget to update the mapping?
      • 2022-08-16 22826, 2022

      • alastairp
        I think that just using a generic word in this error is OK. it's been like this since when we added other entities to CB
      • 2022-08-16 22839, 2022

      • ansh
        For adding a new entity, we already have 4-5 mappings iirc
      • 2022-08-16 22846, 2022

      • alastairp
        I wonder if we could move these all to one place
      • 2022-08-16 22834, 2022

      • ansh
        The generic word also sounds good, but in CB-444, monkey mentioned "Is there a way we can use the type of entity we are submitting a review for ?", so I did it in that way
      • 2022-08-16 22835, 2022

      • BrainzBot
        CB-444: API error messages are music-specific https://tickets.metabrainz.org/browse/CB-444
      • 2022-08-16 22844, 2022

      • alastairp
      • 2022-08-16 22840, 2022

      • alastairp
        ah, interesting. this error is passed through directly to the BB user from the CB API
      • 2022-08-16 22812, 2022

      • ansh
        I think it is also being done in LB
      • 2022-08-16 22834, 2022

      • alastairp
        even in this case, I don't see a problem with saying "item" instead of album
      • 2022-08-16 22843, 2022

      • alastairp
        or "entity"
      • 2022-08-16 22808, 2022

      • alastairp
        but in the case that we do want to show the actual entity type, we absolutely should have only one mapping of entity types to names, and ideally it should be in an easy to find place (the link that I just posted is a bad place)
      • 2022-08-16 22831, 2022

      • alastairp
        maybe we should put both the type and the mapping here: https://github.com/metabrainz/critiquebrainz/blob…
      • 2022-08-16 22812, 2022

      • ansh
        Agreed
      • 2022-08-16 22856, 2022

      • alastairp
        I don't know what the behaviour of gettext is if we try and use a translated value in the API without first setting it up. perhaps we can have a plain dict in review.py, and then wrap the values in gettext for the version in user.py
      • 2022-08-16 22814, 2022

      • alastairp
        though it's not clear if gettext will generate the .po files correctly if we call gettext in a loop, rather than passing it a constant
      • 2022-08-16 22858, 2022

      • alastairp
        let's put this one off for now too, then
      • 2022-08-16 22819, 2022

      • alastairp
        do you want to include CB#456 in today's release?
      • 2022-08-16 22820, 2022

      • BrainzBot
        Handle Deleted and Redirected Edition Groups: https://github.com/metabrainz/critiquebrainz/pull…
      • 2022-08-16 22838, 2022

      • ansh
        Yes
      • 2022-08-16 22858, 2022

      • ansh
        I am done with the changes, just testing it before pushing
      • 2022-08-16 22811, 2022

      • alastairp
        great
      • 2022-08-16 22805, 2022

      • Pratha-Fish
        alastairp: Hi, are you currently free for a discussion?
      • 2022-08-16 22859, 2022

      • ansh
        alastairp: Pushed the changes
      • 2022-08-16 22836, 2022

      • yvanzo
        al-bondigas: I just tested creating a database from the current master HEAD which is tagged v-2022-08-09 and it worked flawlessly, so It isn’t a regression.
      • 2022-08-16 22841, 2022

      • yvanzo
        al-bondigas: Since it is an issue with your setup, I recommend you to restart from scratch after having cleared everything as follows: sudo docker-compose down --remove-orphans --rmi all --volumes && cd .. && rm -fr musicbrainz-docker
      • 2022-08-16 22816, 2022

      • BrainzGit
        [critiquebrainz] 14alastair merged pull request #456 (03master…handle_deleted_and_redirected_entities): Handle Deleted and Redirected BB Entities https://github.com/metabrainz/critiquebrainz/pull…
      • 2022-08-16 22845, 2022

      • BrainzGit
        [critiquebrainz] release 03v-2022-08-16.0 has been published by 14github-actions[bot]: https://github.com/metabrainz/critiquebrainz/rele…
      • 2022-08-16 22853, 2022

      • alastairp
        Pratha-Fish: sure, let me just finish this release
      • 2022-08-16 22859, 2022

      • alastairp
        ansh: released
      • 2022-08-16 22800, 2022

      • alastairp
        Pratha-Fish: hi
      • 2022-08-16 22814, 2022

      • ansh
        alastairp: I got an error for series, which was reported to sentry, but it is working fine when I am opening it again
      • 2022-08-16 22806, 2022

      • alastairp
      • 2022-08-16 22830, 2022

      • alastairp
        because we changed the format of the cache for entity response rating key
      • 2022-08-16 22836, 2022

      • alastairp
        and I just cleared the redis cache
      • 2022-08-16 22819, 2022

      • Pratha-Fish
        alastairp: Hi there
      • 2022-08-16 22829, 2022

      • Pratha-Fish is back from dinner
      • 2022-08-16 22839, 2022

      • ansh
      • 2022-08-16 22803, 2022

      • alastairp
        ansh: that might have been just before I added to the enum
      • 2022-08-16 22820, 2022

      • ansh
        Okay
      • 2022-08-16 22838, 2022

      • alastairp
        Pratha-Fish: remind me of where we got to in the last few weeks. From what I remember, we were fixing some bugs in querying/sorting external APIs, and we got metadata matches for almost everything in the catalogue, but there are still some unmatched?
      • 2022-08-16 22822, 2022

      • Pratha-Fish
        alastairp: yes, that's right
      • 2022-08-16 22803, 2022

      • Pratha-Fish
        We were working on a script that took MLHD as input, and first cleaned the rec mbids to canonical mbids
      • 2022-08-16 22823, 2022

      • alastairp
        ah that's right, we had the standalone scripts, too
      • 2022-08-16 22827, 2022

      • Pratha-Fish
        and then looked up the canonical mbid using either MBC or the mapper APi
      • 2022-08-16 22857, 2022

      • Pratha-Fish
        The MBC looked a lot more accurate, so we went ahead with it and started debugging it
      • 2022-08-16 22851, 2022

      • Pratha-Fish
        However, I also noticed a lot more severe issue today
      • 2022-08-16 22829, 2022

      • alastairp
        and I remember that we also discussed loading the underlying MBC table into memory and directly looking up the data instead of using the API (to make it faster). Did you get anywhere with that?
      • 2022-08-16 22834, 2022

      • alastairp
        what issues have you found?
      • 2022-08-16 22858, 2022

      • Pratha-Fish
        alastairp: yes, the current version looks up MBC in memory :)
      • 2022-08-16 22802, 2022

      • Pratha-Fish
        The issue:
      • 2022-08-16 22819, 2022

      • Pratha-Fish
        When fetching canonical MBIDs for MLHD recording mbids, my script loaded 400k rows, and found canonical MBIDs for all but 12k rows
      • 2022-08-16 22832, 2022

      • Pratha-Fish
        e.g.
      • 2022-08-16 22850, 2022

      • Pratha-Fish
        This is the original MBID from MLHD: 07d193fb-d411-41fc-b9ee-9c5902de137a
      • 2022-08-16 22813, 2022

      • Pratha-Fish
        This is the canonical MBID according to my script: caeea2b6-f751-4ea8-9c9f-ac092a1b8f08
      • 2022-08-16 22833, 2022

      • Pratha-Fish
        However, I looked up the original mbid on musicbrainz.org/recording/07d193fb-d411-41fc-b9ee…
      • 2022-08-16 22843, 2022

      • Pratha-Fish
      • 2022-08-16 22805, 2022

      • alastairp
        right, I see it
      • 2022-08-16 22806, 2022

      • Pratha-Fish
        tldr: The rec-mbid cleanup aint working well in the first place
      • 2022-08-16 22801, 2022

      • Pratha-Fish
        ofc, the musicbrainz.org redirect is more accurate. Now I just need to figure out what went wrong in my script
      • 2022-08-16 22802, 2022

      • alastairp
        so, my guess is that this may be a bug in the sorting of recordings
      • 2022-08-16 22819, 2022

      • Pratha-Fish
        Can you elaborate?
      • 2022-08-16 22819, 2022

      • alastairp
        so I think this is a bug in the MBC data, not your scirpt
      • 2022-08-16 22828, 2022

      • Pratha-Fish
        Interesting
      • 2022-08-16 22848, 2022

      • alastairp
        so there can be many MBIDs in MB for the same combination of aritst and title
      • 2022-08-16 22805, 2022

      • alastairp
        we sort them to try and find the "earliest album release" and pick that as the mbid
      • 2022-08-16 22815, 2022

      • alastairp
        in fact, look at the ones that you posted
      • 2022-08-16 22828, 2022

      • alastairp
        https://musicbrainz.org/recording/caeea2b6-f751-4… (MBC): "Release group type: Album"
      • 2022-08-16 22845, 2022

      • alastairp
        and the redirected one in MB: https://musicbrainz.org/recording/294bf375-7bbd-4… "Release gorup type: EP"
      • 2022-08-16 22802, 2022

      • alastairp
        I remember that the MBC generation code prefers albums over EPs
      • 2022-08-16 22835, 2022

      • Pratha-Fish
        Also, I just noticed another thing
      • 2022-08-16 22829, 2022

      • alastairp
      • 2022-08-16 22830, 2022

      • alastairp
        Fallen is the debut studio album by American rock band Evanescence, released on March 4, 2003, by Wind-up and Epic Records.
      • 2022-08-16 22809, 2022

      • alastairp
        it looks like this was a B-side from an EP, which was included on some later re-releases of their first album
      • 2022-08-16 22815, 2022

      • alastairp
        that's a really interesting case
      • 2022-08-16 22814, 2022

      • Pratha-Fish
        Maintaining music metadata is much more interesting than people think, especially due to such cases lol
      • 2022-08-16 22829, 2022

      • alastairp
      • 2022-08-16 22840, 2022

      • alastairp
        https://musicbrainz.org/release/0b765eaf-d58e-48d… annotation: "Released by Amy Lee and Ben Moody in 1998 at a concert at Vino's in Little Rock, Arkansas, as a printed CD-R. The EP sold out that night"
      • 2022-08-16 22821, 2022

      • alastairp
        so, are the acoustic recordings from the brazil release of the album the same recordings from the 1998 live concert?
      • 2022-08-16 22838, 2022

      • alastairp
        unsure unless we listen to them
      • 2022-08-16 22858, 2022

      • alastairp
        that is, maybe the solution is to actually merge the recordings in the MB database instead
      • 2022-08-16 22800, 2022

      • Pratha-Fish
        Is that verified by acoustic fingerprinting?
      • 2022-08-16 22814, 2022

      • Pratha-Fish
        (as if I know what it actually is in the first place)
      • 2022-08-16 22820, 2022

      • alastairp
        we could look at the fingerprints if both recording mbids have one submitted
      • 2022-08-16 22846, 2022

      • Pratha-Fish
        Right
      • 2022-08-16 22804, 2022

      • alastairp
        this is in fact one of the things that mayhem and I discussed that we could do with the data - take a look at these differences in data and give a list to the community, saying "here is something that perhaps could be improved in the MB database"
      • 2022-08-16 22810, 2022

      • Pratha-Fish
        Yea, I remember you guys discussing that one :)
      • 2022-08-16 22806, 2022

      • Pratha-Fish
        As for the mapping issue that we were talking about,
      • 2022-08-16 22806, 2022

      • Pratha-Fish
        There's 2 stages in the mapping script:
      • 2022-08-16 22806, 2022

      • Pratha-Fish
        1) Cleaning MLHD_rec_mbid into MLHD_canonical_mbid (wherever possible)
      • 2022-08-16 22806, 2022

      • Pratha-Fish
        2) Using this MLHD_canonical_mbid to lookup rec_name and artist_credit_name
      • 2022-08-16 22806, 2022

      • Pratha-Fish
        Now, the 1st stage only uses MB_rec_redirect and MB_rec_canonical tables to lookup original rec_mbids and fetch their respective canonical MBIDs
      • 2022-08-16 22841, 2022

      • Pratha-Fish
        so, the 1st stage doesn't use MBC at all. Could it be still possibly a sorting issue?
      • 2022-08-16 22800, 2022

      • alastairp
        do you have code for the first stage that you can link to me?
      • 2022-08-16 22807, 2022

      • Pratha-Fish
        yep
      • 2022-08-16 22819, 2022

      • Pratha-Fish
      • 2022-08-16 22811, 2022

      • Pratha-Fish
        The 2 highlighted lines here just loop through the MLHD data, and replace any rec_mbids if they're found in the redirect table
      • 2022-08-16 22827, 2022

      • alastairp
        right, this is doing just the redirect table?
      • 2022-08-16 22832, 2022

      • Pratha-Fish
        It does the same thing again, but with the canonical rec mbid table
      • 2022-08-16 22846, 2022

      • alastairp
        ahh, I see
      • 2022-08-16 22826, 2022

      • alastairp
        so you do mbid -> mbid_gid_redirect and then do the canonical lookup, and then... use that mbid to get metadata, and look up mbc?
      • 2022-08-16 22846, 2022

      • Pratha-Fish
        exactly
      • 2022-08-16 22805, 2022

      • alastairp
        right. I think I see where some confusion may be
      • 2022-08-16 22844, 2022

      • alastairp
        our expectation is that canonical mbid table and mbc should probably give the same results in almost all cases, but to do this we could instead test:
      • 2022-08-16 22851, 2022

      • alastairp
        mbid -> mbid_gid_redirect -> canonical
      • 2022-08-16 22853, 2022

      • alastairp
        and separately
      • 2022-08-16 22802, 2022

      • alastairp
        mbid -> mbid_gid_redirect -> metadata -> mbc
      • 2022-08-16 22826, 2022

      • Pratha-Fish
        what does "metadata" here stand for?
      • 2022-08-16 22836, 2022

      • alastairp
        the artist credit name and recording title
      • 2022-08-16 22832, 2022

      • Pratha-Fish
        Looks like so far we have been using the mbid -> mbid_gid_redirect -> metadata -> mbc way
      • 2022-08-16 22843, 2022

      • Pratha-Fish
        and the metadata here is fetched using
      • 2022-08-16 22848, 2022

      • alastairp
        now, keep in mind the trick here that the canonical mbid mapping is actually based on the data from the mbc table, so I expect that the metadata will actually be the same regardless of if we look up redirect -> canonical -> metadata -> mbc or redirect -> metadata -> mbc
      • 2022-08-16 22820, 2022

      • alastairp
        so these 12k rows that you mentioned - are these from where the mbc lookup and the canonical lookup give different mbids?
      • 2022-08-16 22847, 2022

      • Pratha-Fish
        These 12k rows are just:
      • 2022-08-16 22847, 2022

      • Pratha-Fish
        mbid -> mbid_gid_redirect -> canonical_recording_redirect
      • 2022-08-16 22809, 2022

      • Pratha-Fish
        So, at this stage it doesn't do anything with MBC
      • 2022-08-16 22830, 2022

      • alastairp
        do you mean that 88k rows keep the same mbid, and 12k of them change?
      • 2022-08-16 22824, 2022

      • Pratha-Fish
        Basically:
      • 2022-08-16 22824, 2022

      • Pratha-Fish
        I load 400k rows -> check in mbid_gid_redirect -> check in canonical_recording_redirect
      • 2022-08-16 22857, 2022

      • Pratha-Fish
        After this "cleanup" process, only 12k rows are left out of 400k
      • 2022-08-16 22849, 2022

      • alastairp
        I don't know what you mean my "rows are left"
      • 2022-08-16 22839, 2022

      • Pratha-Fish
        12k rows (12k mbids) are deemed canonical, and the rest are just dropped for unknown reasons
      • 2022-08-16 22803, 2022

      • alastairp
        oh, you're _missing_ 380k rows?
      • 2022-08-16 22808, 2022

      • Pratha-Fish
        exactly
      • 2022-08-16 22813, 2022

      • Pratha-Fish
        Let me explain with an example
      • 2022-08-16 22814, 2022

      • alastairp
        oops, that doesn't sound good
      • 2022-08-16 22827, 2022

      • Pratha-Fish
        Right
      • 2022-08-16 22803, 2022

      • Pratha-Fish
        I must be doing something incredibly stupid in the cleanup process. Because our earlier tests showed almost all recording MBIDs were good to go without cleanup
      • 2022-08-16 22833, 2022

      • alastairp
        yes, right
      • 2022-08-16 22841, 2022

      • Pratha-Fish
      • 2022-08-16 22843, 2022

      • alastairp
        and that is when running this script?
      • 2022-08-16 22849, 2022

      • Rotab has quit