#metabrainz

/

      • BrainzGit
        [critiquebrainz] 14alastair merged pull request #457 (03master…add_avg_rating_total_count): feat: Return average ratings count in API https://github.com/metabrainz/critiquebrainz/pu...
      • ansh
        alastairp: the changes for series and author were already there in 452 right?
      • alastairp
        yes, I had a look and I saw the commits there
      • ansh
        So we did not have to merge 453 and 455
      • The commits would be repeated
      • alastairp
        correct. in fact, I only merged 452, and github automatically closed 453 and 455
      • because it saw that their commits also existed in `master`
      • ansh
        Oh I thought you merged all three
      • alastairp
        nope, magic
      • yvanzo
        al-bondigas: It should not require any change to files provided by git. If you made any changes, revert these.
      • alastairp
        ansh: did you move 456 back to draft? and 458 is pending further work?
      • ansh
        For 456, I'll quickly rebase and make it for all the entity types
      • For 458, The error message is not getting translated
      • alastairp
        that sounds like a bug
      • ansh
        Since it is an API, we don't have an option to translate it
      • We are not providing any language to it
      • alastairp
        ahhh I thought it was a user message
      • my mistake. no problem about the translation, then
      • how about the issue of encountering a server error if we forget to update the mapping?
      • I think that just using a generic word in this error is OK. it's been like this since when we added other entities to CB
      • ansh
        For adding a new entity, we already have 4-5 mappings iirc
      • alastairp
        I wonder if we could move these all to one place
      • ansh
        The generic word also sounds good, but in CB-444, monkey mentioned "Is there a way we can use the type of entity we are submitting a review for ?", so I did it in that way
      • BrainzBot
        CB-444: API error messages are music-specific https://tickets.metabrainz.org/browse/CB-444
      • alastairp
      • ah, interesting. this error is passed through directly to the BB user from the CB API
      • ansh
        I think it is also being done in LB
      • alastairp
        even in this case, I don't see a problem with saying "item" instead of album
      • or "entity"
      • but in the case that we do want to show the actual entity type, we absolutely should have only one mapping of entity types to names, and ideally it should be in an easy to find place (the link that I just posted is a bad place)
      • maybe we should put both the type and the mapping here: https://github.com/metabrainz/critiquebrainz/bl...
      • ansh
        Agreed
      • alastairp
        I don't know what the behaviour of gettext is if we try and use a translated value in the API without first setting it up. perhaps we can have a plain dict in review.py, and then wrap the values in gettext for the version in user.py
      • though it's not clear if gettext will generate the .po files correctly if we call gettext in a loop, rather than passing it a constant
      • let's put this one off for now too, then
      • do you want to include CB#456 in today's release?
      • BrainzBot
        Handle Deleted and Redirected Edition Groups: https://github.com/metabrainz/critiquebrainz/pu...
      • ansh
        Yes
      • I am done with the changes, just testing it before pushing
      • alastairp
        great
      • Pratha-Fish
        alastairp: Hi, are you currently free for a discussion?
      • ansh
        alastairp: Pushed the changes
      • yvanzo
        al-bondigas: I just tested creating a database from the current master HEAD which is tagged v-2022-08-09 and it worked flawlessly, so It isn’t a regression.
      • al-bondigas: Since it is an issue with your setup, I recommend you to restart from scratch after having cleared everything as follows: sudo docker-compose down --remove-orphans --rmi all --volumes && cd .. && rm -fr musicbrainz-docker
      • BrainzGit
        [critiquebrainz] 14alastair merged pull request #456 (03master…handle_deleted_and_redirected_entities): Handle Deleted and Redirected BB Entities https://github.com/metabrainz/critiquebrainz/pu...
      • [critiquebrainz] release 03v-2022-08-16.0 has been published by 14github-actions[bot]: https://github.com/metabrainz/critiquebrainz/re...
      • alastairp
        Pratha-Fish: sure, let me just finish this release
      • ansh: released
      • Pratha-Fish: hi
      • ansh
        alastairp: I got an error for series, which was reported to sentry, but it is working fine when I am opening it again
      • alastairp
      • because we changed the format of the cache for entity response rating key
      • and I just cleared the redis cache
      • Pratha-Fish
        alastairp: Hi there
      • Pratha-Fish is back from dinner
      • ansh
      • alastairp
        ansh: that might have been just before I added to the enum
      • ansh
        Okay
      • alastairp
        Pratha-Fish: remind me of where we got to in the last few weeks. From what I remember, we were fixing some bugs in querying/sorting external APIs, and we got metadata matches for almost everything in the catalogue, but there are still some unmatched?
      • Pratha-Fish
        alastairp: yes, that's right
      • We were working on a script that took MLHD as input, and first cleaned the rec mbids to canonical mbids
      • alastairp
        ah that's right, we had the standalone scripts, too
      • Pratha-Fish
        and then looked up the canonical mbid using either MBC or the mapper APi
      • The MBC looked a lot more accurate, so we went ahead with it and started debugging it
      • However, I also noticed a lot more severe issue today
      • alastairp
        and I remember that we also discussed loading the underlying MBC table into memory and directly looking up the data instead of using the API (to make it faster). Did you get anywhere with that?
      • what issues have you found?
      • Pratha-Fish
        alastairp: yes, the current version looks up MBC in memory :)
      • The issue:
      • When fetching canonical MBIDs for MLHD recording mbids, my script loaded 400k rows, and found canonical MBIDs for all but 12k rows
      • e.g.
      • This is the original MBID from MLHD: 07d193fb-d411-41fc-b9ee-9c5902de137a
      • This is the canonical MBID according to my script: caeea2b6-f751-4ea8-9c9f-ac092a1b8f08
      • However, I looked up the original mbid on musicbrainz.org/recording/07d193fb-d411-41fc-b9...
      • alastairp
        right, I see it
      • Pratha-Fish
        tldr: The rec-mbid cleanup aint working well in the first place
      • ofc, the musicbrainz.org redirect is more accurate. Now I just need to figure out what went wrong in my script
      • alastairp
        so, my guess is that this may be a bug in the sorting of recordings
      • Pratha-Fish
        Can you elaborate?
      • alastairp
        so I think this is a bug in the MBC data, not your scirpt
      • Pratha-Fish
        Interesting
      • alastairp
        so there can be many MBIDs in MB for the same combination of aritst and title
      • we sort them to try and find the "earliest album release" and pick that as the mbid
      • in fact, look at the ones that you posted
      • https://musicbrainz.org/recording/caeea2b6-f751... (MBC): "Release group type: Album"
      • and the redirected one in MB: https://musicbrainz.org/recording/294bf375-7bbd... "Release gorup type: EP"
      • I remember that the MBC generation code prefers albums over EPs
      • Pratha-Fish
        Also, I just noticed another thing
      • alastairp
      • Fallen is the debut studio album by American rock band Evanescence, released on March 4, 2003, by Wind-up and Epic Records.
      • it looks like this was a B-side from an EP, which was included on some later re-releases of their first album
      • that's a really interesting case
      • Pratha-Fish
        Maintaining music metadata is much more interesting than people think, especially due to such cases lol
      • alastairp
      • https://musicbrainz.org/release/0b765eaf-d58e-4... annotation: "Released by Amy Lee and Ben Moody in 1998 at a concert at Vino's in Little Rock, Arkansas, as a printed CD-R. The EP sold out that night"
      • so, are the acoustic recordings from the brazil release of the album the same recordings from the 1998 live concert?
      • unsure unless we listen to them
      • that is, maybe the solution is to actually merge the recordings in the MB database instead
      • Pratha-Fish
        Is that verified by acoustic fingerprinting?
      • (as if I know what it actually is in the first place)
      • alastairp
        we could look at the fingerprints if both recording mbids have one submitted
      • Pratha-Fish
        Right
      • alastairp
        this is in fact one of the things that mayhem and I discussed that we could do with the data - take a look at these differences in data and give a list to the community, saying "here is something that perhaps could be improved in the MB database"
      • Pratha-Fish
        Yea, I remember you guys discussing that one :)
      • As for the mapping issue that we were talking about,
      • There's 2 stages in the mapping script:
      • 1) Cleaning MLHD_rec_mbid into MLHD_canonical_mbid (wherever possible)
      • 2) Using this MLHD_canonical_mbid to lookup rec_name and artist_credit_name
      • Now, the 1st stage only uses MB_rec_redirect and MB_rec_canonical tables to lookup original rec_mbids and fetch their respective canonical MBIDs
      • so, the 1st stage doesn't use MBC at all. Could it be still possibly a sorting issue?
      • alastairp
        do you have code for the first stage that you can link to me?
      • Pratha-Fish
        yep
      • The 2 highlighted lines here just loop through the MLHD data, and replace any rec_mbids if they're found in the redirect table
      • alastairp
        right, this is doing just the redirect table?
      • Pratha-Fish
        It does the same thing again, but with the canonical rec mbid table
      • alastairp
        ahh, I see
      • so you do mbid -> mbid_gid_redirect and then do the canonical lookup, and then... use that mbid to get metadata, and look up mbc?
      • Pratha-Fish
        exactly
      • alastairp
        right. I think I see where some confusion may be
      • our expectation is that canonical mbid table and mbc should probably give the same results in almost all cases, but to do this we could instead test:
      • mbid -> mbid_gid_redirect -> canonical
      • and separately
      • mbid -> mbid_gid_redirect -> metadata -> mbc
      • Pratha-Fish
        what does "metadata" here stand for?
      • alastairp
        the artist credit name and recording title
      • Pratha-Fish
        Looks like so far we have been using the mbid -> mbid_gid_redirect -> metadata -> mbc way
      • and the metadata here is fetched using
      • alastairp
        now, keep in mind the trick here that the canonical mbid mapping is actually based on the data from the mbc table, so I expect that the metadata will actually be the same regardless of if we look up redirect -> canonical -> metadata -> mbc or redirect -> metadata -> mbc
      • so these 12k rows that you mentioned - are these from where the mbc lookup and the canonical lookup give different mbids?
      • Pratha-Fish
        These 12k rows are just:
      • mbid -> mbid_gid_redirect -> canonical_recording_redirect
      • So, at this stage it doesn't do anything with MBC
      • alastairp
        do you mean that 88k rows keep the same mbid, and 12k of them change?
      • Pratha-Fish
        Basically:
      • I load 400k rows -> check in mbid_gid_redirect -> check in canonical_recording_redirect
      • After this "cleanup" process, only 12k rows are left out of 400k
      • alastairp
        I don't know what you mean my "rows are left"
      • Pratha-Fish
        12k rows (12k mbids) are deemed canonical, and the rest are just dropped for unknown reasons
      • alastairp
        oh, you're _missing_ 380k rows?
      • Pratha-Fish
        exactly
      • Let me explain with an example
      • alastairp
        oops, that doesn't sound good
      • Pratha-Fish
        Right
      • I must be doing something incredibly stupid in the cleanup process. Because our earlier tests showed almost all recording MBIDs were good to go without cleanup
      • alastairp
        yes, right
      • Pratha-Fish
      • alastairp
        and that is when running this script?
      • Rotab has quit