alastairp: the changes for series and author were already there in 452 right?
alastairp
yes, I had a look and I saw the commits there
ansh
So we did not have to merge 453 and 455
The commits would be repeated
alastairp
correct. in fact, I only merged 452, and github automatically closed 453 and 455
because it saw that their commits also existed in `master`
ansh
Oh I thought you merged all three
alastairp
nope, magic
yvanzo
al-bondigas: It should not require any change to files provided by git. If you made any changes, revert them.
alastairp
ansh: did you move 456 back to draft? and 458 is pending further work?
ansh
For 456, I'll quickly rebase and make it for all the entity types
For 458, the error message is not getting translated
alastairp
that sounds like a bug
ansh
Since it is an API, we don't have an option to translate it
We are not providing any language to it
alastairp
ahhh I thought it was a user message
my mistake. no problem about the translation, then
how about the issue of encountering a server error if we forget to update the mapping?
I think that just using a generic word in this error is OK. it's been like this since we added other entities to CB
ansh
For adding a new entity, we already have 4-5 mappings iirc
alastairp
I wonder if we could move these all to one place
ansh
The generic word also sounds good, but in CB-444, monkey mentioned "Is there a way we can use the type of entity we are submitting a review for?", so I did it that way
alastairp
ah, interesting. this error is passed through directly to the BB user from the CB API
ansh
I think it is also being done in LB
alastairp
even in this case, I don't see a problem with saying "item" instead of album
or "entity"
but in the case that we do want to show the actual entity type, we absolutely should have only one mapping of entity types to names, and ideally it should be in an easy to find place (the link that I just posted is a bad place)
I don't know what the behaviour of gettext is if we try and use a translated value in the API without first setting it up. perhaps we can have a plain dict in review.py, and then wrap the values in gettext for the version in user.py
though it's not clear if gettext will generate the .po files correctly if we call gettext in a loop, rather than passing it a constant
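A minimal sketch of the pattern described above, with hypothetical names (`N_`, `ENTITY_TYPE_NAMES`, and the entity keys are illustrative, not CritiqueBrainz's actual identifiers): keep a plain dict of constants in review.py, mark the literals with a no-op marker so the extraction tooling still writes them into the .po files, and apply gettext only at display time.

```python
# review.py: a plain dict of constants, nothing translated at import time.
# N_ does nothing at runtime, but if it is registered as an extraction
# keyword (e.g. `pybabel extract -k N_`), the string literals still end
# up in the .po files. This sidesteps the "gettext in a loop" problem:
# extraction tools only see string literals, never values of variables.
def N_(message):
    return message

ENTITY_TYPE_NAMES = {  # hypothetical mapping: one place for all entity types
    "release_group": N_("album"),
    "event": N_("event"),
    "place": N_("place"),
    "artist": N_("artist"),
}

# user.py: wrap the stored constant in gettext only at display time,
# inside a request context where the user's locale is known.
# (user.py would import ENTITY_TYPE_NAMES from review.py.)
from flask_babel import gettext

def display_entity_type(entity_type: str) -> str:
    return gettext(ENTITY_TYPE_NAMES[entity_type])
```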
ansh
I am done with the changes, just testing it before pushing
alastairp
great
Pratha-Fish
alastairp: Hi, are you currently free for a discussion?
ansh
alastairp: Pushed the changes
yvanzo
al-bondigas: I just tested creating a database from the current master HEAD, which is tagged v-2022-08-09, and it worked flawlessly, so it isn’t a regression.
al-bondigas: Since it is an issue with your setup, I recommend you restart from scratch after clearing everything as follows: `sudo docker-compose down --remove-orphans --rmi all --volumes && cd .. && rm -fr musicbrainz-docker`
alastairp
ansh: that might have been just before I added to the enum
ansh
Okay
alastairp
Pratha-Fish: remind me of where we got to in the last few weeks. From what I remember, we were fixing some bugs in querying/sorting external APIs, and we got metadata matches for almost everything in the catalogue, but there are still some unmatched?
Pratha-Fish
alastairp: yes, that's right
We were working on a script that took MLHD as input, and first cleaned the rec mbids to canonical mbids
alastairp
ah that's right, we had the standalone scripts, too
Pratha-Fish
and then looked up the canonical mbid using either MBC or the mapper API
The MBC looked a lot more accurate, so we went ahead with it and started debugging it
However, I also noticed a much more severe issue today
alastairp
and I remember that we also discussed loading the underlying MBC table into memory and directly looking up the data instead of using the API (to make it faster). Did you get anywhere with that?
what issues have you found?
Pratha-Fish
alastairp: yes, the current version looks up MBC in memory :)
The issue:
When fetching canonical MBIDs for MLHD recording mbids, my script loaded 400k rows, and found canonical MBIDs for all but 12k rows
e.g.
This is the original MBID from MLHD: 07d193fb-d411-41fc-b9ee-9c5902de137a
This is the canonical MBID according to my script: caeea2b6-f751-4ea8-9c9f-ac092a1b8f08
https://musicbrainz.org/release/0b765eaf-d58e-4... annotation: "Released by Amy Lee and Ben Moody in 1998 at a concert at Vino's in Little Rock, Arkansas, as a printed CD-R. The EP sold out that night"
alastairp
so, are the acoustic recordings from the Brazil release of the album the same recordings from the 1998 live concert?
unsure unless we listen to them
that is, maybe the solution is to actually merge the recordings in the MB database instead
Pratha-Fish
Is that verified by acoustic fingerprinting?
(as if I know what it actually is in the first place)
alastairp
we could look at the fingerprints if both recording mbids have one submitted
Pratha-Fish
Right
alastairp
this is in fact one of the things that mayhem and I discussed that we could do with the data - take a look at these differences in data and give a list to the community, saying "here is something that perhaps could be improved in the MB database"
Pratha-Fish
Yea, I remember you guys discussing that one :)
As for the mapping issue that we were talking about,
There's 2 stages in the mapping script:
1) Cleaning MLHD_rec_mbid into MLHD_canonical_mbid (wherever possible)
2) Using this MLHD_canonical_mbid to lookup rec_name and artist_credit_name
Now, the 1st stage only uses the MB_rec_redirect and MB_rec_canonical tables to look up original rec_mbids and fetch their respective canonical MBIDs
so, the 1st stage doesn't use MBC at all. Could it still be a sorting issue?
alastairp
do you have code for the first stage that you can link to me?
Pratha-Fish
The 2 highlighted lines here just loop through the MLHD data, and replace any rec_mbids if they're found in the redirect table
alastairp
right, this is doing just the redirect table?
Pratha-Fish
It does the same thing again, but with the canonical rec mbid table
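Since the linked code isn't reproduced here, a minimal sketch of the two-pass cleanup as described, assuming the two tables are loaded into memory as plain MBID-to-MBID dicts (all names hypothetical):

```python
# Hypothetical sketch of stage 1: two passes over the MLHD recording
# MBIDs, first through the gid_redirect mapping, then through the
# canonical-recording mapping.
def clean_mbids(mlhd_mbids, gid_redirect, canonical_redirect):
    # Pass 1: follow merged-recording redirects.
    redirected = [gid_redirect.get(m, m) for m in mlhd_mbids]
    # Pass 2: map to the canonical recording where one exists.
    return [canonical_redirect.get(m, m) for m in redirected]
```

Note that `.get(m, m)` keeps an MBID unchanged when it has no redirect entry; a lookup that instead drops unmatched rows would silently lose most of the input, since most MBIDs are already canonical.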
alastairp
ahh, I see
so you do mbid -> mbid_gid_redirect and then do the canonical lookup, and then... use that mbid to get metadata, and look up mbc?
Pratha-Fish
exactly
alastairp
right. I think I see where some confusion may be
our expectation is that the canonical mbid table and mbc should probably give the same results in almost all cases, but to verify this we could test:
mbid -> mbid_gid_redirect -> canonical
and separately
mbid -> mbid_gid_redirect -> metadata -> mbc
Pratha-Fish
what does "metadata" here stand for?
alastairp
the artist credit name and recording title
Pratha-Fish
Looks like so far we have been using the mbid -> mbid_gid_redirect -> metadata -> mbc way
and the metadata here is fetched using
alastairp
now, keep in mind the trick here: the canonical mbid mapping is actually based on the data from the mbc table, so I expect that the metadata will actually be the same regardless of whether we look up redirect -> canonical -> metadata -> mbc or redirect -> metadata -> mbc
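A hedged sketch of that check: run both paths over a sample of MBIDs and count disagreements. `mbc_lookup` and the dict names are hypothetical stand-ins for the in-memory tables described earlier.

```python
# gid_redirect, canonical_redirect: dicts of MBID -> MBID.
# metadata: dict of MBID -> (artist_credit_name, recording_title).
# mbc_lookup(artist, title) -> MBID stands in for the mbc table lookup.
def compare_paths(sample_mbids, gid_redirect, canonical_redirect,
                  metadata, mbc_lookup):
    mismatches = []
    for mbid in sample_mbids:
        redirected = gid_redirect.get(mbid, mbid)
        # Path A: redirect -> canonical
        canonical = canonical_redirect.get(redirected, redirected)
        # Path B: redirect -> metadata -> mbc
        artist, title = metadata[redirected]
        via_mbc = mbc_lookup(artist, title)
        if canonical != via_mbc:
            mismatches.append((mbid, canonical, via_mbc))
    return mismatches
```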
so these 12k rows that you mentioned - are these from where the mbc lookup and the canonical lookup give different mbids?
do you mean that 388k rows keep the same mbid, and 12k of them change?
Pratha-Fish
Basically:
I load 400k rows -> check in mbid_gid_redirect -> check in canonical_recording_redirect
After this "cleanup" process, only 12k rows are left out of 400k
alastairp
I don't know what you mean by "rows are left"
Pratha-Fish
12k rows (12k mbids) are deemed canonical, and the rest are just dropped for unknown reasons
alastairp
oh, you're _missing_ 388k rows?
Pratha-Fish
exactly
Let me explain with an example
alastairp
oops, that doesn't sound good
Pratha-Fish
Right
I must be doing something incredibly stupid in the cleanup process, because our earlier tests showed almost all recording MBIDs were good to go without any cleanup
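One plausible failure mode, offered as a guess rather than a diagnosis of the actual script: an inner join keeps only the MBIDs that appear in the redirect table, and most MBIDs have no redirect entry because they are already canonical. A minimal pandas illustration:

```python
import pandas as pd

mlhd = pd.DataFrame({"rec_mbid": ["a", "b", "c", "d"]})
redirect = pd.DataFrame({"old": ["b"], "new": ["b2"]})

# Inner merge (the default): only rows WITH a redirect entry survive,
# here 1 row out of 4; the other 3 vanish.
inner = mlhd.merge(redirect, left_on="rec_mbid", right_on="old")

# Left merge keeps every input row; fall back to the original MBID
# where there is no redirect entry.
left = mlhd.merge(redirect, left_on="rec_mbid", right_on="old", how="left")
left["rec_mbid"] = left["new"].fillna(left["rec_mbid"])
```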