alastairp: the changes for series and author were already there in 452 right?
2022-08-16 22856, 2022
alastairp
yes, I had a look and I saw the commits there
2022-08-16 22820, 2022
ansh
So we did not have to merge 453 and 455
2022-08-16 22833, 2022
ansh
The commits would be repeated
2022-08-16 22855, 2022
alastairp
correct. in fact, I only merged 452, and github automatically closed 453 and 455
2022-08-16 22809, 2022
alastairp
because it saw that their commits also existed in `master`
2022-08-16 22846, 2022
ansh
Oh I thought you merged all three
2022-08-16 22854, 2022
alastairp
nope, magic
2022-08-16 22824, 2022
yvanzo
al-bondigas: It should not require any change to files provided by git. If you made any changes, revert these.
2022-08-16 22838, 2022
alastairp
ansh: did you move 456 back to draft? and 458 is pending further work?
2022-08-16 22828, 2022
ansh
For 456, I'll quickly rebase and make it for all the entity types
2022-08-16 22824, 2022
ansh
For 458, The error message is not getting translated
2022-08-16 22806, 2022
alastairp
that sounds like a bug
2022-08-16 22810, 2022
ansh
Since it is an API, we don't have an option to translate it
2022-08-16 22818, 2022
ansh
We are not providing any language to it
2022-08-16 22820, 2022
alastairp
ahhh I thought it was a user message
2022-08-16 22833, 2022
alastairp
my mistake. no problem about the translation, then
2022-08-16 22801, 2022
alastairp
how about the issue of encountering a server error if we forget to update the mapping?
2022-08-16 22826, 2022
alastairp
I think that just using a generic word in this error is OK. it's been like this since when we added other entities to CB
2022-08-16 22839, 2022
ansh
For adding a new entity, we already have 4-5 mappings iirc
2022-08-16 22846, 2022
alastairp
I wonder if we could move these all to one place
2022-08-16 22834, 2022
ansh
The generic word also sounds good, but in CB-444, monkey mentioned "Is there a way we can use the type of entity we are submitting a review for ?", so I did it in that way
ah, interesting. this error is passed through directly to the BB user from the CB API
2022-08-16 22812, 2022
ansh
I think it is also being done in LB
2022-08-16 22834, 2022
alastairp
even in this case, I don't see a problem with saying "item" instead of album
2022-08-16 22843, 2022
alastairp
or "entity"
2022-08-16 22808, 2022
alastairp
but in the case that we do want to show the actual entity type, we absolutely should have only one mapping of entity types to names, and ideally it should be in an easy to find place (the link that I just posted is a bad place)
I don't know what the behaviour of gettext is if we try and use a translated value in the API without first setting it up. perhaps we can have a plain dict in review.py, and then wrap the values in gettext for the version in user.py
2022-08-16 22814, 2022
alastairp
though it's not clear if gettext will generate the .po files correctly if we call gettext in a loop, rather than passing it a constant
I am done with the changes, just testing it before pushing
2022-08-16 22811, 2022
alastairp
great
2022-08-16 22805, 2022
Pratha-Fish
alastairp: Hi, are you currently free for a discussion?
2022-08-16 22859, 2022
ansh
alastairp: Pushed the changes
2022-08-16 22836, 2022
yvanzo
al-bondigas: I just tested creating a database from the current master HEAD which is tagged v-2022-08-09 and it worked flawlessly, so It isn’t a regression.
2022-08-16 22841, 2022
yvanzo
al-bondigas: Since it is an issue with your setup, I recommend you to restart from scratch after having cleared everything as follows: sudo docker-compose down --remove-orphans --rmi all --volumes && cd .. && rm -fr musicbrainz-docker
ansh: that might have been just before I added to the enum
2022-08-16 22820, 2022
ansh
Okay
2022-08-16 22838, 2022
alastairp
Pratha-Fish: remind me of where we got to in the last few weeks. From what I remember, we were fixing some bugs in querying/sorting external APIs, and we got metadata matches for almost everything in the catalogue, but there are still some unmatched?
2022-08-16 22822, 2022
Pratha-Fish
alastairp: yes, that's right
2022-08-16 22803, 2022
Pratha-Fish
We were working on a script that took MLHD as input, and first cleaned the rec mbids to canonical mbids
2022-08-16 22823, 2022
alastairp
ah that's right, we had the standalone scripts, too
2022-08-16 22827, 2022
Pratha-Fish
and then looked up the canonical mbid using either MBC or the mapper APi
2022-08-16 22857, 2022
Pratha-Fish
The MBC looked a lot more accurate, so we went ahead with it and started debugging it
2022-08-16 22851, 2022
Pratha-Fish
However, I also noticed a lot more severe issue today
2022-08-16 22829, 2022
alastairp
and I remember that we also discussed loading the underlying MBC table into memory and directly looking up the data instead of using the API (to make it faster). Did you get anywhere with that?
2022-08-16 22834, 2022
alastairp
what issues have you found?
2022-08-16 22858, 2022
Pratha-Fish
alastairp: yes, the current version looks up MBC in memory :)
2022-08-16 22802, 2022
Pratha-Fish
The issue:
2022-08-16 22819, 2022
Pratha-Fish
When fetching canonical MBIDs for MLHD recording mbids, my script loaded 400k rows, and found canonical MBIDs for all but 12k rows
2022-08-16 22832, 2022
Pratha-Fish
e.g.
2022-08-16 22850, 2022
Pratha-Fish
This is the original MBID from MLHD: 07d193fb-d411-41fc-b9ee-9c5902de137a
2022-08-16 22813, 2022
Pratha-Fish
This is the canonical MBID according to my script: caeea2b6-f751-4ea8-9c9f-ac092a1b8f08
https://musicbrainz.org/release/0b765eaf-d58e-48d… annotation: "Released by Amy Lee and Ben Moody in 1998 at a concert at Vino's in Little Rock, Arkansas, as a printed CD-R. The EP sold out that night"
2022-08-16 22821, 2022
alastairp
so, are the acoustic recordings from the brazil release of the album the same recordings from the 1998 live concert?
2022-08-16 22838, 2022
alastairp
unsure unless we listen to them
2022-08-16 22858, 2022
alastairp
that is, maybe the solution is to actually merge the recordings in the MB database instead
2022-08-16 22800, 2022
Pratha-Fish
Is that verified by acoustic fingerprinting?
2022-08-16 22814, 2022
Pratha-Fish
(as if I know what it actually is in the first place)
2022-08-16 22820, 2022
alastairp
we could look at the fingerprints if both recording mbids have one submitted
2022-08-16 22846, 2022
Pratha-Fish
Right
2022-08-16 22804, 2022
alastairp
this is in fact one of the things that mayhem and I discussed that we could do with the data - take a look at these differences in data and give a list to the community, saying "here is something that perhaps could be improved in the MB database"
2022-08-16 22810, 2022
Pratha-Fish
Yea, I remember you guys discussing that one :)
2022-08-16 22806, 2022
Pratha-Fish
As for the mapping issue that we were talking about,
2022-08-16 22806, 2022
Pratha-Fish
There's 2 stages in the mapping script:
2022-08-16 22806, 2022
Pratha-Fish
1) Cleaning MLHD_rec_mbid into MLHD_canonical_mbid (wherever possible)
2022-08-16 22806, 2022
Pratha-Fish
2) Using this MLHD_canonical_mbid to lookup rec_name and artist_credit_name
2022-08-16 22806, 2022
Pratha-Fish
Now, the 1st stage only uses MB_rec_redirect and MB_rec_canonical tables to lookup original rec_mbids and fetch their respective canonical MBIDs
2022-08-16 22841, 2022
Pratha-Fish
so, the 1st stage doesn't use MBC at all. Could it be still possibly a sorting issue?
2022-08-16 22800, 2022
alastairp
do you have code for the first stage that you can link to me?
The 2 highlighted lines here just loop through the MLHD data, and replace any rec_mbids if they're found in the redirect table
2022-08-16 22827, 2022
alastairp
right, this is doing just the redirect table?
2022-08-16 22832, 2022
Pratha-Fish
It does the same thing again, but with the canonical rec mbid table
2022-08-16 22846, 2022
alastairp
ahh, I see
2022-08-16 22826, 2022
alastairp
so you do mbid -> mbid_gid_redirect and then do the canonical lookup, and then... use that mbid to get metadata, and look up mbc?
2022-08-16 22846, 2022
Pratha-Fish
exactly
2022-08-16 22805, 2022
alastairp
right. I think I see where some confusion may be
2022-08-16 22844, 2022
alastairp
our expectation is that canonical mbid table and mbc should probably give the same results in almost all cases, but to do this we could instead test:
2022-08-16 22851, 2022
alastairp
mbid -> mbid_gid_redirect -> canonical
2022-08-16 22853, 2022
alastairp
and separately
2022-08-16 22802, 2022
alastairp
mbid -> mbid_gid_redirect -> metadata -> mbc
2022-08-16 22826, 2022
Pratha-Fish
what does "metadata" here stand for?
2022-08-16 22836, 2022
alastairp
the artist credit name and recording title
2022-08-16 22832, 2022
Pratha-Fish
Looks like so far we have been using the mbid -> mbid_gid_redirect -> metadata -> mbc way
2022-08-16 22843, 2022
Pratha-Fish
and the metadata here is fetched using
2022-08-16 22848, 2022
alastairp
now, keep in mind the trick here that the canonical mbid mapping is actually based on the data from the mbc table, so I expect that the metadata will actually be the same regardless of if we look up redirect -> canonical -> metadata -> mbc or redirect -> metadata -> mbc
2022-08-16 22820, 2022
alastairp
so these 12k rows that you mentioned - are these from where the mbc lookup and the canonical lookup give different mbids?
do you mean that 88k rows keep the same mbid, and 12k of them change?
2022-08-16 22824, 2022
Pratha-Fish
Basically:
2022-08-16 22824, 2022
Pratha-Fish
I load 400k rows -> check in mbid_gid_redirect -> check in canonical_recording_redirect
2022-08-16 22857, 2022
Pratha-Fish
After this "cleanup" process, only 12k rows are left out of 400k
2022-08-16 22849, 2022
alastairp
I don't know what you mean my "rows are left"
2022-08-16 22839, 2022
Pratha-Fish
12k rows (12k mbids) are deemed canonical, and the rest are just dropped for unknown reasons
2022-08-16 22803, 2022
alastairp
oh, you're _missing_ 380k rows?
2022-08-16 22808, 2022
Pratha-Fish
exactly
2022-08-16 22813, 2022
Pratha-Fish
Let me explain with an example
2022-08-16 22814, 2022
alastairp
oops, that doesn't sound good
2022-08-16 22827, 2022
Pratha-Fish
Right
2022-08-16 22803, 2022
Pratha-Fish
I must be doing something incredibly stupid in the cleanup process. Because our earlier tests showed almost all recording MBIDs were good to go without cleanup