<aerozol> "We don't allow full automation -..." <- I see. Thanks for clarifying!
2022-12-08 34210, 2022
schickling[m]
Are there some ideas/efforts (beyond seeding) to facilitate contributions in a more semi-automated way?
2022-12-08 34220, 2022
schickling[m]
<aerozol> "Here's an example of an external..." <- Very cool. Just gave it a try here. Worked pretty well. Only had to select the Label manually. (The strings matched, so maybe even that could be "fixed" so one less manual interaction - just a suggestion)
<aerozol> "Here's an example of an external..." <- is there a similar "seeding flow" but for updating/filling in missing fields for existing items?
2022-12-08 34235, 2022
aerozol
schickling[m]: hmm, not sure... I haven't used one. Maybe someone else knows one.
2022-12-08 34252, 2022
ShivangiPatel joined the channel
2022-12-08 34242, 2022
ShivangiPatel has quit
2022-12-08 34215, 2022
Rishabh joined the channel
2022-12-08 34239, 2022
Rishabh has quit
2022-12-08 34207, 2022
vibhoo_24 joined the channel
2022-12-08 34202, 2022
d4rkie joined the channel
2022-12-08 34235, 2022
d4rk has quit
2022-12-08 34203, 2022
vibhoo_24 has quit
2022-12-08 34200, 2022
alexrelis has quit
2022-12-08 34212, 2022
alexrelis joined the channel
2022-12-08 34213, 2022
alexrelis has quit
2022-12-08 34227, 2022
thuna` joined the channel
2022-12-08 34242, 2022
vibhoo_24 joined the channel
2022-12-08 34230, 2022
schickling[m]
quick question: in plain language - what are the differences between the track and recording table? I see that the track table has around 10M more entries.
2022-12-08 34252, 2022
schickling[m]
Based on this DB schema diagram I was assuming that there's a 1:1 relation between a track and a recording, so I'm a bit puzzled why there are so many more tracks than recordings?
2022-12-08 34256, 2022
schickling[m] uploaded an image: (629KiB) < https://libera.ems.host/_matrix/media/v3/download/matrix.org/zOMhRmJScfnNyMfbWkftntnG/image.png >
2022-12-08 34258, 2022
vibhoo_24 has quit
2022-12-08 34216, 2022
duncan
I'm not familiar with the database structure, but is it not because many releases share the same recordings?
2022-12-08 34211, 2022
schickling[m]
Ah I see. Yeah, that could make sense. It's basically like an n:m join table between a release/medium and recordings
2022-12-08 34232, 2022
kepstin
indeed, a track is on a specific release, while a recording can be shared between multiple releases (or be standalone, on no releases)
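A minimal sketch of the relationship described above, using an in-memory SQLite database. The table and column names are simplified assumptions for illustration, not the actual MusicBrainz schema: each track row sits on one medium and points at one recording, while a recording may be referenced by many tracks or by none.

```python
import sqlite3

# Toy schema (assumed, heavily simplified): a track belongs to one medium
# and references one recording; recordings can be shared or standalone.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recording (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE medium (id INTEGER PRIMARY KEY, release_name TEXT NOT NULL);
CREATE TABLE track (
    id INTEGER PRIMARY KEY,
    medium INTEGER NOT NULL REFERENCES medium(id),
    recording INTEGER NOT NULL REFERENCES recording(id),
    position INTEGER NOT NULL
);
INSERT INTO recording VALUES (1, 'Song A'), (2, 'Song B'), (3, 'Standalone demo');
INSERT INTO medium VALUES (1, 'Original album'), (2, 'Reissue'), (3, 'Compilation');
INSERT INTO track VALUES
    (1, 1, 1, 1), (2, 1, 2, 2),   -- original album
    (3, 2, 1, 1), (4, 2, 2, 2),   -- reissue reuses both recordings
    (5, 3, 1, 1);                 -- compilation reuses 'Song A'
""")

track_count = conn.execute("SELECT COUNT(*) FROM track").fetchone()[0]
recording_count = conn.execute("SELECT COUNT(*) FROM recording").fetchone()[0]

# Recordings that no track points at (standalone recordings).
standalone = [
    row[0]
    for row in conn.execute(
        "SELECT r.name FROM recording r "
        "LEFT JOIN track t ON t.recording = r.id "
        "WHERE t.id IS NULL"
    )
]

print(track_count, recording_count, standalone)
```

With this toy data there are 5 tracks for 3 recordings, which is the same shape as the real counts: more tracks than recordings because releases share recordings.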
2022-12-08 34208, 2022
schickling[m]
Got it. Thanks :)
2022-12-08 34214, 2022
schickling[m] uploaded an image: (264KiB) < https://libera.ems.host/_matrix/media/v3/download/matrix.org/gMpfCkeLGRqDVvadgPMiQTyi/CleanShot%202022-12-08%20at%2017.39.28%402x.png >
2022-12-08 34226, 2022
schickling[m]
I assume those missing recordings were deleted over time?
2022-12-08 34236, 2022
schickling[m]
* I assume those "missing recordings" were deleted over time?
2022-12-08 34252, 2022
kepstin
could be some deleted ones, and also, because of the way PostgreSQL sequences work, the sequence sometimes skips numbers to ensure there are no problems with parallel transactions.
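A toy simulation of why sequence-generated ids can have gaps even without deletions. This is plain Python, not PostgreSQL; `ToySequence` and `insert_rows` are hypothetical names. It models one known property of PostgreSQL sequences: a value handed out by `nextval` is not given back when the transaction that requested it rolls back.

```python
class ToySequence:
    """Hypothetical stand-in for a database sequence."""

    def __init__(self):
        self._last = 0

    def nextval(self):
        # The counter only ever moves forward, even if the caller
        # later abandons the value it was given.
        self._last += 1
        return self._last


def insert_rows(sequence, outcomes):
    """Simulate inserts; `outcomes` says whether each transaction commits."""
    committed_ids = []
    for commits in outcomes:
        row_id = sequence.nextval()  # id is consumed before the outcome is known
        if commits:
            committed_ids.append(row_id)
    return committed_ids


ids = insert_rows(ToySequence(), [True, False, True, True, False, True])
gaps = sorted(set(range(1, max(ids) + 1)) - set(ids))

print(ids)   # [1, 3, 4, 6]
print(gaps)  # [2, 5]
```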
2022-12-08 34257, 2022
schickling[m]
I see. Thanks. (Hope you don't mind these sorts of questions, just trying to "absorb" the existing MusicBrainz database design wisdom)
2022-12-08 34204, 2022
kepstin
but yeah, probably mostly deleted (or merged) recordings
2022-12-08 34222, 2022
kepstin
the gid is the real reference, and you can look up merged gids through a separate table
2022-12-08 34206, 2022
kepstin
(row ids are used for internal joins in the database tho)
2022-12-08 34216, 2022
schickling[m]
is the gid synonymous with MBID in tables like track, recording etc?
2022-12-08 34221, 2022
kepstin
yeah
2022-12-08 34234, 2022
schickling[m]
gid = global id?
2022-12-08 34211, 2022
kepstin
they're sometimes also called uuids, too, tho i dunno if that term's ever used in the mbs codebase.
2022-12-08 34240, 2022
schickling[m]
different question: if there's a row in e.g. `recording_gid_redirect`, does this give me a guarantee that there isn't a `recording` anymore with that `gid`?
2022-12-08 34240, 2022
schickling[m]
another way to ask the same question: are merged `recording` entries deleted?
2022-12-08 34238, 2022
acohn has quit
2022-12-08 34244, 2022
acohn joined the channel
2022-12-08 34207, 2022
ttree joined the channel
2022-12-08 34209, 2022
kepstin
i believe that is true, but you should get confirmation from someone who's worked with the database more recently than me :)
2022-12-08 34249, 2022
kepstin
I know that for deleted stuff, the main record of what it was when it existed will actually be in the editing history, not in the db.
2022-12-08 34202, 2022
kepstin
well, editing history is also in the db, but you know what i mean :)
2022-12-08 34223, 2022
kepstin
you might also consider asking in #metabrainz:libera.chat for the more technical musicbrainz internals questions.
2022-12-08 34228, 2022
ttree has quit
2022-12-08 34255, 2022
ttree joined the channel
2022-12-08 34214, 2022
fhe has quit
2022-12-08 34228, 2022
reosarevok
schickling[m]: yes, a redirect gid is as valid as a non-redirect one, so it only applies to one recording
2022-12-08 34251, 2022
reosarevok
The data is combined into one entity when merging
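A hedged sketch of how a gid lookup that follows redirects might look, again on an in-memory SQLite database. The `recording_gid_redirect` table name comes from the question above; the `new_id` column, the simplified `recording` columns, the `resolve_recording` helper, and the gid values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recording (
    id INTEGER PRIMARY KEY,
    gid TEXT NOT NULL UNIQUE,
    name TEXT NOT NULL
);
-- Assumed layout: the old gid maps to the row id of the surviving recording.
CREATE TABLE recording_gid_redirect (
    gid TEXT PRIMARY KEY,
    new_id INTEGER NOT NULL REFERENCES recording(id)
);
INSERT INTO recording VALUES (1, '11111111-1111-1111-1111-111111111111', 'Song A');
-- A second recording was merged into row 1; only its gid survives, as a redirect.
INSERT INTO recording_gid_redirect VALUES ('22222222-2222-2222-2222-222222222222', 1);
""")


def resolve_recording(conn, gid):
    """Return (id, gid, name) for a current or redirected gid, else None."""
    row = conn.execute(
        "SELECT id, gid, name FROM recording WHERE gid = ?", (gid,)
    ).fetchone()
    if row is not None:
        return row
    # Not a current gid: see whether it redirects to a surviving recording.
    return conn.execute(
        "SELECT r.id, r.gid, r.name "
        "FROM recording_gid_redirect rd "
        "JOIN recording r ON r.id = rd.new_id "
        "WHERE rd.gid = ?",
        (gid,),
    ).fetchone()


current = resolve_recording(conn, "11111111-1111-1111-1111-111111111111")
merged = resolve_recording(conn, "22222222-2222-2222-2222-222222222222")
unknown = resolve_recording(conn, "33333333-3333-3333-3333-333333333333")
```

Both the current gid and the merged-away gid resolve to the same single recording row, matching the point that a redirect gid applies to exactly one recording.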
2022-12-08 34244, 2022
anonn joined the channel
2022-12-08 34214, 2022
vibhoo_24 joined the channel
2022-12-08 34237, 2022
schickling[m]
is there an Elasticsearch instance or similar for fuzzy search on MusicBrainz?
2022-12-08 34217, 2022
kaliko has quit
2022-12-08 34258, 2022
kaliko joined the channel
2022-12-08 34205, 2022
reosarevok
Nope. IIRC we had pretty bad experiences with ES specifically too, anyway
2022-12-08 34229, 2022
reosarevok
From when we tried it for BookBrainz
2022-12-08 34254, 2022
reosarevok
Solr supports some sorts of fuzzy-ish search, but dunno if it's what you need :)
2022-12-08 34211, 2022
vibhoo_24 has quit
2022-12-08 34240, 2022
schickling[m]
<reosarevok> "Nope. IIRC we have pretty bad..." <- Would love to hear more about the sort of problems you ran into actually