load on pink is increasing, I think that's the same issue we had, related to sir-prod
bitmap
ok, I'll check
Shubh joined the channel
texke` joined the channel
texke has quit
lucifer
alastairp: write the explain query in a file, then run `psql -XqAt -f explain.sql > analyze.json` (flags are necessary to supress unneeded output) then copy file to local and paste to explain site. (won't work if you docker exec psql, either run from inside container and copy json outside conatiner or connect directly using psql from outisde). this is my usual workflow, learnt it from https://explain.dalibo.com/
>sounds like the problem we had hoped to get around with this upgrade, but that didn't do the trick.
mayhem: yeah that problem still exists in TS after the update but this is MB db which we didn't update. in any case, MB is on pg 12 TS was pg 11. the option we hoped to fix the issue was introduced in pg 12 so we can try adding materialized/not materialized to the start of the query and if that helps.
milosh has quit
asymmentric joined the channel
asymmentric has quit
mayhem: alastairp: i did some investigation and this appears to be a case on hitting the join collapse limit. i updated some config for the current session and reran the query it becomes instant, https://explain.dalibo.com/plan/jMG#stats
i ran this before the query. the key is from/join collapse limit geqo on/off doesn't seem to affect the result but reading the docs, its probably best to disable geqo for this query directly or via threshold.
the downside is increased planning times but a few extra secs in planning won't affect compared to suboptimal plans.
fwiw, Planning time is ~60ms with these settings and exec time 6-10ms for the 3 items.
mayhem: re the releases still missing, rg.type is NULLABLE. so we need to change it to LEFT JOIN as well. so we also need to decide ordering for rg.type NULLS LAST makes sense to me. thoughts?
the other joins in the query artist_credit, medium and release are NOT NULL so hopefully this change should fix all releases. once all releases are in we can look into recordings.
you can relax with PEP after doing the jungle boogie
sorry, wrong window for up+enter :)
mayhem
moin lucifer !
lucifer
morning!
mayhem
NULLS last is important -- it suggests that data is missing and therefore the results should be at the bottom. without it they come first and that is very wrong. let me add the rg type left
lucifer
yup makes sense, i checked the default for ASC order is NULLS LAST so no change needed (for DESC its NULLS FIRST)
Freso
lucifer: I’d still recommend new people to register with nickserv, in case we need to turn +r back on. But now we can nudge them to do so in here at least. :)
yvanzo: https://wiki.musicbrainz.org/How_to_Remove_Enti... looks mostly ready to me now, but I'm sure I forgot some bits, so do check when you have some time. CatQuest, IIRC you wanted to take a look as well
alastairp
lucifer: hey, cool investigation, thanks.
lucifer
:)
mayhem: oh well, i was wrong, medium needs to be a LEFT JOIN too.
49adee94-0cc6-4719-9afa-ef0b9ec5d51c
one release which does not have a medium.
but no medium is no recordings so does it even matter to include these/
mayhem: huh,`00000373-114f-4fb6-a8dc-81159b727d06` is missing from the mapping currently but if I put it in the where clause of the query to build mapping, it does return a row. can you check if the mapping is being built succesfully?
BrainzGit
[musicbrainz-server] 14reosarevok opened pull request #2467 (03master…MBS-12279): MBS-12279 / MBS-12283 / MBS-12284: Drop more manual deletion options and fix related issues https://github.com/metabrainz/musicbrainz-serve...
reosarevok
lucifer: fwiw releases can both have no mediums *or* have mediums with no recordings
In case the second option matters, although it might not
(haven't checked the query)
lucifer
i don't think so it does. our use case is mapping recordings to their "most suitable" (canonical) release.
(for recordings that appear on multiple releases)
reosarevok
Oh, ok. Well, then you probably do want to join, not left join medium
Since you want to ignore mediumless releases anyway
this check is problematic for recordings with the same name, no?
oh nvm, i see there's an else to handle that
mayhem
this is the whole crux of the mapping. only the first ever pairing of artist_credit_name and recording_name should ever be accepted into the mapping. All subsequent pairs go into canonical recordings.
lucifer
right makes sense
mayhem
there is, however, a different question to be had: should this check be done after the unidecode?
and on combined lookup?
I think it might need to be.
lucifer
probably makes sense to do that. but i guess wouldn't matter much this seems to inside artist_credit_id check.
mayhem
yes.
and how it is currently handled, any whitespace changes are different tracks.
lucifer
ah i see.
mayhem
alastairp: ping
lucifer
ok lets keep that in mind as an enhancement once the bugs are fixed.
mayhem
k.
I'll investigate after I get the release_data issue sorted.
lucifer
i am unsure why that query is still running (assuming its the same from yesterday). took ~3 mins without updating limit and 3s after it but its been while since that one has been running.
👍
Mineo has quit
oh btw, I had opened a feature request to add MBID to the unified metadata api on Android ~1.5 years ago. It was closed as won't fix without any reason provided last week :(
mayhem
whatever, wank-offs.
lucifer
mayhem: fine to run few queries currently on bono to figure out missing recordings?
mayhem
sure.
alastairp
mayhem: pong
mayhem
are you running stuff on bono now?
alastairp
yes
mayhem
ok. three data engineers, one machine. poor bono.
alastairp
I can quit it and move to my home desktop
mayhem
is that isn't a hassle, that would help lucifer and I.
alastairp
yeah, no problem at all
mayhem
sweet, thanks.
alastairp
I quit the python process, not sure if pg will take time to respond to the cancelled query
mayhem
do you expect the query to finish soon?
if not I can cancel the backend.
lucifer
👍
alastairp
go ahead and cancel it
mayhem
on it.
lucifer
another fun thing i learnt today, pg often prefers seq scan to index scan for anti join causing those to run slowwwwwwwwwww. fix is disable seqscan for the session.