load on pink is increasing, I think that's the same issue we had, related to sir-prod
2022-03-24 08333, 2022
bitmap
ok, I'll check
2022-03-24 08322, 2022
Shubh joined the channel
2022-03-24 08343, 2022
texke` joined the channel
2022-03-24 08315, 2022
texke has quit
2022-03-24 08318, 2022
lucifer
alastairp: write the explain query in a file, then run `psql -XqAt -f explain.sql > analyze.json` (flags are necessary to supress unneeded output) then copy file to local and paste to explain site. (won't work if you docker exec psql, either run from inside container and copy json outside conatiner or connect directly using psql from outisde). this is my usual workflow, learnt it from https://explain.dalibo.com/
2022-03-24 08357, 2022
lucifer
>sounds like the problem we had hoped to get around with this upgrade, but that didn't do the trick.
2022-03-24 08313, 2022
lucifer
mayhem: yeah that problem still exists in TS after the update but this is MB db which we didn't update. in any case, MB is on pg 12 TS was pg 11. the option we hoped to fix the issue was introduced in pg 12 so we can try adding materialized/not materialized to the start of the query and if that helps.
2022-03-24 08358, 2022
milosh has quit
2022-03-24 08349, 2022
asymmentric joined the channel
2022-03-24 08346, 2022
asymmentric has quit
2022-03-24 08313, 2022
lucifer
mayhem: alastairp: i did some investigation and this appears to be a case on hitting the join collapse limit. i updated some config for the current session and reran the query it becomes instant, https://explain.dalibo.com/plan/jMG#stats
i ran this before the query. the key is from/join collapse limit geqo on/off doesn't seem to affect the result but reading the docs, its probably best to disable geqo for this query directly or via threshold.
2022-03-24 08307, 2022
lucifer
the downside is increased planning times but a few extra secs in planning won't affect compared to suboptimal plans.
2022-03-24 08330, 2022
lucifer
fwiw, Planning time is ~60ms with these settings and exec time 6-10ms for the 3 items.
2022-03-24 08334, 2022
lucifer
mayhem: re the releases still missing, rg.type is NULLABLE. so we need to change it to LEFT JOIN as well. so we also need to decide ordering for rg.type NULLS LAST makes sense to me. thoughts?
2022-03-24 08347, 2022
lucifer
the other joins in the query artist_credit, medium and release are NOT NULL so hopefully this change should fix all releases. once all releases are in we can look into recordings.
you can relax with PEP after doing the jungle boogie
2022-03-24 08321, 2022
atj
sorry, wrong window for up+enter :)
2022-03-24 08345, 2022
mayhem
moin lucifer !
2022-03-24 08354, 2022
lucifer
morning!
2022-03-24 08329, 2022
mayhem
NULLS last is important -- it suggests that data is missing and therefore the results should be at the bottom. without it they come first and that is very wrong. let me add the rg type left
2022-03-24 08344, 2022
lucifer
yup makes sense, i checked the default for ASC order is NULLS LAST so no change needed (for DESC its NULLS FIRST)
2022-03-24 08315, 2022
Freso
lucifer: I’d still recommend new people to register with nickserv, in case we need to turn +r back on. But now we can nudge them to do so in here at least. :)
yvanzo: https://wiki.musicbrainz.org/How_to_Remove_Entiti… looks mostly ready to me now, but I'm sure I forgot some bits, so do check when you have some time. CatQuest, IIRC you wanted to take a look as well
2022-03-24 08324, 2022
alastairp
lucifer: hey, cool investigation, thanks.
2022-03-24 08323, 2022
lucifer
:)
2022-03-24 08337, 2022
lucifer
mayhem: oh well, i was wrong, medium needs to be a LEFT JOIN too.
2022-03-24 08351, 2022
lucifer
49adee94-0cc6-4719-9afa-ef0b9ec5d51c
2022-03-24 08309, 2022
lucifer
one release which does not have a medium.
2022-03-24 08345, 2022
lucifer
but no medium is no recordings so does it even matter to include these/
mayhem: huh,`00000373-114f-4fb6-a8dc-81159b727d06` is missing from the mapping currently but if I put it in the where clause of the query to build mapping, it does return a row. can you check if the mapping is being built succesfully?
2022-03-24 08343, 2022
BrainzGit
[musicbrainz-server] 14reosarevok opened pull request #2467 (03master…MBS-12279): MBS-12279 / MBS-12283 / MBS-12284: Drop more manual deletion options and fix related issues https://github.com/metabrainz/musicbrainz-server/…
2022-03-24 08338, 2022
reosarevok
lucifer: fwiw releases can both have no mediums *or* have mediums with no recordings
2022-03-24 08345, 2022
reosarevok
In case the second option matters, although it might not
2022-03-24 08357, 2022
reosarevok
(haven't checked the query)
2022-03-24 08336, 2022
lucifer
i don't think so it does. our use case is mapping recordings to their "most suitable" (canonical) release.
2022-03-24 08322, 2022
lucifer
(for recordings that appear on multiple releases)
2022-03-24 08342, 2022
reosarevok
Oh, ok. Well, then you probably do want to join, not left join medium
2022-03-24 08348, 2022
reosarevok
Since you want to ignore mediumless releases anyway
this check is problematic for recordings with the same name, no?
2022-03-24 08328, 2022
lucifer
oh nvm, i see there's an else to handle that
2022-03-24 08337, 2022
mayhem
this is the whole crux of the mapping. only the first ever pairing of artist_credit_name and recording_name should ever be accepted into the mapping. All subsequent pairs go into canonical recordings.
2022-03-24 08352, 2022
lucifer
right makes sense
2022-03-24 08317, 2022
mayhem
there is, however, a different question to be had: should this check be done after the unidecode?
2022-03-24 08328, 2022
mayhem
and on combined lookup?
2022-03-24 08316, 2022
mayhem
I think it might need to be.
2022-03-24 08309, 2022
lucifer
probably makes sense to do that. but i guess wouldn't matter much this seems to inside artist_credit_id check.
2022-03-24 08329, 2022
mayhem
yes.
2022-03-24 08345, 2022
mayhem
and how it is currently handled, any whitespace changes are different tracks.
2022-03-24 08301, 2022
lucifer
ah i see.
2022-03-24 08327, 2022
mayhem
alastairp: ping
2022-03-24 08328, 2022
lucifer
ok lets keep that in mind as an enhancement once the bugs are fixed.
2022-03-24 08314, 2022
mayhem
k.
2022-03-24 08326, 2022
mayhem
I'll investigate after I get the release_data issue sorted.
2022-03-24 08348, 2022
lucifer
i am unsure why that query is still running (assuming its the same from yesterday). took ~3 mins without updating limit and 3s after it but its been while since that one has been running.
2022-03-24 08323, 2022
lucifer
👍
2022-03-24 08351, 2022
Mineo has quit
2022-03-24 08303, 2022
lucifer
oh btw, I had opened a feature request to add MBID to the unified metadata api on Android ~1.5 years ago. It was closed as won't fix without any reason provided last week :(
2022-03-24 08301, 2022
mayhem
whatever, wank-offs.
2022-03-24 08350, 2022
lucifer
mayhem: fine to run few queries currently on bono to figure out missing recordings?
2022-03-24 08343, 2022
mayhem
sure.
2022-03-24 08317, 2022
alastairp
mayhem: pong
2022-03-24 08327, 2022
mayhem
are you running stuff on bono now?
2022-03-24 08331, 2022
alastairp
yes
2022-03-24 08346, 2022
mayhem
ok. three data engineers, one machine. poor bono.
2022-03-24 08324, 2022
alastairp
I can quit it and move to my home desktop
2022-03-24 08340, 2022
mayhem
is that isn't a hassle, that would help lucifer and I.
2022-03-24 08345, 2022
alastairp
yeah, no problem at all
2022-03-24 08350, 2022
mayhem
sweet, thanks.
2022-03-24 08359, 2022
alastairp
I quit the python process, not sure if pg will take time to respond to the cancelled query
2022-03-24 08318, 2022
mayhem
do you expect the query to finish soon?
2022-03-24 08326, 2022
mayhem
if not I can cancel the backend.
2022-03-24 08342, 2022
lucifer
👍
2022-03-24 08349, 2022
alastairp
go ahead and cancel it
2022-03-24 08355, 2022
mayhem
on it.
2022-03-24 08328, 2022
lucifer
another fun thing i learnt today, pg often prefers seq scan to index scan for anti join causing those to run slowwwwwwwwwww. fix is disable seqscan for the session.