#metabrainz

/

      • zas
        bitmap: around?
      • bitmap
        zas: hi
      • zas
        load on pink is increasing, I think that's the same issue we had, related to sir-prod
      • bitmap
        ok, I'll check
      • Shubh joined the channel
      • texke` joined the channel
      • texke has quit
      • lucifer
        alastairp: write the explain query in a file, then run `psql -XqAt -f explain.sql > analyze.json` (flags are necessary to supress unneeded output) then copy file to local and paste to explain site. (won't work if you docker exec psql, either run from inside container and copy json outside conatiner or connect directly using psql from outisde). this is my usual workflow, learnt it from https://explain.dalibo.com/
      • >sounds like the problem we had hoped to get around with this upgrade, but that didn't do the trick.
      • mayhem: yeah that problem still exists in TS after the update but this is MB db which we didn't update. in any case, MB is on pg 12 TS was pg 11. the option we hoped to fix the issue was introduced in pg 12 so we can try adding materialized/not materialized to the start of the query and if that helps.
      • milosh has quit
      • asymmentric joined the channel
      • asymmentric has quit
      • mayhem: alastairp: i did some investigation and this appears to be a case on hitting the join collapse limit. i updated some config for the current session and reran the query it becomes instant, https://explain.dalibo.com/plan/jMG#stats
      • (https://explain.dalibo.com/plan/oiY and https://explain.dalibo.com/plan/cs3 played around with cte first, bad plans took >2 min to execute but sharing in case relevant or find it interesting)
      • i ran this before the query. the key is from/join collapse limit geqo on/off doesn't seem to affect the result but reading the docs, its probably best to disable geqo for this query directly or via threshold.
      • the downside is increased planning times but a few extra secs in planning won't affect compared to suboptimal plans.
      • fwiw, Planning time is ~60ms with these settings and exec time 6-10ms for the 3 items.
      • mayhem: re the releases still missing, rg.type is NULLABLE. so we need to change it to LEFT JOIN as well. so we also need to decide ordering for rg.type NULLS LAST makes sense to me. thoughts?
      • the other joins in the query artist_credit, medium and release are NOT NULL so hopefully this change should fix all releases. once all releases are in we can look into recordings.
      • reosarevok
      • Will let you know when I think it's ready :)
      • yvanzo
        👍
      • atj
        you can relax with PEP after doing the jungle boogie
      • sorry, wrong window for up+enter :)
      • mayhem
        moin lucifer !
      • lucifer
        morning!
      • mayhem
        NULLS last is important -- it suggests that data is missing and therefore the results should be at the bottom. without it they come first and that is very wrong. let me add the rg type left
      • lucifer
        yup makes sense, i checked the default for ASC order is NULLS LAST so no change needed (for DESC its NULLS FIRST)
      • Freso
        lucifer: I’d still recommend new people to register with nickserv, in case we need to turn +r back on. But now we can nudge them to do so in here at least. :)
      • lucifer
        Freso: makes sense, thanks!
      • asymmentric joined the channel
      • asymmentric has quit
      • mayhem
      • getting a lot closer!
      • and once again, recording counts do not change
      • odd.
      • reosarevok
        aerozol, CatQuest, any other who might want a say: https://community.metabrainz.org/t/about-the-co... wouldn't mind some eyes on this
      • (my last post, not the entire thing)
      • yvanzo: https://wiki.musicbrainz.org/How_to_Remove_Enti... looks mostly ready to me now, but I'm sure I forgot some bits, so do check when you have some time. CatQuest, IIRC you wanted to take a look as well
      • alastairp
        lucifer: hey, cool investigation, thanks.
      • lucifer
        :)
      • mayhem: oh well, i was wrong, medium needs to be a LEFT JOIN too.
      • 49adee94-0cc6-4719-9afa-ef0b9ec5d51c
      • one release which does not have a medium.
      • but no medium is no recordings so does it even matter to include these/
      • BrainzGit
        [listenbrainz-server] 14akshaaatt opened pull request #1930 (03bootstrap5…fix-bootstrap): Fix Bootstrap removal https://github.com/metabrainz/listenbrainz-serv...
      • odnes joined the channel
      • antlarr joined the channel
      • Mineo_ has quit
      • Mineo joined the channel
      • lucifer
        mayhem: huh,`00000373-114f-4fb6-a8dc-81159b727d06` is missing from the mapping currently but if I put it in the where clause of the query to build mapping, it does return a row. can you check if the mapping is being built succesfully?
      • BrainzGit
        [musicbrainz-server] 14reosarevok opened pull request #2467 (03master…MBS-12279): MBS-12279 / MBS-12283 / MBS-12284: Drop more manual deletion options and fix related issues https://github.com/metabrainz/musicbrainz-serve...
      • reosarevok
        lucifer: fwiw releases can both have no mediums *or* have mediums with no recordings
      • In case the second option matters, although it might not
      • (haven't checked the query)
      • lucifer
        i don't think so it does. our use case is mapping recordings to their "most suitable" (canonical) release.
      • (for recordings that appear on multiple releases)
      • reosarevok
        Oh, ok. Well, then you probably do want to join, not left join medium
      • Since you want to ignore mediumless releases anyway
      • lucifer
        yup indeed
      • reosarevok
        They should never have recordings, as you said :)
      • BrainzGit
        [listenbrainz-server] 14akshaaatt merged pull request #1930 (03bootstrap5…fix-bootstrap): Fix Bootstrap cleanup https://github.com/metabrainz/listenbrainz-serv...
      • mayhem
        the last mapping built successully, yes, lucifer
      • Mineo has quit
      • lucifer
        huh weird
      • this returns a row but its missing from the mapping table.
      • so some python processing bug?
      • mayhem
        certainly is possible.
      • and I guess in this case the release table is correct, yes?
      • but the python processing bug is likely the explanation for other issues as well.
      • lucifer
        yes i think so
      • mayhem
        one thing we can do is add that artist to the test artist list.
      • lucifer
        makes sense
      • mayhem
        and then config.py set MINIMAL_DATASET to True.
      • that will run things much faster.
      • the first thing I want to dig into is the missing releases from the mb_metadata_cache.
      • after than I can do debugging on this.
      • lucifer
        those releases come from mbid_mapping_releases table no?
      • mayhem
        yes.
      • but the items in question are in that table, I think. I need to double check.
      • lucifer
        so mapping releases table has the release but not mb metadata cache?
      • mayhem
        hmm, ok. for the freddy mercury case, the row is in the table, but it goes to a release that has no coverart.
      • really, which one should be picked here? the last one is picked. kinda a toss up.
      • but the problem extends much further. there are 1.1M releases in the mb_metadata cache. time to find another test case.
      • lucifer
        really unsure about it. but fwiw, i think people using LB would care about coverart than what is semantically correct acc. to MB.
      • *more than
      • Mineo joined the channel
      • mayhem
        agreed.
      • but this test case is far far into the weeds.
      • I need to fix more serious bugs first.
      • lucifer
        makes sense
      • mayhem
        lucifer: are you running something on bono?
      • lucifer
        no
      • mayhem
        bono is acting a bit strang.e
      • might be time for a reboot.
      • up 267 days and massive amounts of abuse.
      • lucifer
        pg stat activity shows some queries.
      • mayhem
        I've got nothing running, do you ?
      • lucifer
        nope
      • maybe alastairp ?
      • 16385 | musicbrainz_db | 2086 | 10 | musicbrainz | | 192.168.48.1 | | 40796 | 2022-03-24 10:36:30.659699+00 | 2022-03-24 10:36:30.833556+00 | 2022-03-24 10:42:39.016324+00 | 2022-03-24 10:42:39.016334+00 | IO | BufFileRead | active | | 25516992 | WITH modified (recording_mbid) AS (values
      • ('e97f805a-ab48-4c52-855e-07049142113d'::uuid),('7ce4d633-0466-4b76-912b-30e83e2c10e3'::uuid),('97e69767-5d34-4c97-b36a-f3b2b1ef9dae'::uuid)), artist_rels AS (+| client backend
      • oops i thought it would ask for paste 😓
      • mayhem
        ah yes, that is alastairp
      • lucifer
      • this check is problematic for recordings with the same name, no?
      • oh nvm, i see there's an else to handle that
      • mayhem
        this is the whole crux of the mapping. only the first ever pairing of artist_credit_name and recording_name should ever be accepted into the mapping. All subsequent pairs go into canonical recordings.
      • lucifer
        right makes sense
      • mayhem
        there is, however, a different question to be had: should this check be done after the unidecode?
      • and on combined lookup?
      • I think it might need to be.
      • lucifer
        probably makes sense to do that. but i guess wouldn't matter much this seems to inside artist_credit_id check.
      • mayhem
        yes.
      • and how it is currently handled, any whitespace changes are different tracks.
      • lucifer
        ah i see.
      • mayhem
        alastairp: ping
      • lucifer
        ok lets keep that in mind as an enhancement once the bugs are fixed.
      • mayhem
        k.
      • I'll investigate after I get the release_data issue sorted.
      • lucifer
        i am unsure why that query is still running (assuming its the same from yesterday). took ~3 mins without updating limit and 3s after it but its been while since that one has been running.
      • 👍
      • Mineo has quit
      • oh btw, I had opened a feature request to add MBID to the unified metadata api on Android ~1.5 years ago. It was closed as won't fix without any reason provided last week :(
      • mayhem
        whatever, wank-offs.
      • lucifer
        mayhem: fine to run few queries currently on bono to figure out missing recordings?
      • mayhem
        sure.
      • alastairp
        mayhem: pong
      • mayhem
        are you running stuff on bono now?
      • alastairp
        yes
      • mayhem
        ok. three data engineers, one machine. poor bono.
      • alastairp
        I can quit it and move to my home desktop
      • mayhem
        is that isn't a hassle, that would help lucifer and I.
      • alastairp
        yeah, no problem at all
      • mayhem
        sweet, thanks.
      • alastairp
        I quit the python process, not sure if pg will take time to respond to the cancelled query
      • mayhem
        do you expect the query to finish soon?
      • if not I can cancel the backend.
      • lucifer
        👍
      • alastairp
        go ahead and cancel it
      • mayhem
        on it.
      • lucifer
        another fun thing i learnt today, pg often prefers seq scan to index scan for anti join causing those to run slowwwwwwwwwww. fix is disable seqscan for the session.
      • mayhem
        anti-join?
      • meh, that query is not cancelling.
      • lucifer