#metabrainz

/

      • lucifer
        i had thought the rg one was faster.
      • mayhem
        and then take a look at the these two sub-queries on their own.
      • for the SLOW query, the top query executes super fast as it does for the OK query.
      • but when combined with the bottom query it takes EONS.
      • can you confirm that?
      • lucifer
        yeah i know this issue. have a couple of solutions let me see which one works
      • mayhem
        and OK to do it on wolf, because this is not optimizing something, this is basic functioning.
      • k
      • lucifer
        sure
      • ok do this
      • notice the materialized in the first line.
      • mayhem
        thanks lucifer -- master of all the PG tricks!
      • lucifer
        did it work?
      • fwiw, i did test it just now and the plan improved but this particular recording would be in cache now so better to test on some other recording
      • wargreen has quit
      • mayhem
        yep!
      • lucifer
        The issue here is the same as the one in https://github.com/metabrainz/listenbrainz-serv... . If your query exceeds 8 JOINs (including FROM iirc), PG will opt to use a probablistic optimizer instead of trying out all possible options. This happens to save planning time.
      • mayhem
        working much better, even with totally new recordings not in cahce.
      • and MATERIALIZED changes this how?
      • lucifer
        when you added release group JOIN, it crossed the limit. so one solution is to do what we do in mb metadata cache. increase the limit above which the optimizer degrades to a probablistic one.
      • the other one is force the CTE to evaluated separately from the rest of the query.
      • with materialized PG will see this as two separate queries, `1 CTE with 4-5 joins, then the main query with 4-5 JOINs
      • both on their own are below the threshold so are optimized well.
      • mayhem
        ah, got it. ok, will remember this. I hope. ;)
      • lucifer
        without materialized, the CTE is merged into the main query and it becomes 1 query with 8-9 joins that crosses the limit.
      • if your query has 8-9 joins, it'll probably hit this issue which may or may not be a perf issue. it depends on how you write the query. if your cte had begun with recording then joined to track then to medium to release it would probably have been faster and you might not have noticed the issue.
      • its the job of the optimizer that hits the limit to reorder the joins optimally automatically and the optimizer changes at the limit so the issue.
      • lucifer ends rant
      • mayhem
        :) thanks for the rant.
      • I wonder if subsequent version of PG improve on this.
      • lucifer
        maybe but materialized was probably the improvement they made so that users could control the cte behavior to their needs.
      • iirc, the story of the materialized bit is somewhat funny. it used to be the only possible thing till PG 10, always materialize. and served the purpose of optimization fence, you could use it to break the plan to your needs. so ctes were different from subqueries which would always be merged with the main query.
      • then it was changed in PG 11 to be non-materialized whenever possible so that broke people's workarounds who had been using CTEs for keeping queries separate. so in PG 12 they retained the default from 11 but they added the materialized/not materialized option to let users configure it on per cte basis.
      • lucifer ends it for real this time
      • mayhem
        lol. hard earned knowledge, I see. :)
      • lucifer
        yeah been bitten many times by this
      • we ran TS on PG 11 whereas MB on PG 12 till earlier this year so ...
      • mayhem
        that's just it -- its "bitten" thing. you'd expect that these sorts of thresholds to be ironed out over time.
      • lucifer
        yup makes sense
      • mayhem
        however, lets take a moment for some perspective.
      • this is purely amazing free software.
      • which functions many times better than any paid software I've ever used. esp if it was from M$
      • PG has bitten us very few times in the past 20 years or so. which is amazing.
      • lucifer
        indeed indeed, definitely agree.
      • and other databases paid or not, are not without their quirks. PG is awesome
      • mayhem nods deeply
      • mayhem
        we are lucky with the free toys we have.
      • lucifer
        yup
      • wargreen joined the channel
      • mayhem
      • lucifer: you might also be interested: https://github.com/metabrainz/auto-tag
      • zas
        interesting :)
      • mayhem
        I think it could speed up picard quite a bit AND reduce the traffic on the API. I think. there would be fewer calls, but more data.
      • there is a LOT of room for improvement, but I thought I would stop here so we can look and discuss
      • yvanzo
        To all: Jira apps will be updated in 5min. Tickets might be less available for a short time.
      • lucifer
        mayhem: i see, nice. will check it out.
      • mayhem: re the canonical mbid issue ^, the two running up the hill are both canonical :/
      • mayhem
        oh the joy of music metadata.
      • I guess that is a metadata issue in MB that needs cleaning up.
      • lucifer
        it seems there are there are 2 running up the hills, one making the deal with god and the other not.
      • mayhem
        is that better than a deal with lucifer?
      • lucifer
        imo nope.
      • mayhem
        I guess that is the topic of a book, not a song. so, dunno
      • right, thats cleared up then. ;)
      • lucifer
        this will also hit us when we add do not recommend.
      • mayhem
        quite so.
      • lucifer
        user marked one of these as do not recommend but still gets the other one as recommend.
      • mayhem
        reosarevok: do you think the above is something that should be corrected in MB?
      • understood. I think the thing to do is to identify these cases and have people help us with the data.
      • lucifer
        fwiw, there are many other recordings in MB which are slightly different version of the same recording.
      • mayhem
        indeed. some are just slightly different, enough to be considered a ":different' recording.
      • lucifer
        not sure an in issue in MB but maybe we need to expand the concept of canonical recording
      • yvanzo
        Done with Jira apps.
      • mayhem
        great minds.
      • not to expand, but *another* concept layered on top, I think.
      • atj
        recordings are a bit of a shit show on MB tbh
      • lucifer
        right and maybe enough for mapping purposes but not for this particular recommendations
      • hehe again
      • mayhem
        atj: like I said in the summit, my hope is to get more eyes on the data with LB
      • outsidecontext
        mayhem: cool, I'll check it out
      • atj
      • lucifer
      • these are the ones we treat as canonical in LB
      • atj
        it's not clear to me whether e87d0feb-abac-482d-b0b0-e5c65f2489ad and 6029d549-5858-4936-9156-b90770d2ae92 should be merged
      • BrainzGit
        [listenbrainz-android] 14akshaaatt merged pull request #1 (03main…bp-MOBILE-86): added new properties in Song class https://github.com/metabrainz/listenbrainz-andr...
      • atj
        it might be worth looking at ways of improving the functionality of the recordings tab so people are less likely to create duplicate recordings
      • reosarevok
        mayhem: it might very well be that these should be merged, but I don't know enough about Kate Bush and remixes/rerecordings to be sure
      • mayhem
        it is the same song, really. I think it should be merged.
      • lucifer
        the two recordings have different lengths in MB so probably different, no? 5:00 vs 5:45
      • mayhem
      • religious and label politics. not different versions. how that accounts for the 30 sec difference, no clue.
      • outsidecontext
      • but first I'll check whar's wrong with the pypi PGP key
      • reosarevok
        mayhem, lucifer: oh, one is specifically marked as "extended 12” single version" (I was on my phone earlier so I didn't check)
      • Then it's entirely correct that they are separate recordings
      • lucifer
        right, makes sense.
      • reosarevok
        Maybe that's the same as extended 12” single version? I wouldn't know myself
      • yvanzo: do you have the setup to quickly do https://tickets.metabrainz.org/browse/MBBE-65 still? :)
      • BrainzBot
        MBBE-65: Update 45cat.com/45worlds.com URLs to use HTTPS
      • BrainzGit
        [musicbrainz-server] 14reosarevok opened pull request #2748 (03master…MBS-12721): MBS-12721: Normalize 45cat.com/45worlds.com links to HTTPS https://github.com/metabrainz/musicbrainz-serve...
      • yvanzo
        reosarevok: no special setup is needed, it is all run from the prod cron container
      • reosarevok
        Oh, ok. Do you have it documented somewhere?
      • yvanzo
        Yes, see comments to previous similar tickets: https://tickets.metabrainz.org/issues/?jql=proj...
      • reosarevok
        Oh, perfect
      • Do you want to do it or should I? Also, have you found 500 per hour to be a good amount?
      • yvanzo
        It will have to be checked after MBS-12721 deployment anyway, so better wait for it.
      • BrainzBot
      • BrainzGit
        [musicbrainz-server] 14reosarevok opened pull request #2749 (03master…MBS-12722): MBS-12722: Add IdRef to the other databases whitelist https://github.com/metabrainz/musicbrainz-serve...
      • reosarevok
        Ok :)
      • aerozol
        Mōrena!
      • outsidecontext
        reosarevok: DNB answered :)
      • aerozol
        lucifer: still getting “Youtube player error. The request cannot be completed because you have exceeded your <a href="/youtube/v3/getting-started#quota">quota</a>.” today unfortunately, haven’t listened in a few days to see what happens
      • Shall I make a ticket?
      • reosarevok
        outsidecontext: neat! What did they say?
      • aerozol
        Also can we tweak anything so the error message doesn’t just show html code/the href? Or is that just because we’re displaying what YouTube is sending. Might not be worth worrying about if so
      • outsidecontext
        I think the bottom line is that we should not make assumptions about the IDs themselves, as they depend on the entity being linked to, but we can make assumptions about URL structure. I'll forward you the e-mail with some comment.
      • aerozol
      • reosarevok
        Thanks
      • lucifer
        aerozol: huh, i see in youtube api dashboard that quota is indeed being exceeded. weird
      • aerozol
        You said it’s quite high right? So either a exploit or an error (bad) or been getting used heaps (pretty good!)?
      • lucifer
        mayhem: hi! when you have time, can you open the google cloud console dashboard and see if you can edit quotas for youtube api? i don't see the extended quota limit anywhere maybe something we need to enable manually.
      • aerozol: yeah, i don't see the high limit being enforced. the old low limit is being used as of now
      • aerozol
        I can’t listen with Spotify if I don’t have premium right (I’m finally about to join the daily jams party)
      • lucifer
      • it seems we don't hit even the lower rate limit everyday. once a week or so.
      • aerozol: you can probably listen it on spotify.com not on BP though.
      • or using youtube with BP i guess.
      • aerozol
        Oh interesting, I’ve just been timing it badly?
      • Nick6 joined the channel
      • Nick6 has quit
      • Yeah I don’t wanna use spotify.com, I’m very pleased we have Youtube + BP support
      • lucifer
        yup that should work.
      • btw, do you any other music streaming service? we could add it to the list of services to add support for
      • aerozol
        Bandcamp if anything (I heard they don’t want to play with us?)
      • I’ll try again in a bit to see if Youtube will work
      • lucifer
        yeah mayhem can tell you about that.
      • aerozol
        :(
      • lucifer
        i think monkey had looked into adding bandcamp support to BP at some point but maybe misremembering.
      • aerozol
        Oh okay, I thought you meant that Bandcamp wasn’t keen. I think I remember something about that. Maybe monkey will know more
      • lucifer
        its both.
      • aerozol
        Aw man, I thought so. I always thought they were cool
      • Shall/can we do something about that error message?
      • lucifer
        as in fixing it or making it more informative for user?
      • BrainzGit
        [listenbrainz-server] 14amCap1712 opened pull request #2263 (03master…add-canonical-mbid-to-lookup): Add canonical mbid to recording mbid lookup https://github.com/metabrainz/listenbrainz-serv...
      • aerozol
        I meant it just showing html code, but we could definitely just have it say ‘rate limit reached’ instead!
      • lucifer
        ah ok, yes makes sense to do that,
      • aerozol
        Shall I make a ticket? No rush