#metabrainz

/

      • ruaok
        oh, I see. you're literally importing a subset of the MB data, but keeping exactly the same table structure of the chosen tables?
      • 2021-06-10 16135, 2021

      • lucifer
        right, a subset of MB tables and a subset of the rows in those chosen tables.
      • 2021-06-10 16100, 2021

      • lucifer
        the current results in the docs only cover the first part.
      • 2021-06-10 16130, 2021

      • ruaok
        subset of rows, even. how are the rows decided upon?
      • 2021-06-10 16140, 2021

      • ruaok
        depends on what is in AB, I guess?
      • 2021-06-10 16142, 2021

      • lucifer
        once we are satisfied with testing with entire data, we'll delete rows for which recordings are not there in AB and then test gain.
      • 2021-06-10 16155, 2021

      • lucifer
        right, compare with the recording MBIDs present in AB.
      • 2021-06-10 16154, 2021

      • lucifer
        I am getting execution plans to see if those reveal something.
      • 2021-06-10 16133, 2021

      • lucifer
      • 2021-06-10 16158, 2021

      • lucifer
      • 2021-06-10 16121, 2021

      • lucifer
        the only difference I see is that JOIN does an INDEX SCAN while EXISTS does INDEX ONLY SCAN. and this is causing 2X speedup in exists.
      • 2021-06-10 16154, 2021

      • lucifer
        but this is the first query when PG has zero cache. subsequent queries are blazing fast in either case.
      • 2021-06-10 16146, 2021

      • alastairp
        lucifer: hmm
      • 2021-06-10 16151, 2021

      • alastairp
        sorry, I'm working on some other projects
      • 2021-06-10 16149, 2021

      • ruaok
        what is the question you two are trying to answer?
      • 2021-06-10 16152, 2021

      • alastairp
        I believe that postgres will often do stuff like look for the "big" table and the "small" one, and do an index scan on the big one and seq scan in the other one
      • 2021-06-10 16150, 2021

      • lucifer
        Curresnt summary is directly accesing MB is twice faster than using imported data in the worst case
      • 2021-06-10 16138, 2021

      • lucifer
        ruaok: trying to find an explanation for this.
      • 2021-06-10 16150, 2021

      • ruaok
        ah
      • 2021-06-10 16110, 2021

      • alastairp
      • 2021-06-10 16115, 2021

      • alastairp
        "we wanted to check three different ways of performing these tasks"
      • 2021-06-10 16122, 2021

      • lucifer
        now the thing is this difference goes away if we use exists instead of a join to fetch imported data
      • 2021-06-10 16153, 2021

      • alastairp
        the final aim is to answer the question of "what's the fastest way to get metadata from MB for submissions in AB"
      • 2021-06-10 16105, 2021

      • lucifer
        so i dont think fetching imported data or accesing db has any difference in the worst case. its the join thats creating the difference
      • 2021-06-10 16134, 2021

      • alastairp
        lucifer: note that this is an interesting join. you're joining "all of AB" with "all of MB", then limiting it to 10k
      • 2021-06-10 16146, 2021

      • alastairp
        what if you did "10k of AB" join with "MB.recording"
      • 2021-06-10 16143, 2021

      • lucifer
        right makes sense.
      • 2021-06-10 16110, 2021

      • lucifer
        i think thats what exactly happening here. postgres does not know because of no cache.
      • 2021-06-10 16139, 2021

      • lucifer
        once one query has executed, it is able to optimize this on its own.
      • 2021-06-10 16145, 2021

      • lucifer
        alastairp: i am more interested in measuring the average case, any ideas how we can do that.
      • 2021-06-10 16110, 2021

      • alastairp
        yeah, you could make a sample dataset. randomly select 10,000 mbids, and then format them in a file as described here: https://acousticbrainz.org/datasets/import
      • 2021-06-10 16135, 2021

      • alastairp
        maybe split it into 8-10 categories of 1000-1200 mbids each
      • 2021-06-10 16138, 2021

      • alastairp
        import the dataset
      • 2021-06-10 16153, 2021

      • alastairp
        then use the data from here as the candidate list of MBIDs to join against
      • 2021-06-10 16130, 2021

      • lucifer
        ah ok, so join the dataset with entire MB data?
      • 2021-06-10 16139, 2021

      • alastairp
        yep
      • 2021-06-10 16107, 2021

      • lucifer
        i already have your Last.fm dataset imported so that part is done :). i'll just use that.
      • 2021-06-10 16113, 2021

      • alastairp
        perfect
      • 2021-06-10 16153, 2021

      • Sophist_UK joined the channel
      • 2021-06-10 16153, 2021

      • Sophist_UK has quit
      • 2021-06-10 16153, 2021

      • Sophist_UK joined the channel
      • 2021-06-10 16145, 2021

      • Sophist-UK has quit
      • 2021-06-10 16156, 2021

      • Sophist-UK joined the channel
      • 2021-06-10 16156, 2021

      • Sophist-UK has quit
      • 2021-06-10 16156, 2021

      • Sophist-UK joined the channel
      • 2021-06-10 16136, 2021

      • Sophist_UK has quit
      • 2021-06-10 16130, 2021

      • Sophist_UK joined the channel
      • 2021-06-10 16130, 2021

      • Sophist_UK has quit
      • 2021-06-10 16130, 2021

      • Sophist_UK joined the channel
      • 2021-06-10 16103, 2021

      • Sophist-UK has quit
      • 2021-06-10 16111, 2021

      • monkey
        alastairp : I'm off Wednesday next week; wanna do office day on Tuesday instead?
      • 2021-06-10 16158, 2021

      • alastairp
        actually, I'm also off on wednesday, got the trompa presentation. tuesday might not be a good idea because I'm sure everyone is going to be panicking the last day before... how is thursday for you?
      • 2021-06-10 16147, 2021

      • monkey
        Works for me
      • 2021-06-10 16155, 2021

      • Sophist-UK joined the channel
      • 2021-06-10 16155, 2021

      • Sophist-UK has quit
      • 2021-06-10 16155, 2021

      • Sophist-UK joined the channel
      • 2021-06-10 16155, 2021

      • Sophist_UK has quit
      • 2021-06-10 16122, 2021

      • Sophist_UK joined the channel
      • 2021-06-10 16122, 2021

      • Sophist_UK has quit
      • 2021-06-10 16122, 2021

      • Sophist_UK joined the channel
      • 2021-06-10 16100, 2021

      • Sophist-UK has quit
      • 2021-06-10 16124, 2021

      • BrainzGit
        [musicbrainz-server] 14reosarevok opened pull request #2142 (03beta…direction-backwards-hotfix): Fix getDirection -> isBackward to match other direction changes https://github.com/metabrainz/musicbrainz-server/…
      • 2021-06-10 16130, 2021

      • reosarevok
        bitmap: ^
      • 2021-06-10 16145, 2021

      • bitmap
        thanks!
      • 2021-06-10 16152, 2021

      • BrainzGit
        [musicbrainz-server] 14reosarevok merged pull request #2142 (03beta…direction-backwards-hotfix): Fix getDirection -> isBackward to match other direction changes https://github.com/metabrainz/musicbrainz-server/…
      • 2021-06-10 16154, 2021

      • CatQuest has quit
      • 2021-06-10 16109, 2021

      • reosarevok
        Fix released
      • 2021-06-10 16123, 2021

      • CatQuest joined the channel
      • 2021-06-10 16123, 2021

      • CatQuest has quit
      • 2021-06-10 16123, 2021

      • CatQuest joined the channel
      • 2021-06-10 16117, 2021

      • strider joined the channel
      • 2021-06-10 16133, 2021

      • strider has quit
      • 2021-06-10 16135, 2021

      • strider joined the channel
      • 2021-06-10 16107, 2021

      • akashgp09_ has quit
      • 2021-06-10 16146, 2021

      • SothoTalKer_ has quit
      • 2021-06-10 16117, 2021

      • SothoTalKer joined the channel
      • 2021-06-10 16144, 2021

      • SothoTalKer has quit
      • 2021-06-10 16127, 2021

      • SothoTalKer joined the channel
      • 2021-06-10 16154, 2021

      • SothoTalKer has quit
      • 2021-06-10 16112, 2021

      • SothoTalKer joined the channel