#metabrainz

/

      • kartikeyaSh
        I'll see on how to create index on a function
      • 2018-07-27 20807, 2018

      • alastairp
        what's the purpose of the function?
      • 2018-07-27 20825, 2018

      • kartikeyaSh
        convert a json array to a sorted postgres array
      • 2018-07-27 20837, 2018

      • alastairp
        but why do you want to do it?
      • 2018-07-27 20829, 2018

      • alastairp
        (for example, why can't we create an index directly on recording_json.data->'artist_mbids' ?
      • 2018-07-27 20849, 2018

      • iliekcomputers
        alastairp: it has been dumping lowlevel_json since forever
      • 2018-07-27 20802, 2018

      • alastairp
        iliekcomputers: yeah, weird. doesn't take this long when I do it
      • 2018-07-27 20815, 2018

      • kartikeyaSh
        it was decided that we need to store artist MBIDs in a way so that quick comparison can be made for multiple MBIDs
      • 2018-07-27 20822, 2018

      • iliekcomputers
        I did `pg_dump -Ft acousticbrainz -f acousticbrainz-dump`
      • 2018-07-27 20829, 2018

      • alastairp
        what's -F ?
      • 2018-07-27 20841, 2018

      • kartikeyaSh
        So, postgres array was the choice we went for
      • 2018-07-27 20859, 2018

      • iliekcomputers
        format = tar
      • 2018-07-27 20811, 2018

      • alastairp
        ah, so 1 file per table + tarred up?
      • 2018-07-27 20812, 2018

      • iliekcomputers
      • 2018-07-27 20819, 2018

      • alastairp
        text or binary format?
      • 2018-07-27 20825, 2018

      • kartikeyaSh
        So, that we don't have to do a string comparison like we do in case of json array
      • 2018-07-27 20829, 2018

      • iliekcomputers
        yes
      • 2018-07-27 20838, 2018

      • outsidecontext has quit
      • 2018-07-27 20801, 2018

      • alastairp
        kartikeyaSh: is that based on an assumption that a uuid:uuid comparison is faster than string:string?
      • 2018-07-27 20842, 2018

      • alastairp
        the thing here is that we are running a function on every single row anyway, so we lose any advantage that we had by doing a faster comparison
      • 2018-07-27 20833, 2018

      • alastairp
        so if you haven't tested it already, I'd see if it's possible to create an index directly on the list inside the json blob
      • 2018-07-27 20819, 2018

      • kartikeyaSh
        like index on data->>'artist_mbids' ?
      • 2018-07-27 20832, 2018

      • alastairp
        iliekcomputers: right, I see it's the binary format. I'd cancel it and try with plain format
      • 2018-07-27 20840, 2018

      • alastairp
        just to get an idea about speed
      • 2018-07-27 20843, 2018

      • alastairp
        kartikeyaSh: right
      • 2018-07-27 20850, 2018

      • kartikeyaSh
        yes it is
      • 2018-07-27 20856, 2018

      • iliekcomputers
        alastairp: right sure.
      • 2018-07-27 20802, 2018

      • demonimin has quit
      • 2018-07-27 20807, 2018

      • alastairp
        I know it's possible to create it on strings inside json, not sure about lists of strings though
      • 2018-07-27 20815, 2018

      • alastairp
        can you show me the definition of that function?
      • 2018-07-27 20848, 2018

      • kartikeyaSh
      • 2018-07-27 20800, 2018

      • iliekcomputers
        ok
      • 2018-07-27 20824, 2018

      • kartikeyaSh
        I think postgres interprates the whole list as a string and uses that to create index https://www.irccloud.com/pastebin/vwSq0ord/
      • 2018-07-27 20829, 2018

      • iliekcomputers
        running `pg_dump acousticbrainz -f acousticbrainz-plain-dump-20180727 -U acousticbrainz1 now
      • 2018-07-27 20846, 2018

      • iliekcomputers
        *-U acousticbrainzf
      • 2018-07-27 20849, 2018

      • iliekcomputers
        *-U acousticbrainz
      • 2018-07-27 20837, 2018

      • alastairp
        kartikeyaSh: see some of the jsonb examples here: https://stackoverflow.com/a/18405706
      • 2018-07-27 20849, 2018

      • alastairp
        `SELECT * FROM tracks WHERE artistnames ? 'The Dirty Heads';`
      • 2018-07-27 20807, 2018

      • alastairp
        looks like a parallel for what you're trying to do to check for a uuid in a list
      • 2018-07-27 20849, 2018

      • D4RK-PH0ENiX has quit
      • 2018-07-27 20817, 2018

      • demonimin joined the channel
      • 2018-07-27 20829, 2018

      • kartikeyaSh
        alastairp: actually, I'm always looking for entire list of MBIDs. I don't look at single artist MBID
      • 2018-07-27 20831, 2018

      • kartikeyaSh
        So, my intuition is that we can remove all this array conversion thing and simply store json lists in our tables
      • 2018-07-27 20836, 2018

      • kartikeyaSh
        and use indexes
      • 2018-07-27 20846, 2018

      • kartikeyaSh
        for quick access
      • 2018-07-27 20809, 2018

      • alastairp
        kartikeyaSh: that's only an example
      • 2018-07-27 20836, 2018

      • alastairp
        maybe instead you want ?& (Do all of these array strings exist as top-level keys?) - https://www.postgresql.org/docs/9.5/static/functi…
      • 2018-07-27 20804, 2018

      • alastairp
        it's just that in your original example you only had 1 UUID in the query, so I didn't realise that you wanted to check all of them
      • 2018-07-27 20839, 2018

      • kartikeyaSh
        I'll test it. But here still we have some conversion to an array which will again cause the same problem that we have at present
      • 2018-07-27 20823, 2018

      • Dr-Flay_ joined the channel
      • 2018-07-27 20857, 2018

      • Dr-Flay has quit
      • 2018-07-27 20841, 2018

      • D4RK-PH0ENiX joined the channel
      • 2018-07-27 20813, 2018

      • rsh7 has quit
      • 2018-07-27 20834, 2018

      • rsh7 joined the channel
      • 2018-07-27 20813, 2018

      • reosarevok
        yvanzo, bitmap: did we disable Gravatar yet? Someone is asking about how to get a user image, I suspect for now the best suggestion is "don't, the existing ones are going away soon anyway"? :p
      • 2018-07-27 20821, 2018

      • yvanzo
        reosarevok: not yet
      • 2018-07-27 20833, 2018

      • Darkloke has quit
      • 2018-07-27 20854, 2018

      • Dr-Flay_ has quit
      • 2018-07-27 20804, 2018

      • iliekcomputers
        alastairp: just an update, dump still running
      • 2018-07-27 20817, 2018

      • iliekcomputers
        file is 55GB now and it's dumping highlevel_model
      • 2018-07-27 20823, 2018

      • alastairp
        right
      • 2018-07-27 20826, 2018

      • alastairp
        mm, one sec
      • 2018-07-27 20840, 2018

      • iliekcomputers
        this is the new one I started today
      • 2018-07-27 20850, 2018

      • iliekcomputers
        around 3:30 hrs ago
      • 2018-07-27 20838, 2018

      • alastairp
        I do pg_dump | bzip2, and the files were ~180GB, but I can't remember how long they took to dump
      • 2018-07-27 20844, 2018

      • alastairp
        I'm not sure what the uncompressed size is, either
      • 2018-07-27 20801, 2018

      • alastairp
        keep an eye on the disk space too - only 400gb free
      • 2018-07-27 20807, 2018

      • iliekcomputers
        hmm, i should have done a bzip2 too
      • 2018-07-27 20831, 2018

      • alastairp
        given the time that even this dump has done, I'd be much happier if we can look at doing this partial dump that I was suggesting
      • 2018-07-27 20845, 2018

      • alastairp
        when do you have some time to sit down and come up with a plan/list of commands?
      • 2018-07-27 20800, 2018

      • alastairp
        Monday?
      • 2018-07-27 20830, 2018

      • iliekcomputers
        i'll do it today.
      • 2018-07-27 20855, 2018

      • alastairp
        I'm happy to help, but I don't think I'll have time today
      • 2018-07-27 20855, 2018

      • iliekcomputers
        a bash script, I guess.
      • 2018-07-27 20801, 2018

      • iliekcomputers
        alastairp: ah
      • 2018-07-27 20829, 2018

      • iliekcomputers
        I'll try to come up with something, if I get stuck, we can co-ordinate when you get the time.
      • 2018-07-27 20831, 2018

      • iliekcomputers
        ?
      • 2018-07-27 20859, 2018

      • alastairp
        absolutely
      • 2018-07-27 20828, 2018

      • iliekcomputers
        will you be around tomorrow? I know it's the weekend, so no rush.
      • 2018-07-27 20852, 2018

      • alastairp
        I might be around on-and-off, but I have visitors at home
      • 2018-07-27 20822, 2018

      • UmkaDK has quit
      • 2018-07-27 20846, 2018

      • iliekcomputers
        alastairp: okay, on and off would work for me :P
      • 2018-07-27 20859, 2018

      • UmkaDK joined the channel
      • 2018-07-27 20851, 2018

      • Dr-Flay joined the channel
      • 2018-07-27 20822, 2018

      • Dr-Flay has quit
      • 2018-07-27 20829, 2018

      • iliekcomputers
        dumping lowlevel now
      • 2018-07-27 20810, 2018

      • UmkaDK_ joined the channel
      • 2018-07-27 20855, 2018

      • UmkaDK has quit
      • 2018-07-27 20814, 2018

      • alastairp
        (btw, lowlevel is easy, it's lowlevel_json which is the big one :)
      • 2018-07-27 20829, 2018

      • alastairp
        but seriously, keep an eye on free disk space. I'm not sure you'll fit the whole thing uncompressed
      • 2018-07-27 20827, 2018

      • Dr-Flay joined the channel
      • 2018-07-27 20814, 2018

      • UmkaDK joined the channel
      • 2018-07-27 20810, 2018

      • UmkaDK_ has quit
      • 2018-07-27 20845, 2018

      • UmkaDK has quit
      • 2018-07-27 20824, 2018

      • Dr-Flay has quit
      • 2018-07-27 20859, 2018

      • UmkaDK joined the channel
      • 2018-07-27 20810, 2018

      • iliekcomputers
        alastairp: going to be afk for like two hours, should I just stop the dump? It might explode
      • 2018-07-27 20807, 2018

      • iliekcomputers
        Okay, hopefully it won't fill up 480GB in the next hour. I'm gonna let it run.
      • 2018-07-27 20833, 2018

      • iliekcomputers
        It's running on spike in screen on the param user btw
      • 2018-07-27 20858, 2018

      • Dr-Flay joined the channel
      • 2018-07-27 20801, 2018

      • alastairp
        OK, I'll keep an eye on it for the next hour, it'll probably be OK
      • 2018-07-27 20801, 2018

      • iliekcomputers
        Thanks
      • 2018-07-27 20819, 2018

      • Dr-Flay_ joined the channel
      • 2018-07-27 20853, 2018

      • Dr-Flay has quit
      • 2018-07-27 20801, 2018

      • Dr-Flay_ has quit
      • 2018-07-27 20840, 2018

      • Dr-Flay joined the channel
      • 2018-07-27 20822, 2018

      • Dr-Flay_ joined the channel
      • 2018-07-27 20824, 2018

      • Dr-Flay has quit
      • 2018-07-27 20850, 2018

      • antfoo has quit
      • 2018-07-27 20853, 2018

      • UmkaDK has quit
      • 2018-07-27 20820, 2018

      • UmkaDK joined the channel
      • 2018-07-27 20803, 2018

      • bukwurm joined the channel
      • 2018-07-27 20806, 2018

      • yokel has quit
      • 2018-07-27 20833, 2018

      • yokel joined the channel
      • 2018-07-27 20801, 2018

      • iliekcomputers
        alastairp: still dumping lowlevel_json, upto 259GB now
      • 2018-07-27 20810, 2018

      • antfoo joined the channel
      • 2018-07-27 20807, 2018

      • madmouser1 has quit
      • 2018-07-27 20834, 2018

      • madmouser1 joined the channel
      • 2018-07-27 20806, 2018

      • Lotheric_ is now known as Lotheric
      • 2018-07-27 20823, 2018

      • samj1912
      • 2018-07-27 20835, 2018

      • samj1912
        <100 ms RT even on full load
      • 2018-07-27 20843, 2018

      • zas
        yup :)
      • 2018-07-27 20839, 2018

      • Dr-Flay_ is now known as Dr-Flay
      • 2018-07-27 20843, 2018

      • madmouser1 has quit
      • 2018-07-27 20810, 2018

      • madmouser1 joined the channel
      • 2018-07-27 20858, 2018

      • iliekcomputers
        310 GB
      • 2018-07-27 20818, 2018

      • samj1912
      • 2018-07-27 20835, 2018

      • CatQuest
        hah
      • 2018-07-27 20859, 2018

      • kartikeyaSh
        ruaok: alastairp iliekcomputers artist_credit_clusters created. I created index over function that did the trick
      • 2018-07-27 20811, 2018

      • kartikeyaSh
      • 2018-07-27 20814, 2018

      • CatQuest
        that url(snail? what's it called again) was totally misleading
      • 2018-07-27 20823, 2018

      • samj1912
        slug :P
      • 2018-07-27 20835, 2018

      • CatQuest
        loi
      • 2018-07-27 20848, 2018

      • CatQuest
        slurl
      • 2018-07-27 20802, 2018

      • CatQuest
        but srsly, a slug of "chrome-http-not-secure" would have been much better
      • 2018-07-27 20824, 2018

      • CatQuest
        also ugh all these sites with +14 pt fontsizes
      • 2018-07-27 20822, 2018

      • CatQuest
        hmm
      • 2018-07-27 20824, 2018

      • CatQuest
        "Besides this, with the release of Google Chrome 69 in September this year, the company is also planning to remove the "Secure" label on HTTPS web pages, giving users the idea that the web is a safe place by default. "
      • 2018-07-27 20837, 2018

      • CatQuest
        i'm sorry, my brian is boling. cna someone explain this to me?
      • 2018-07-27 20842, 2018

      • CatQuest
        brain*
      • 2018-07-27 20850, 2018

      • samj1912
        CatQuest: basically a push to make https the norm
      • 2018-07-27 20829, 2018

      • samj1912
        bad rep if your site says not secure on top
      • 2018-07-27 20833, 2018

      • CatQuest
        yea but remove the "Secure" label which was users the idea that the web is a safe place by default
      • 2018-07-27 20849, 2018

      • CatQuest
        wouldn't that be counter productive? idk
      • 2018-07-27 20855, 2018

      • samj1912
        yeah basically a poorly worded article
      • 2018-07-27 20803, 2018

      • CatQuest
        oohhh
      • 2018-07-27 20849, 2018

      • samj1912
      • 2018-07-27 20854, 2018

      • samj1912
      • 2018-07-27 20850, 2018

      • dragonzeron joined the channel
      • 2018-07-27 20849, 2018

      • Lotheric has quit