#metabrainz

      • Gentlecat
        that makes sense to me
      • 2016-03-16 07608, 2016

      • CallerNo6
        cool. I was thinking of doing it to make LordSputnik happy anyway.
      • 2016-03-16 07644, 2016

      • JesseW has quit
      • 2016-03-16 07609, 2016

      • regagain joined the channel
      • 2016-03-16 07616, 2016

      • armalcolite has quit
      • 2016-03-16 07618, 2016

      • CallerNo6
        okey dokey, first version of cleaned up gsoc page @ https://wiki.musicbrainz.org/Development/Summer_o…
      • 2016-03-16 07600, 2016

      • CallerNo6
        (sub-pages are a mishmash of redirects, transclusion and copy/paste, will try to standardize them later)
      • 2016-03-16 07614, 2016

      • Gentlecat
        looks good!
      • 2016-03-16 07603, 2016

      • outsidecontext has quit
      • 2016-03-16 07621, 2016

      • outsidecontext joined the channel
      • 2016-03-16 07643, 2016

      • manu-chroma has quit
      • 2016-03-16 07647, 2016

      • typhoe has quit
      • 2016-03-16 07645, 2016

      • typhoe joined the channel
      • 2016-03-16 07604, 2016

      • armalcolite joined the channel
      • 2016-03-16 07619, 2016

      • armalcolite
        any suggestions on where to begin again? currently just aiming for improving the proposal...
      • 2016-03-16 07604, 2016

      • armalcolite
        alastairp: should i just keep the project scope to api, as u suggested on draft?
      • 2016-03-16 07630, 2016

      • UmkaDK has quit
      • 2016-03-16 07643, 2016

      • reosarevok
        Sigh. Having to tell people "sorry but you're SOOL" sucks
      • 2016-03-16 07642, 2016

      • Gentlecat
        bitmap: did you mean "release_group_primary_type" instead of "release_group_type" in http://tickets.musicbrainz.org/browse/MBS-8838
      • 2016-03-16 07613, 2016

      • UmkaDK joined the channel
      • 2016-03-16 07626, 2016

      • ruaok
        reosarevok: I wasn't suggesting to shortcut the normal style process.
      • 2016-03-16 07634, 2016

      • ruaok
        that is why I said "start the process". :p
      • 2016-03-16 07637, 2016

      • reosarevok
        I know, but just making it clear for everyone :)
      • 2016-03-16 07643, 2016

      • ruaok
        k
      • 2016-03-16 07623, 2016

      • Gentlecat
        armalcolite: did you come up with a schedule?
      • 2016-03-16 07643, 2016

      • Gentlecat
        only you can decide what you want to do
      • 2016-03-16 07608, 2016

      • armalcolite
        yeah. will post it by tomorrow.
      • 2016-03-16 07618, 2016

      • armalcolite
        doing some edits.
      • 2016-03-16 07620, 2016

      • Mineo joined the channel
      • 2016-03-16 07644, 2016

      • diana_olhovyk joined the channel
      • 2016-03-16 07649, 2016

      • alastairp
        armalcolite: I don't think that's exactly what I suggested
      • 2016-03-16 07606, 2016

      • alastairp
        I think for the API, the scope should just be the submission part
      • 2016-03-16 07629, 2016

      • armalcolite
        i get it now.
      • 2016-03-16 07634, 2016

      • alastairp
        you also wanted a part for the charts too?
      • 2016-03-16 07642, 2016

      • armalcolite
        yeah.
      • 2016-03-16 07649, 2016

      • alastairp
        send your submission with both parts
      • 2016-03-16 07606, 2016

      • alastairp
        we can refine the submission over the next weeks
      • 2016-03-16 07623, 2016

      • armalcolite
        i am not sure how much charts will take, so i am currently planning to add it to optional.
      • 2016-03-16 07634, 2016

      • alastairp
        yes, I don't know how much they will take either
      • 2016-03-16 07651, 2016

      • alastairp
        I'm thinking about it too. I'll discuss with ruaok in person later this week
      • 2016-03-16 07614, 2016

      • armalcolite
        sure. i will post a timeline of the same probably by tomorrow.
      • 2016-03-16 07630, 2016

      • armalcolite
        and an updated draft. :)
      • 2016-03-16 07632, 2016

      • ruaok
        alastairp: the more I think about it, the more I like the idea of using PG for LB for the short-term.
      • 2016-03-16 07655, 2016

      • ruaok
        that makes it easier for anyone to participate in the short term. we learn what usage patterns we have.
      • 2016-03-16 07613, 2016

      • ruaok
        if we need to go back, we can go back with a clear set of ideas of what is needed.
      • 2016-03-16 07609, 2016

      • alastairp
        hmm. interesting
      • 2016-03-16 07609, 2016

      • alastairp
        OK
      • 2016-03-16 07630, 2016

      • alastairp
        let's talk about it then
      • 2016-03-16 07653, 2016

      • alastairp
        my guess is that we'd switch to it, and then never move off it because it's "easy and everyone knows how to use it"
      • 2016-03-16 07602, 2016

      • alastairp
        and then run into scalability problems :)
      • 2016-03-16 07621, 2016

      • Gentlecat
        why do we need to switch? because we can't do things that we need? or because we don't know how?
      • 2016-03-16 07630, 2016

      • alastairp
        a bit of both
      • 2016-03-16 07633, 2016

      • ruaok
        Gentlecat: #2
      • 2016-03-16 07656, 2016

      • ruaok
        cassandra has so many "this is ok, unless you push too hard over then. then you're fucked"
      • 2016-03-16 07601, 2016

      • ruaok
        we understand PG.
      • 2016-03-16 07619, 2016

      • alastairp
        #1 e.g. we can't do track/artist stats because cassandra is keyed on user
      • 2016-03-16 07630, 2016
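A toy Python sketch of why a store keyed on user makes per-track or per-artist stats awkward: reading one user's listens touches a single partition, while aggregating by artist has to read every partition. This does not use Cassandra or its driver. It only models partition-keyed access, and the names (`listens_by_user`, `add_listen`, `user_history`, `artist_counts`) are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy model: listens grouped into partitions keyed by user,
# the way a table with user as its partition key groups rows.
listens_by_user = defaultdict(list)


def add_listen(user, artist, track):
    listens_by_user[user].append((artist, track))


def user_history(user):
    # Cheap: reads exactly one partition (.get avoids creating an empty one).
    return listens_by_user.get(user, [])


def artist_counts():
    # Expensive: no partition is keyed on artist, so every partition
    # has to be read to build the aggregate.
    counts = Counter()
    partitions_read = 0
    for listens in listens_by_user.values():
        partitions_read += 1
        counts.update(artist for artist, _ in listens)
    return counts, partitions_read


add_listen("alice", "Artist A", "track 1")
add_listen("alice", "Artist B", "track 2")
add_listen("bob", "Artist A", "track 3")

history = user_history("alice")             # one partition
counts, partitions_read = artist_counts()   # all partitions
```

With millions of users the second pattern means a full scan, which is the limitation alastairp describes.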

      • ruaok
        alastairp: good fear to have, but we just need to reinforce an MO of keeping an impending move in mind.
      • 2016-03-16 07657, 2016

      • ruaok
        oh, of options, not questions, yes #1.
      • 2016-03-16 07628, 2016

      • ruaok
        I personally feel that cassandra is a big (at least mental) bottleneck.
      • 2016-03-16 07639, 2016

      • ruaok
        I don't know how to do this. so I don't do it.
      • 2016-03-16 07646, 2016

      • ruaok
        this, that, that over there.
      • 2016-03-16 07659, 2016

      • ruaok
        all comes with, sit down and spend a few days learning cassandra
      • 2016-03-16 07605, 2016

      • ruaok
        s/with/to
      • 2016-03-16 07618, 2016

      • Gentlecat
        but there's probably a reason it was chosen, no?
      • 2016-03-16 07620, 2016

      • ruaok leaves the keyboard and solders shit.
      • 2016-03-16 07633, 2016

      • ruaok
        oh, wait alastairp: hit me with the link to the book stand again.
      • 2016-03-16 07638, 2016

      • ruaok
        the printer is idle.
      • 2016-03-16 07621, 2016

      • bitmap
        Gentlecat: whoops, yes
      • 2016-03-16 07607, 2016

      • ruaok
        wow. 4 hours.
      • 2016-03-16 07616, 2016

      • ruaok
        ok, I'll wait until bedtime for that one
      • 2016-03-16 07636, 2016

      • kepstin is kind of sad that the example ids in http://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_music_service_c.html aren't real mbids ;)
      • 2016-03-16 07642, 2016

      • diana_olhovyk has quit
      • 2016-03-16 07606, 2016

      • alastairp
        ruaok: yes, that’s it
      • 2016-03-16 07600, 2016

      • justharshal joined the channel
      • 2016-03-16 07627, 2016

      • ruaok
        perfect. I crammed 6 hours' worth of stuff onto the platform.
      • 2016-03-16 07632, 2016

      • ruaok
        will hit print before bed.
      • 2016-03-16 07659, 2016

      • alastairp
        cool
      • 2016-03-16 07611, 2016

      • alastairp
        it’ll do more than 1 at a time if it fits?
      • 2016-03-16 07623, 2016

      • ruaok
        yeah.
      • 2016-03-16 07646, 2016

      • ruaok
        the print head for the next piece needs to be clear of it, but it uses a painter's algorithm to figure it out.
      • 2016-03-16 07603, 2016

      • Jormangeud has quit
      • 2016-03-16 07648, 2016

      • Jormangeud joined the channel
      • 2016-03-16 07640, 2016

      • alastairp
        hmm
      • 2016-03-16 07601, 2016

      • Jormangeud has quit
      • 2016-03-16 07656, 2016

      • Jormangeud joined the channel
      • 2016-03-16 07633, 2016

      • alastairp
        joining onto our highlevel model table is slooow
      • 2016-03-16 07652, 2016

      • alastairp
        100% cpu :(
      • 2016-03-16 07615, 2016

      • kanha has quit
      • 2016-03-16 07655, 2016

      • kanha joined the channel
      • 2016-03-16 07601, 2016

      • CallerNo6
        legoktm, when I edit an interwiki entry, does some cache have to be refreshed or something?
      • 2016-03-16 07606, 2016

      • CallerNo6 keeps making dumb typos
      • 2016-03-16 07649, 2016

      • legoktm
        was it already used in a page?
      • 2016-03-16 07659, 2016

      • CallerNo6
        yeah
      • 2016-03-16 07608, 2016

      • legoktm
        you'll have to make a null edit (open edit window and hit save without changing anything)
      • 2016-03-16 07625, 2016

      • CallerNo6
        ah. cool, thanks!
      • 2016-03-16 07630, 2016

      • legoktm
        if it's multiple pages I can clear the cache server-side
      • 2016-03-16 07628, 2016

      • CallerNo6
        Should just be the one. I'll try to bug you as little as possible :-)
      • 2016-03-16 07649, 2016

      • alastairp
        so, postgres guys
      • 2016-03-16 07607, 2016

      • alastairp
        joining onto a 50m row table. good or bad idea?
      • 2016-03-16 07613, 2016

      • kepstin
        depends on how your indexes are set up, mostly, i'd think...
      • 2016-03-16 07624, 2016

      • alastairp
        yeah
      • 2016-03-16 07644, 2016

      • alastairp
        we have a main table, which has a pk and mbid (3m rows)
      • 2016-03-16 07624, 2016

      • kepstin
        should probably look at the query plan that was generated, see if there's anything obviously odd in it.
      • 2016-03-16 07625, 2016

      • alastairp
        and a data table which has 42m rows, 14 for each in the main table, with a fk, id 1-14, and jsonb data
      • 2016-03-16 07647, 2016

      • alastairp
        I’m wanting to get a csv dump of all the data. dumping the data table with a where, and referencing into the jsonb is as quick as I expected
      • 2016-03-16 07607, 2016

      • kepstin
        what's the query you're running for that?
      • 2016-03-16 07608, 2016

      • alastairp
        about 1000 rows per second
      • 2016-03-16 07627, 2016

      • alastairp
        select hl.mbid, hlm.data->>'value' as genre from highlevel_model hlm join highlevel hl on hlm.highlevel=hl.id where model=3
      • 2016-03-16 07655, 2016
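A scaled-down, runnable sketch of the two tables alastairp describes (a main table with a pk and mbid, and a data table with 14 rows per item, an fk, a model id 1-14 and JSON data) plus the join query, using Python's built-in sqlite3 so it runs without a PostgreSQL server. Assumptions: the column types, the placeholder mbid strings, and storing `data` as JSON text (it is jsonb in the real schema) are illustrative only. The JSON field is extracted in Python rather than with `->>` to avoid depending on SQLite's JSON support, and an `ORDER BY` is added only to make the output deterministic. SQLite's planner is not PostgreSQL's, so this shows the data shape and the query, not the slowdown.

```python
import json
import sqlite3

N_ITEMS = 3    # the real main table has ~3m rows
N_MODELS = 14  # 14 data rows per item, model ids 1-14

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE highlevel (
        id   INTEGER PRIMARY KEY,
        mbid TEXT NOT NULL
    );
    CREATE TABLE highlevel_model (
        id        INTEGER PRIMARY KEY,
        highlevel INTEGER NOT NULL REFERENCES highlevel (id),
        model     INTEGER NOT NULL,
        data      TEXT NOT NULL  -- JSON text standing in for jsonb
    );
""")

for item in range(1, N_ITEMS + 1):
    # Placeholder identifiers, not real MBIDs.
    conn.execute("INSERT INTO highlevel (id, mbid) VALUES (?, ?)",
                 (item, f"mbid-{item}"))
    for model in range(1, N_MODELS + 1):
        conn.execute(
            "INSERT INTO highlevel_model (highlevel, model, data) VALUES (?, ?, ?)",
            (item, model, json.dumps({"value": f"value-{item}-{model}"})),
        )

# Same shape as the query in the log: one output row per highlevel item.
rows = conn.execute("""
    SELECT hl.mbid, hlm.data
    FROM highlevel_model AS hlm
    JOIN highlevel AS hl ON hlm.highlevel = hl.id
    WHERE hlm.model = 3
    ORDER BY hl.id
""").fetchall()
dump = [(mbid, json.loads(data)["value"]) for mbid, data in rows]
print(dump)
```

The 3 x 14 = 42 data rows mirror the 3m x 14 = 42m ratio in the log.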

      • alastairp
        as soon as I put the join in, it takes about 30 seconds per 100 rows
      • 2016-03-16 07631, 2016
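Back-of-the-envelope arithmetic for what those two rates mean for a full dump, assuming (which the log doesn't establish) that they hold across the whole ~3m output rows, one per item in highlevel: about 50 minutes without the join versus about 250 hours (roughly 10 days) with it, a ~300x slowdown.

```python
ROWS = 3_000_000           # output rows: one per item in highlevel (~3m)

rate_without_join = 1000   # rows per second, as reported without the join
rate_with_join = 100 / 30  # 100 rows per ~30 seconds, as reported with it

minutes_without_join = ROWS / rate_without_join / 60
hours_with_join = ROWS / rate_with_join / 3600
slowdown = rate_without_join / rate_with_join

print(f"without join: ~{minutes_without_join:.0f} minutes")
print(f"with join:    ~{hours_with_join:.0f} hours (~{hours_with_join / 24:.1f} days)")
print(f"slowdown:     ~{slowdown:.0f}x")
```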

      • kepstin
        hmm. you're extracting this information for every entry, rather than only a subset of them?
      • 2016-03-16 07601, 2016

      • alastairp
        hlm.model=3
      • 2016-03-16 07622, 2016

      • alastairp
        it’ll be 1 row for every item in highlevel, but only 1/14 for highlevel_model
      • 2016-03-16 07639, 2016

      • alastairp
        oh, I reversed the from/join in this one to see what the effect would be
      • 2016-03-16 07658, 2016

      • kepstin
        right, so the sequential scan over highlevel is expected, there's no more efficient way to get all the rows in a table, and the problem is in the join itself.
      • 2016-03-16 07634, 2016

      • alastairp
        http://explain.depesz.com/s/jgKe here’s a plan with analyze, with a limit of 100000
      • 2016-03-16 07642, 2016

      • kepstin
        highlevel_ndx_highlevel_model is an index on the fk into the highlevel table and the model #, i assume?
      • 2016-03-16 07602, 2016

      • alastairp
        "model_ndx_highlevel_model" btree (model)
      • 2016-03-16 07605, 2016

      • alastairp
        just on the model field
      • 2016-03-16 07622, 2016

      • alastairp
        highlevel_model.model
      • 2016-03-16 07636, 2016

      • kepstin
        try adding an index on highlevel_model (highlevel, model) perhaps?
      • 2016-03-16 07616, 2016
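A sketch of adding the suggested (highlevel, model) index and comparing query plans before and after. SQLite via Python's sqlite3 stands in for PostgreSQL here; on the real database the equivalent check is `EXPLAIN ANALYZE` on the actual query. The new index name is hypothetical, and the two single-column indexes are assumed from the names in the log (the field_ndx_tablename pattern). The plans SQLite prints are not predictions of what PostgreSQL will choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE highlevel (id INTEGER PRIMARY KEY, mbid TEXT NOT NULL);
    CREATE TABLE highlevel_model (
        id        INTEGER PRIMARY KEY,
        highlevel INTEGER NOT NULL REFERENCES highlevel (id),
        model     INTEGER NOT NULL,
        data      TEXT NOT NULL
    );
    -- assumed existing single-column indexes, named as in the log
    CREATE INDEX highlevel_ndx_highlevel_model ON highlevel_model (highlevel);
    CREATE INDEX model_ndx_highlevel_model ON highlevel_model (model);
""")

QUERY = """
    SELECT hl.mbid, hlm.data
    FROM highlevel_model AS hlm
    JOIN highlevel AS hl ON hlm.highlevel = hl.id
    WHERE hlm.model = 3
"""


def plan(sql):
    # EXPLAIN QUERY PLAN returns 4-column rows; the last one is the text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(QUERY)

# The suggested composite index (hypothetical name).
conn.execute("CREATE INDEX highlevel_model_hl_model_idx "
             "ON highlevel_model (highlevel, model)")
after = plan(QUERY)

print("before:", before)
print("after: ", after)
```

On a table the size of highlevel_model, PostgreSQL's `CREATE INDEX CONCURRENTLY` builds the index without blocking writes, at the cost of a slower build.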

      • alastairp
        hmm, right
      • 2016-03-16 07643, 2016

      • kepstin
        right now it's filtering by model then sorting both tables on highlevel id to do the merge. hmm.
      • 2016-03-16 07656, 2016

      • kepstin
        hmm.
      • 2016-03-16 07651, 2016

      • kepstin
        having the index on highlevel,model might result in the results from the model filter being returned in presorted order, which could speed up the join?
      • 2016-03-16 07653, 2016

      • kepstin
        not sure :)
      • 2016-03-16 07616, 2016

      • kepstin isn't a postgres expert, by far - he's trying to do his best rubber duck impression atm.
      • 2016-03-16 07623, 2016

      • alastairp
        building it now, let’s see
      • 2016-03-16 07650, 2016

      • alastairp
        (this is a big table, it might take some time :)
      • 2016-03-16 07613, 2016

      • alastairp
        settings have been tweaked, we have more shared buffers, etc
      • 2016-03-16 07648, 2016

      • kepstin
        hmm. adding that index might turn it into a nested loop instead of a merge join - scan the highlevel table then lookup each row in highlevel_model by (highlevel, model)
      • 2016-03-16 07656, 2016

      • kepstin
        no idea if the query planner will actually pick that
      • 2016-03-16 07603, 2016

      • kepstin
        or if it would be faster :)
      • 2016-03-16 07615, 2016
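The two join strategies kepstin contrasts can be sketched in plain Python on toy data shaped like the log's tables. A merge join needs both inputs sorted on the join key and walks them in step; a nested loop scans the outer rows and does one keyed lookup per row. Here a dict stands in for a btree on (highlevel, model), which assumes each (highlevel, model) pair is unique, as the "14 per item, ids 1-14" description suggests. This illustrates the algorithms only, not how PostgreSQL implements or costs them.

```python
# Toy data: 5 items x 14 models. highlevel is already sorted by id.
MODEL = 3
highlevel = [(i, f"mbid-{i}") for i in range(1, 6)]      # (id, mbid)
highlevel_model = [(i, m, f"value-{i}-{m}")             # (highlevel, model, value)
                   for i in range(1, 6) for m in range(1, 15)]


def merge_join(left, right):
    """Merge join on the highlevel id.

    Both inputs must be sorted on the join key. left is (id, mbid) with
    unique ids; right is (highlevel, value), pre-filtered to one model.
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            out.append((left[i][1], right[j][1]))
            j += 1  # left ids are unique, so only the right side advances
    return out


def nested_loop_join(left, index, model):
    """Nested loop join: scan outer rows, probe an index for each one."""
    out = []
    for hl_id, mbid in left:
        value = index.get((hl_id, model))  # one keyed lookup per outer row
        if value is not None:
            out.append((mbid, value))
    return out


# Merge join: filter by model, sort on the key, then walk both sides.
right_sorted = sorted((h, v) for h, m, v in highlevel_model if m == MODEL)
merged = merge_join(highlevel, right_sorted)

# Nested loop: build the (highlevel, model) "index" once, then probe it.
index = {(h, m): v for h, m, v in highlevel_model}
looped = nested_loop_join(highlevel, index, MODEL)
```

Both return the same rows; which one is faster depends on how expensive producing sorted input is compared with one lookup per outer row, which is the trade-off the planner weighs.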

      • kepstin
        hmm. that merge join doesn't actually have any explicit sorts being added, so it looks like it should be optimal or close to it.
      • 2016-03-16 07617, 2016

      • alastairp
        right, this is why I was a little confused
      • 2016-03-16 07643, 2016

      • alastairp
        because the select without the join is really fast
      • 2016-03-16 07655, 2016

      • alastairp
        and the join didn’t seem to add anything strange in the query planner
      • 2016-03-16 07659, 2016

      • kepstin
        what's the speed of a simple select hlm.data->>'value' as genre from highlevel_model as hlm where model=3 order by highlevel asc; look like?
      • 2016-03-16 07644, 2016

      • alastairp
        ah, with the order
      • 2016-03-16 07609, 2016

      • kepstin
        oh, i misunderstood your index names
      • 2016-03-16 07618, 2016

      • kepstin
        it's using "highlevel_ndx_highlevel_model" then filtering by model
      • 2016-03-16 07630, 2016

      • alastairp
        yeah, I’m not sure who came up with this name
      • 2016-03-16 07630, 2016

      • kepstin
        so it's actually reading *every row* to check the model
      • 2016-03-16 07638, 2016

      • alastairp
        but it’s field_ndx_tablename
      • 2016-03-16 07652, 2016

      • alastairp
        which seems odd to me
      • 2016-03-16 07605, 2016

      • kepstin
        so yeah, adding an index on (model, highlevel) or (highlevel, model) might help. Not sure which would help more.
      • 2016-03-16 07639, 2016
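Leading the index with `model` lets one index serve both the filter and the ordering: with an index on (model, highlevel), `WHERE model = 3 ORDER BY highlevel` becomes a single index range scan with no separate sort step, instead of reading every row through the highlevel index to check the model. The sketch checks this in SQLite via Python's sqlite3 as a stand-in for PostgreSQL. The index name is hypothetical, and PostgreSQL's planner may still choose differently on the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE highlevel_model (
        id        INTEGER PRIMARY KEY,
        highlevel INTEGER NOT NULL,
        model     INTEGER NOT NULL,
        data      TEXT NOT NULL
    );
    -- the existing index on highlevel alone, as described in the log
    CREATE INDEX highlevel_ndx_highlevel_model ON highlevel_model (highlevel);
""")

# kepstin's single-table query, without the JSON extraction.
SQL = "SELECT data FROM highlevel_model WHERE model = 3 ORDER BY highlevel"


def plan():
    # Join the plan steps' text into one string for easy inspection.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + SQL))


before = plan()

# Equality on the leading column, ordering on the second (hypothetical name).
conn.execute("CREATE INDEX model_highlevel_idx ON highlevel_model (model, highlevel)")
after = plan()

print("before:", before)
print("after: ", after)
```

The other order, (highlevel, model), is mainly useful for per-item lookups such as the nested-loop probe, because its entries for model 3 are spread across the whole index.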

      • alastairp
        the index on highlevel,model just finished
      • 2016-03-16 07643, 2016

      • alastairp
        it’s not using it