#metabrainz

2016-03-16

16:14 PM
Gentlecat

that makes sense to me

16:15 PM
CallerNo6

cool. I was thinking of doing it to make LordSputnik happy anyway.

16:17 PM
JesseW has quit

16:23 PM
regagain joined the channel

16:37 PM
armalcolite has quit

16:46 PM
CallerNo6

okey dokey, first version of cleaned up gsoc page @ https://wiki.musicbrainz.org/Development/Summer_o…

16:47 PM
CallerNo6

(sub-pages are a mishmash of redirects, transclusion and copy/paste, will try to standardize them later)

16:47 PM
Gentlecat

looks good!

17:09 PM
outsidecontext has quit

17:09 PM
outsidecontext joined the channel

17:17 PM
manu-chroma has quit

17:17 PM
typhoe has quit

17:18 PM
typhoe joined the channel

17:20 PM
armalcolite joined the channel

17:21 PM
armalcolite

any suggestions on where to begin again? currently just aiming for improving the proposal...

17:22 PM
armalcolite

alastairp: should i just keep the project scope to api, as u suggested on draft?

17:22 PM
UmkaDK has quit

17:22 PM
reosarevok

Sigh. Having to tell people "sorry but you're SOOL" sucks

17:23 PM
Gentlecat

bitmap: did you mean "release_group_primary_type" instead of "release_group_type" in http://tickets.musicbrainz.org/browse/MBS-8838

17:25 PM
UmkaDK joined the channel

17:25 PM
ruaok

reosarevok: I wasn't suggesting to shortcut the normal style process.

17:25 PM
ruaok

that is why I said "start the process". :p

17:25 PM
reosarevok

I know, but just making it clear for everyone :)

17:25 PM
ruaok

k

17:27 PM
Gentlecat

armalcolite: did you come up with a schedule?

17:27 PM
Gentlecat

only you can decide what you want to do

17:28 PM
armalcolite

yeah. will post it by tomorrow.

17:28 PM
armalcolite

doing some edits.

17:30 PM
Mineo joined the channel

17:30 PM
diana_olhovyk joined the channel

17:33 PM
alastairp

armalcolite: I don't think that's exactly what I suggested

17:34 PM
alastairp

I think for the API, the scope should just be the submission part

17:34 PM
armalcolite

i get it now.

17:34 PM
alastairp

you also wanted a part for the charts too?

17:34 PM
armalcolite

yeah.

17:34 PM
alastairp

send your submission with both parts

17:35 PM
alastairp

we can refine the submission over the next weeks

17:35 PM
armalcolite

i am not sure how much charts will take, so i am currently planning to add it to optional.

17:35 PM
alastairp

yes, I don't know how much they will take either

17:35 PM
alastairp

I'm thinking about it too. I'll discuss with ruaok in person later this week

17:36 PM
armalcolite

sure. i will post a timeline of the same probably by tomorrow.

17:36 PM
armalcolite

and an updated draft. :)

17:38 PM
ruaok

alastairp: the more I think about it, the more I like the idea of using PG for LB for the short-term.

17:38 PM
ruaok

that makes it easier for anyone to participate in the short term. we learn what usage patterns we have.

17:39 PM
ruaok

if we need to go back, we can go back with a clear set of ideas of what is needed.

17:40 PM
alastairp

hmm. interesting

17:40 PM
alastairp

OK

17:40 PM
alastairp

let's talk about it then

17:40 PM
alastairp

my guess is that we'd switch to it, and then never move off it because it's "easy and everyone knows how to use it"

17:41 PM
alastairp

and then run into scalability problems :)

17:42 PM
Gentlecat

why do we need to switch? because we can't do things that we need? or because we don't know how?

17:42 PM
alastairp

a bit of both

17:42 PM
ruaok

Gentlecat: #2

17:42 PM
ruaok

cassandra has so many "this is ok, unless you push too hard over then. then you're fucked"

17:43 PM
ruaok

we understand PG.

17:43 PM
alastairp

#1 e.g. we can't do track/artist stats because cassandra is keyed on user
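A toy illustration of the point about Cassandra being keyed on user, assuming listens live in one partition per user (the real ListenBrainz Cassandra schema is not shown in this log, and every name below is hypothetical). Fetching one user's listens is a single partition lookup, while a per-artist count has to visit every partition:

```python
from collections import defaultdict

# Hypothetical model: listens grouped into partitions keyed by user name,
# loosely mimicking a table whose partition key is the user.
listens_by_user: dict[str, list[tuple[str, str]]] = defaultdict(list)


def add_listen(user: str, artist: str, track: str) -> None:
    listens_by_user[user].append((artist, track))


def listens_for_user(user: str) -> list[tuple[str, str]]:
    # Cheap: one partition lookup. .get() does not create empty partitions.
    return listens_by_user.get(user, [])


def listen_count_for_artist(artist: str) -> int:
    # Expensive: no partition is keyed by artist, so every user's
    # partition has to be read (effectively a full scan).
    return sum(
        1
        for rows in listens_by_user.values()
        for listened_artist, _track in rows
        if listened_artist == artist
    )
```

Answering artist or track questions efficiently in such a layout would need a second table keyed by artist, which is one reason a relational store with secondary indexes looks attractive here.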

17:43 PM
ruaok

alastairp: good fear to have, but we just need to reinforce a MO of keeping an impending move in mind.

17:43 PM
ruaok

oh, of options, not questions, yes #1.

17:44 PM
ruaok

I personally feel that cassandra is a big (at least mental) bottleneck.

17:44 PM
ruaok

I don't know how to do this, so I don't do it.

17:44 PM
ruaok

this, that, that over there.

17:44 PM
ruaok

all comes with, sit down and spend a few days learning cassandra

17:45 PM
ruaok

s/with/to

17:45 PM
Gentlecat

but there's probably a reason it was chosen, no?

17:45 PM
ruaok leaves the keyboard and solders shit.

17:45 PM
ruaok

oh, wait alastairp: hit me with the link to the book stand again.

17:45 PM
ruaok

the printer is idle.

17:47 PM
ruaok

http://www.thingiverse.com/thing:1021025 ?

17:48 PM
bitmap

Gentlecat: whoops, yes

17:49 PM
ruaok

wow. 4 hours.

17:49 PM
ruaok

ok, I'll wait until bedtime for that one

17:50 PM
kepstin is kind of sad that the example ids in http://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_music_service_c.html aren't real mbids ;)

18:02 PM
diana_olhovyk has quit

18:06 PM
alastairp

ruaok: yes, that’s it

18:07 PM
justharshal joined the channel

18:09 PM
ruaok

perfect. I crammed 6 hours worth of stuff onto the platform.

18:09 PM
ruaok

will hit print before bed.

18:09 PM
alastairp

cool

18:10 PM
alastairp

it’ll do more than 1 at a time if it fits?

18:12 PM
ruaok

yeah.

18:12 PM
ruaok

the print head for the next piece needs to be clear of it, but it uses a painter's algorithm to figure it out.

18:28 PM
Jormangeud has quit

18:28 PM
Jormangeud joined the channel

18:33 PM
alastairp

hmm

18:34 PM
Jormangeud has quit

18:34 PM
Jormangeud joined the channel

18:35 PM
alastairp

joining onto our highlevel model table is slooow

18:37 PM
alastairp

100% cpu :(

18:41 PM
kanha has quit

18:43 PM
kanha joined the channel

19:11 PM
CallerNo6

legoktm, when I edit an interwiki entry, does some cache have to be refreshed or something?

19:11 PM
CallerNo6 keeps making dumb typos

19:11 PM
legoktm

was it already used in a page?

19:11 PM
CallerNo6

yeah

19:12 PM
legoktm

you'll have to make a null edit (open edit window and hit save without changing anything)

19:12 PM
CallerNo6

ah. cool, thanks!

19:12 PM
legoktm

if it's multiple pages I can clear the cache server-side

19:13 PM
CallerNo6

Should just be the one. I'll try to bug you as little as possible :-)

19:13 PM
alastairp

so, postgres guys

19:14 PM
alastairp

joining onto a 50m row table. good or bad idea?

19:17 PM
kepstin

depends on how your indexes are set up, mostly, i'd think...

19:17 PM
alastairp

yeah

19:17 PM
alastairp

we have a main table, which has a pk and mbid (3m rows)

19:18 PM
kepstin

should probably look at the query plan that was generated, see if there's anything obviously odd in it.

19:18 PM
alastairp

and a data table which has 42m rows, 14 for each in the main table, with a fk, id 1-14, and jsonb data

19:18 PM
alastairp

http://explain.depesz.com/s/dTN

19:19 PM
alastairp

I’m wanting to get a csv dump of all the data. dumping the data table with a where, and referencing into the jsonb is as quick as I expected

19:20 PM
kepstin

what's the query you're running for that?

19:20 PM
alastairp

about 1000 rows per second

19:20 PM
alastairp

select hl.mbid, hlm.data->>'value' as genre from highlevel_model hlm join highlevel hl on hlm.highlevel=hl.id where model=3
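A minimal sketch of the schema alastairp describes (a `highlevel` table with a pk and mbid, and a `highlevel_model` table with 14 rows per item: an fk, a model id 1-14, and JSON data) plus the `model=3` CSV export, using SQLite from the Python standard library as a stand-in for Postgres. Table and column names follow the chat; the column types, the placeholder MBIDs, and the `build_db`/`dump_model_csv` helpers are assumptions for illustration. On the real Postgres server the usual route for a CSV dump is `COPY (SELECT ...) TO STDOUT WITH CSV HEADER` (or psql's `\copy`), which avoids a client-side loop entirely.

```python
import csv
import io
import json
import sqlite3

# SQLite stand-in for the Postgres tables; SQLite has no jsonb type,
# so the per-model data is stored as JSON text.
SCHEMA = """
CREATE TABLE highlevel (
    id   INTEGER PRIMARY KEY,
    mbid TEXT NOT NULL
);
CREATE TABLE highlevel_model (
    highlevel INTEGER NOT NULL REFERENCES highlevel(id),
    model     INTEGER NOT NULL,  -- 1..14 in the real table
    data      TEXT NOT NULL      -- jsonb in Postgres
);
"""


def build_db(n_items: int, n_models: int = 14) -> sqlite3.Connection:
    """Create an in-memory database with n_models data rows per item."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for i in range(1, n_items + 1):
        # Placeholder identifiers, not real MBIDs.
        conn.execute("INSERT INTO highlevel (id, mbid) VALUES (?, ?)", (i, f"mbid-{i}"))
        for m in range(1, n_models + 1):
            conn.execute(
                "INSERT INTO highlevel_model (highlevel, model, data) VALUES (?, ?, ?)",
                (i, m, json.dumps({"value": f"genre-{i}-{m}"})),
            )
    return conn


def dump_model_csv(conn: sqlite3.Connection, model: int) -> str:
    """Same shape as the query above; the ->>'value' extraction is done in
    Python because SQLite's JSON operators depend on the build."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["mbid", "genre"])
    cur = conn.execute(
        "SELECT hl.mbid, hlm.data FROM highlevel_model hlm "
        "JOIN highlevel hl ON hlm.highlevel = hl.id "
        "WHERE hlm.model = ? ORDER BY hl.id",
        (model,),
    )
    for mbid, data in cur:
        writer.writerow([mbid, json.loads(data)["value"]])
    return out.getvalue()
```

For example, `dump_model_csv(build_db(3), 3)` yields a header row plus one row per `highlevel` item, matching alastairp's expectation of one output row per item in `highlevel` but only 1/14 of `highlevel_model`.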

19:20 PM
alastairp

as soon as I put the join in, it takes about 30 seconds per 100 rows
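A back-of-the-envelope projection using only figures from the chat (3m rows in `highlevel`, so about 3m `model=3` rows in `highlevel_model`; roughly 1000 rows/s without the join and roughly 100 rows per 30 s with it), assuming those rates hold for the whole dump:

```python
# Figures taken from the chat; constant throughput is an assumption.
ROWS = 3_000_000            # one model=3 row per highlevel item

fast_rate = 1000            # rows/second without the join
slow_rate = 100 / 30        # rows/second with the join (100 rows per ~30 s)

fast_seconds = ROWS / fast_rate
slow_seconds = ROWS / slow_rate

print(f"without join: {fast_seconds / 60:.0f} minutes")
print(f"with join:    {slow_seconds / 86400:.1f} days")
print(f"slowdown:     {fast_rate / slow_rate:.0f}x")
```

This prints about 50 minutes versus about 10.4 days, a roughly 300x gap, so at the observed rate the joined export would not be practical.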

19:21 PM
kepstin

hmm. you're extracting this information for every entry, rather than only a subset of them?

19:22 PM
alastairp

hlm.model=3

19:22 PM
alastairp

it’ll be 1 row for every item in highlevel, but only 1/14 for highlevel_model

19:22 PM
alastairp

oh, I reversed the from/join in this one to see what the effect would be

19:23 PM
kepstin

right, so the sequential scan over highlevel is expected, there's no more efficient way to get all the rows in a table, and the problem is in the join itself.

19:24 PM
alastairp

http://explain.depesz.com/s/jgKe here’s a plan with analyze, with a limit of 100000

19:25 PM
kepstin

highlevel_ndx_highlevel_model is an index on the fk into the highlevel table and the model #, i assume?

19:26 PM
alastairp

"model_ndx_highlevel_model" btree (model)

19:26 PM
alastairp

just on the model field

19:26 PM
alastairp

highlevel_model.model

19:26 PM
alastairp

https://github.com/metabrainz/acousticbrainz-serv…

19:27 PM
kepstin

try adding an index on highlevel_model (highlevel, model) perhaps?

19:28 PM
alastairp

hmm, right

19:29 PM
kepstin

right now it's filtering by model then sorting both tables on highlevel id to do the merge. hmm.
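kepstin is reading the plan as a merge join: both inputs arrive ordered by `highlevel` id and are walked in step. A minimal sketch of that strategy on toy Python lists (not Postgres internals), assuming unique keys on the `highlevel` side as with a primary key; the function name is hypothetical:

```python
def merge_join(
    left: list[tuple[int, str]], right: list[tuple[int, str]]
) -> list[tuple[int, str, str]]:
    """Merge join on the first element of each tuple.

    Both inputs must already be sorted by key, which is why a planner
    either sorts them first or reads them through an index in key order.
    Assumes keys on the left side are unique (like highlevel.id).
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            out.append((left_key, left[i][1], right[j][1]))
            j += 1  # left key is unique, so only the right side advances
    return out
```

For example, joining `[(1, "mbid-1"), (2, "mbid-2"), (3, "mbid-3")]` with the `model=3` rows `[(1, "rock"), (3, "jazz")]` gives `[(1, "mbid-1", "rock"), (3, "mbid-3", "jazz")]`. The walk itself is linear; the expensive part is whatever it takes to produce the sorted inputs.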

19:29 PM
kepstin

hmm.

19:30 PM
kepstin

having the index on highlevel,model might result in the results from the model filter being returned in presorted order, which could speed up the join?

19:30 PM
kepstin

not sure :)

19:31 PM
kepstin isn't a postgres expert, by far - he's trying to do his best rubber duck impression atm.

19:31 PM
alastairp

building it now, let’s see

19:31 PM
alastairp

(this is a big table, it might take some time :)

19:32 PM
alastairp

settings have been tweaked, we have more shared buffers, etc

19:33 PM
kepstin

hmm. adding that index might turn it into a nested loop instead of a merge join - scan the highlevel table then lookup each row in highlevel_model by (highlevel, model)
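The nested-loop alternative kepstin describes scans `highlevel` and probes an index on `(highlevel, model)` once per row. A sketch with a Python dict standing in for that btree index; this is a simplification (a real btree probe costs O(log n) plus page reads, not a hash lookup), the function name is hypothetical, and whether the Postgres planner would actually choose this plan is exactly what was uncertain in the chat:

```python
def nested_loop_join(
    highlevel: list[tuple[int, str]],
    model_index: dict[tuple[int, int], str],
    model: int,
) -> list[tuple[str, str]]:
    """For each outer highlevel row, probe an index keyed on (highlevel, model)."""
    out = []
    for hl_id, mbid in highlevel:                 # outer: sequential scan of highlevel
        data = model_index.get((hl_id, model))    # inner: one index probe per row
        if data is not None:
            out.append((mbid, data))
    return out
```

With 3m outer rows this means 3m index probes, so it only beats the merge join if each probe is cheap and mostly hits cached pages.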

19:33 PM
kepstin

no idea if the query planner will actually pick that

19:34 PM
kepstin

or if it would be faster :)

19:36 PM
kepstin

hmm. that merge join doesn't actually have any explicit sorts being added, so it looks like it should be optimal or close to it.

19:37 PM
alastairp

right, this is why I was a little confused

19:37 PM
alastairp

because the select without the join is really fast

19:37 PM
alastairp

and the join didn’t seem to add anything strange in the query planner

19:38 PM
kepstin

what's the speed of a simple select hlm.data->>'value' as genre from highlevel_model as hlm where model=3 order by highlevel asc; look like?

19:39 PM
alastairp

ah, with the order

19:41 PM
kepstin

oh, i misunderstood your index names

19:41 PM
kepstin

it's using "highlevel_ndx_highlevel_model" then filtering by model

19:41 PM
alastairp

yeah, I’m not sure who came up with this name

19:41 PM
kepstin

so it's actually reading *every row* to check the model

19:41 PM
alastairp

but it’s field_ndx_tablename

19:41 PM
alastairp

which seems odd to me

19:42 PM
kepstin

so yeah, adding an index on (model, highlevel) or (highlevel, model) might help. Not sure which would help more.
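A toy model of why column order matters for `WHERE model = 3`, treating a btree index as a sorted list of tuples (an assumption that ignores page layout, visibility checks and the planner's cost estimates; the helper names are hypothetical). With an index led by `highlevel`, every entry has to be read and tested for the model, as kepstin found in the plan; with an index led by `model`, the matches form one contiguous range that already comes out in `highlevel` order, which suits a merge join. One possible reading of alastairp's result that the new `(highlevel, model)` index goes unused is that its leading column does not help an equality filter on `model` alone, but that is a guess, not something the plan confirms.

```python
import bisect

Row = tuple[int, int, str]  # (highlevel, model, value)


def make_rows(n_items: int, n_models: int = 14) -> list[Row]:
    return [(h, m, f"v{h}-{m}") for h in range(1, n_items + 1) for m in range(1, n_models + 1)]


def filter_scan(rows: list[Row], model: int) -> tuple[list[Row], int]:
    """Index on highlevel only: walk every entry in highlevel order and test model.

    Returns (matches, entries_examined)."""
    ordered = sorted(rows, key=lambda r: r[0])
    matches = [r for r in ordered if r[1] == model]
    return matches, len(ordered)


def composite_range_scan(rows: list[Row], model: int) -> tuple[list[Row], int]:
    """Index on (model, highlevel): seek to one contiguous range.

    Returns (matches, entries_examined)."""
    index = sorted((m, h, v) for h, m, v in rows)
    # (model,) sorts before every (model, h, v) tuple, and (model + 1,)
    # sorts after all of them, so these bounds delimit exactly one model.
    lo = bisect.bisect_left(index, (model,))
    hi = bisect.bisect_left(index, (model + 1,))
    matches = [(h, m, v) for m, h, v in index[lo:hi]]
    return matches, hi - lo
```

On 5 items the filter scan examines all 70 entries to return 5 rows, while the range scan examines only those 5, already sorted by `highlevel`. Scaled to the real tables that is 42m entries read versus 3m.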

19:42 PM
alastairp

the index on highlevel,model just finished

19:42 PM
alastairp

it’s not using it