#musicbrainz-devel

0:03 AM
ianmcorvidae

ianmcorvidae has changed the topic to: Self freezing week https://youtu.be/5T68TvdoSbI | http://musicbrainz.org/#devel | Agenda: Allowing murdos' bot to do more WD link edits (Freso), blog audience/dev blogging (ian)

2013-11-20 32455, 2013

0:10 AM
JonnyJD joined the channel

2013-11-20 32452, 2013

0:38 AM
ldmosquera joined the channel

2013-11-20 32408, 2013

0:39 AM
ldmosquera

hello all; question about the Virtualbox VM

2013-11-20 32436, 2013

0:39 AM
ldmosquera

I've downloaded the latest version from 2013-10-14, and followed the instructions to set it up

2013-11-20 32459, 2013

0:39 AM
ldmosquera

I'm using KVM instead of Virtualbox, but everything's good

2013-11-20 32408, 2013

0:40 AM
derwin

I think that won't actually work.

2013-11-20 32424, 2013

0:40 AM
derwin

because there's been a schema change since then, and upgrading is apparently not feasible?

2013-11-20 32438, 2013

0:40 AM
ianmcorvidae

no, that's the post-schema-change VM

2013-11-20 32440, 2013

0:40 AM
ianmcorvidae

see the date :P

2013-11-20 32453, 2013

0:40 AM
derwin

oh, didn't realize that the schema change was so long ago.

2013-11-20 32455, 2013

0:40 AM
ldmosquera

in any case my problem is with the reindex script

2013-11-20 32430, 2013

0:41 AM
ldmosquera

it always crashes either because of SEGFAULTs or different Java exceptions in different places

2013-11-20 32441, 2013

0:41 AM
ldmosquera

always during tmp_track

2013-11-20 32448, 2013

0:41 AM
ldmosquera

anyone come across this?

2013-11-20 32455, 2013

0:41 AM
derwin

what exception? OOM?

2013-11-20 32457, 2013

0:41 AM
ianmcorvidae

are you running replication while reindexing? I believe it's not intended to work while replication is on

2013-11-20 32425, 2013

0:42 AM
ldmosquera

didn't run any replication commands, is it on by default?

2013-11-20 32432, 2013

0:42 AM
ianmcorvidae

it shouldn't be, no

2013-11-20 32441, 2013

0:42 AM
ldmosquera

example: Exception in thread "main" java.lang.IncompatibleClassChangeError at org.apache.lucene.document.Document.add(Document.java:64)

2013-11-20 32412, 2013

0:43 AM
ldmosquera

another was a NullPointerException in another place in the code

2013-11-20 32419, 2013

0:43 AM
ianmcorvidae

strange

2013-11-20 32420, 2013

0:43 AM
ldmosquera

it would seem random

2013-11-20 32425, 2013

0:43 AM
ianmcorvidae

I wonder if there's something with java versions going on?

2013-11-20 32441, 2013

0:43 AM
ianmcorvidae

ijabz is the search server dev and ruaok is the one who has had the biggest role in setting up the VM images

2013-11-20 32447, 2013

0:43 AM
ianmcorvidae

neither of them seem to be around at present

2013-11-20 32448, 2013

0:43 AM
ldmosquera

I guess that would affect anyone with the same VM version

2013-11-20 32412, 2013

0:44 AM
ldmosquera

btw I'm using 6GB RAM so memory is not a problem

2013-11-20 32431, 2013

0:44 AM
ldmosquera

alright, I'll look out for them

2013-11-20 32413, 2013

0:45 AM
ldmosquera

also, another question

2013-11-20 32436, 2013

0:45 AM
ldmosquera

I'm seeing practically the same level of performance using a desktop harddisk and an SSD

2013-11-20 32450, 2013

0:45 AM
ldmosquera

always inside KVM using a raw LVM partition

2013-11-20 32405, 2013

0:46 AM
ianmcorvidae

performance for what exactly? website, webservice, search?

2013-11-20 32411, 2013

0:46 AM
ldmosquera

the indexing, sorry

2013-11-20 32425, 2013

0:46 AM
ianmcorvidae

hm

2013-11-20 32433, 2013

0:46 AM
ldmosquera

I guess Postgres is the bottleneck

2013-11-20 32442, 2013

0:46 AM
ianmcorvidae

not sure what the bottlenecks there are, but yeah, that'd be my guess

2013-11-20 32400, 2013

0:47 AM
ianmcorvidae

it could be memory, I suppose, with postgres

2013-11-20 32428, 2013

0:47 AM
ldmosquera

going from 2GB to 6GB for the VM made a bit of different but not as big as I'd expect

2013-11-20 32435, 2013

0:47 AM
ldmosquera

*difference

2013-11-20 32443, 2013

0:47 AM
ianmcorvidae

search server indexing builds a variety of temporary tables, and the automatic tuning only accounts for memory, not anything like SSD tuning

2013-11-20 32410, 2013

0:48 AM
ianmcorvidae

with an SSD you want it to much less sharply penalize random seeks and disk read/write operations, as you'd imagine

2013-11-20 32411, 2013

0:48 AM
ldmosquera

I looked around for Postgres tuning tips for SSD, but couldn't find much

2013-11-20 32424, 2013

0:48 AM
ianmcorvidae

I don't remember the exact parameters for that though

2013-11-20 32428, 2013

0:48 AM
ldmosquera

I'm using the deadline IO scheduler instead of the default CFQ

2013-11-20 32445, 2013

0:48 AM
ianmcorvidae

I suspect this is higher, in the postgres query planner

2013-11-20 32407, 2013

0:49 AM
ianmcorvidae

e.g. for an SSD you'd want it to consider materializing a temporary table much more often than you would with a spinning disk

2013-11-20 32417, 2013

0:50 AM
ianmcorvidae

looks like it's seq_page_cost and random_page_cost

2013-11-20 32427, 2013

0:50 AM
ianmcorvidae

http://www.postgresql.org/docs/current/static/run…

2013-11-20 32427, 2013

0:51 AM
ianmcorvidae

default is for seq_page_cost to be 1.0 and random_page_cost to be 4.0, with an SSD you might be able to squeeze some performance by kicking random_page_cost down some

2013-11-20 32429, 2013

0:51 AM
ldmosquera

I tried one of those, forgot which, but didn't see much change either; I'll read up more though

2013-11-20 32439, 2013

0:51 AM
ldmosquera

random_page_cost I believe

2013-11-20 32445, 2013

0:51 AM
ianmcorvidae

I don't necessarily know how much benefit you'll get from that though

2013-11-20 32455, 2013

0:51 AM
ldmosquera

thanks a lot! I'll do some tests

2013-11-20 32403, 2013

0:52 AM
ianmcorvidae

probably if you want to get more performance you'd need to ensure you know what the real bottleneck is and see what query plans it's getting

2013-11-20 32442, 2013

0:52 AM
ldmosquera

I'll just try some general tuning first

2013-11-20 32415, 2013

1:01 AM
ldmosquera

with a random_page_cost of 1.1 instead of the default 4.0, tmp_track took 144secs instead of 156secs

2013-11-20 32441, 2013

1:01 AM
ldmosquera

underwhelmed :P

2013-11-20 32447, 2013
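To put the two timings reported above in perspective, here is a minimal sketch of the arithmetic. The function name is made up for illustration; only the 156 and 144 second figures come from the log.

```python
def relative_reduction(before_secs: float, after_secs: float) -> float:
    """Return the percentage by which the run time shrank."""
    return (before_secs - after_secs) / before_secs * 100.0

# Timings for the tmp_track step as reported in the log:
# 156 s at random_page_cost = 4.0, 144 s at random_page_cost = 1.1.
improvement = relative_reduction(156, 144)
print(f"tmp_track took {improvement:.1f}% less time")
```

A single-digit percentage gain is consistent with the suspicion voiced in the channel that the real bottleneck lies somewhere other than the planner's page cost settings.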

1:01 AM
ianmcorvidae

heh

2013-11-20 32422, 2013

1:02 AM
ianmcorvidae

yeah, I don't really know -- it's possible the bottleneck is elsewhere, too

2013-11-20 32440, 2013

1:02 AM
ianmcorvidae

I haven't really played with any of this stuff running on SSDs, so :)

2013-11-20 32418, 2013

1:03 AM
derwin

but cmon, you have that 501c(3) cheese

2013-11-20 32426, 2013

1:03 AM
derwin

make it rain SSDs

2013-11-20 32402, 2013

1:04 AM
ianmcorvidae

heh

2013-11-20 32407, 2013

1:04 AM
ldmosquera

maybe (likely) the bottleneck is KVM / virtio

2013-11-20 32416, 2013

1:04 AM
ldmosquera

I'll try with different cache modes

2013-11-20 32447, 2013

1:04 AM
ianmcorvidae

I know that SSDs were considered for the new DB server we bought in 2011, but ultimately it was decided against, I think because the world of the internet wasn't quite sure how much benefit SSDs would bring

2013-11-20 32404, 2013

1:08 AM
ldmosquera

what hardware is MB running on nowadays?

2013-11-20 32438, 2013

1:09 AM
ianmcorvidae

we have a half-rack of servers doing various things; one DB server, hot-spare DB server, three machines running the website/webservice code, two machines running search servers, one machine building search indexes, frontend/gateway machines, and a variety of smaller things (e.g. our Xen host, we have VMs for the wiki, forums, and some other things)

2013-11-20 32457, 2013

1:09 AM
derwin

you could read the 2012 blog post..

2013-11-20 32402, 2013

1:10 AM
ianmcorvidae

http://metabrainz.org/doc/Annual_Report/2012#Serv… has a slightly-out-of-date list

2013-11-20 32451, 2013

1:10 AM
ldmosquera

nice! thanks

2013-11-20 32415, 2013

1:11 AM
ianmcorvidae

we've moved stimpy, dexter, and tails out of the rack and have a couple of new ones, at least one of which isn't just shut down as an ostensible future spare

2013-11-20 32435, 2013

1:12 AM
ianmcorvidae

heh, and hobbes is, I believe, currently sitting in our colo's fridge until we have a chance to get over there and open it up to replace some failing disks

2013-11-20 32443, 2013

1:14 AM
ldmosquera

the traffic graph is brutal

2013-11-20 32459, 2013

1:14 AM
ianmcorvidae

I think that graph hasn't been adjusted for ratelimited traffic, not sure

2013-11-20 32428, 2013

1:15 AM
ldmosquera

what happened in mid-2011? Maybe a new client software release?

2013-11-20 32445, 2013

1:15 AM
ianmcorvidae

headphones happened

2013-11-20 32459, 2013

1:15 AM
ldmosquera

figures :P that's exactly how I got here

2013-11-20 32404, 2013

1:16 AM
ianmcorvidae

which is a piece of software that our API is spectacularly bad for

2013-11-20 32417, 2013

1:16 AM
ianmcorvidae

so we ratelimit it really severely, which presumably is why you're setting up your own server :)

2013-11-20 32430, 2013

1:16 AM
ldmosquera

I recently discovered headphones, then beets through it, then I decided I needed my own MB mirror

2013-11-20 32440, 2013

1:16 AM
ianmcorvidae

http://stats.musicbrainz.org/mrtg/drraw/drraw.cgi… -- we refuse 2/3 of requests that come to us, from headphones

2013-11-20 32457, 2013

1:16 AM
ianmcorvidae

well, a bit less than that, but we still refuse more than we let through :/

2013-11-20 32441, 2013

1:17 AM
derwin

oh ldmosquera I spoke with you last week!

2013-11-20 32459, 2013

1:17 AM
ianmcorvidae

http://stats.musicbrainz.org/mrtg/drraw/drraw.cgi… for pre-2012 traffic from headphones (by and large)

2013-11-20 32459, 2013

1:17 AM
ldmosquera

here? First time I hop in here

2013-11-20 32408, 2013

1:18 AM
derwin

in #musicbrainz..

2013-11-20 32413, 2013

1:18 AM
derwin

or #beets :)

2013-11-20 32449, 2013

1:18 AM
ldmosquera

are you sure it was me? I haven't registered this nick, maybe someone else named like this (incredibly unlikely)

2013-11-20 32401, 2013

1:20 AM
derwin

guess it must have been someone with a similar path to musicbrainz

2013-11-20 32416, 2013

1:20 AM
ianmcorvidae

it's a pretty common one lately

2013-11-20 32424, 2013

1:20 AM
ianmcorvidae

especially for people setting up servers

2013-11-20 32441, 2013

1:22 AM
ldmosquera

I absolutely love MB; I built some scripts to "curate" my collections a few years ago, but it was a heap of manual work

2013-11-20 32405, 2013

1:23 AM
ldmosquera

the scripts inferred stuff and made suggestions, but I had to review everything

2013-11-20 32423, 2013

1:23 AM
ldmosquera

now I found beets and it manages to do 90% of it without input

2013-11-20 32444, 2013

1:23 AM
derwin

yeah, #beets exists btw, and is active, in case you need help :)

2013-11-20 32415, 2013

1:24 AM
ldmosquera

not so far; it's gloriously well made and I had no surprises

2013-11-20 32456, 2013

1:28 AM
ldmosquera

ianmcorvidae: how do you mean MB's API is bad for Headphones?

2013-11-20 32415, 2013

1:29 AM
ianmcorvidae

headphones tends to have to make a lot of requests in order to get the information it wants

2013-11-20 32458, 2013

1:29 AM
ldmosquera

one per track or something like that?

2013-11-20 32407, 2013

1:30 AM
ianmcorvidae

we don't really have much in the way of tools for synchronizing changes, as it were -- most of our API requires you to specify one entity at a time, and polling is really the only way to watch for changes to the data

2013-11-20 32445, 2013

1:30 AM
ldmosquera

oh I see

2013-11-20 32450, 2013

1:30 AM
ianmcorvidae

headphones has done some decent work improving that -- for example by using complicated hacks with things like search queries to get around the one-entity limits

2013-11-20 32459, 2013

1:30 AM
ianmcorvidae

but it's just still really not good for that

2013-11-20 32420, 2013

1:31 AM
ianmcorvidae

the MB API grew up around taggers, and that means that sometimes it's not good for things that don't match that pattern of usage

2013-11-20 32456, 2013

1:31 AM
ianmcorvidae

(with a tagger, it makes a lot of sense: you request one release at a time, and updating to account for changes is largely manual, not automated)

2013-11-20 32458, 2013

1:32 AM
ianmcorvidae

theoretically headphones could even do something semi-crazy like use replication packets, but that wouldn't help with the bits of headphones that are passing to beets and thus require a copy of the MB API

2013-11-20 32423, 2013

1:33 AM
ianmcorvidae

so the usual way of doing things seems to have become "set up a mirror"

2013-11-20 32432, 2013

1:33 AM
ianmcorvidae

it's at least gotten us to be better about releasing updated VMs :)

2013-11-20 32409, 2013

1:34 AM
ldmosquera

nice job downscaling everything into a single VM!

2013-11-20 32459, 2013

1:37 AM
ldmosquera

so basically Headphones operates on the entire collection instead of file by file like a tagger, and so ends up doing many requests for each file, right?

2013-11-20 32422, 2013

1:38 AM
ldmosquera

and thus would benefit from some kind of batch-mode API

2013-11-20 32439, 2013

1:38 AM
ianmcorvidae

well, a batch-mode API would mean that it could make fewer requests as a matter of polling

2013-11-20 32455, 2013

1:38 AM
ianmcorvidae

what would really help is if we had an effective way to push out changes

2013-11-20 32423, 2013

1:39 AM
ianmcorvidae

i.e., so headphones can watch something and then only make requests for things that have actually changed, rather than polling to see if there are changes

2013-11-20 32446, 2013

1:39 AM
ldmosquera

got it

2013-11-20 32449, 2013

1:39 AM
ianmcorvidae

we have a partially-done experiment in that, but it has a lot of weaknesses and we're perpetually short on resources to work on things, so

2013-11-20 32415, 2013

1:40 AM
ldmosquera

maybe if you could specify "releases newer than date XXX"

2013-11-20 32442, 2013

1:40 AM
ianmcorvidae

what would be fantastic for headphones is if it could just make a request every so often saying "hey, I care about these artist MBIDs, which ones have new releases/release groups?", get back a list of MBIDs, and then request only those

2013-11-20 32453, 2013

1:40 AM
ianmcorvidae

(where "new" would be defined in terms of some date, like you say)

2013-11-20 32410, 2013
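The "which of these artists have something new since date X" idea described above could be sketched roughly as follows. This is purely illustrative: no such endpoint is described in the log as existing, and the function name, the in-memory table, and all identifiers are hypothetical placeholders rather than real MBIDs.

```python
from datetime import date

# Hypothetical stand-in for the database: artist id -> list of
# (release group id, date added). All identifiers are placeholders.
RELEASE_GROUPS = {
    "artist-a": [("rg-1", date(2013, 10, 1)), ("rg-2", date(2013, 11, 15))],
    "artist-b": [("rg-3", date(2013, 9, 20))],
}

def changed_since(artist_ids, since):
    """Return {artist id: [release group ids]} for entries added after `since`.

    Artists with nothing new are omitted, so a client only needs to follow up
    on the ids that appear in the result.
    """
    result = {}
    for artist in artist_ids:
        new = [rg for rg, added in RELEASE_GROUPS.get(artist, []) if added > since]
        if new:
            result[artist] = new
    return result

# One request covering several artists instead of one poll per artist.
updates = changed_since(["artist-a", "artist-b", "artist-c"], date(2013, 11, 1))
print(updates)
```

The sketch deliberately ignores the harder part raised in the conversation that follows, namely that clients may care about different representations of the same entity.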

1:41 AM
ianmcorvidae

our API also allows a lot of different representations/granularities to the data, though

2013-11-20 32409, 2013

1:42 AM
ianmcorvidae

which makes it hard; such a changed-entities thing would either have to assume that everyone only cares about one particular one of those resolutions/representations, or it needs a way to specify exactly what things a given client cares about (and then it needs to keep track of more data so it can accurately respond to those requests)

2013-11-20 32451, 2013

1:42 AM
ldmosquera

what are the resolutions, for example?

2013-11-20 32432, 2013

1:43 AM
ianmcorvidae

so if you look at http://wiki.musicbrainz.org/XML_Web_Service/Versi… and the following three sections, those are the various so-called 'inc parameters'

2013-11-20 32437, 2013

1:43 AM
ianmcorvidae

which specify which pieces of data to include

2013-11-20 32437, 2013

1:44 AM
ldmosquera

got it

2013-11-20 32450, 2013

1:44 AM
ianmcorvidae

and in some cases combining two inc parameters is not just a matter of merging the two, since sometimes one inc parameter will also affect the data returned by another

2013-11-20 32402, 2013

1:45 AM
ianmcorvidae

(especially those listed in "inc= arguments which affect subqueries", but)

2013-11-20 32402, 2013

1:46 AM
ianmcorvidae

(e.g. for a release, inc=artist-credits will include the release artist credit, inc=recordings will include the tracks on the release, but inc=artist-credits+recordings will include the release artist credit, the tracks, and all of the tracks' artist credits

2013-11-20 32406, 2013

1:46 AM
ianmcorvidae

)

2013-11-20 32411, 2013
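As a small illustration of the inc parameters discussed above, the following sketch builds a release lookup URL that combines two of them. The `/ws/2/release/` path is assumed from the version 2 web service and is not quoted in the log; the helper name and the all-zero MBID are placeholders.

```python
# Assumed base path of the version 2 XML web service (not quoted in the log).
BASE_URL = "http://musicbrainz.org/ws/2"

def release_url(mbid, incs):
    """Build a release lookup URL, joining inc parameters with '+'."""
    inc_value = "+".join(incs)
    return f"{BASE_URL}/release/{mbid}?inc={inc_value}"

# Placeholder MBID, not a real release.
PLACEHOLDER_MBID = "00000000-0000-0000-0000-000000000000"

url = release_url(PLACEHOLDER_MBID, ["artist-credits", "recordings"])
print(url)
```

As noted in the conversation, the combined request returns more than the union of the two separate ones, because artist-credits also applies to the tracks pulled in by recordings.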

1:47 AM
ianmcorvidae

we don't have very good internal caching/tracking of changes to data returned by the WS, too -- for HTTP caching stuff we can basically never avoid doing all the work, database-wise, before knowing if the response has changed

2013-11-20 32427, 2013

1:47 AM
ianmcorvidae

which, again, partly-finished experiments exist, but :)

2013-11-20 32442, 2013

1:49 AM
ldmosquera

so the problem would be to make it generic so it could satisfy any client without assuming things like what Headphones needs

2013-11-20 32457, 2013

1:49 AM
ianmcorvidae

yeah

2013-11-20 32433, 2013

1:50 AM
ianmcorvidae

also getting headphones to use it, which can sometimes be a struggle, but if it were well-made enough I guess we'd hope the benefits were self-evident :)

2013-11-20 32452, 2013

1:50 AM
ldmosquera

if Headphones is overwhelmingly more active than other clients, then maybe it'd pay to make just this API endpoint for it

2013-11-20 32437, 2013

1:51 AM
ldmosquera

then other clients would probably catch on

2013-11-20 32454, 2013

1:52 AM
ldmosquera

I see :)

2013-11-20 32412, 2013

1:53 AM
ldmosquera

I also use muspy, which I believe uses MusicBrainz too

2013-11-20 32420, 2013

1:53 AM
ldmosquera

how does it fare with the API?

2013-11-20 32430, 2013

1:53 AM
ianmcorvidae

muspy does it a bit better, because it essentially functions as an aggregator

2013-11-20 32438, 2013

1:53 AM
ldmosquera

right, centralized

2013-11-20 32443, 2013

1:53 AM
ianmcorvidae

if 3000 people all follow the same artist on muspy it still only has to make one request to us per day

2013-11-20 32446, 2013
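The aggregator effect described above can be sketched with a toy model. The data and function names are invented for illustration and say nothing about how muspy is actually implemented; the only figure taken from the log is the 3000 followers of one artist.

```python
# Toy data: 3000 hypothetical users all following the same artist,
# and one of them following a second artist as well.
follows = {f"user-{i}": {"artist-x"} for i in range(3000)}
follows["user-0"].add("artist-y")

def requests_if_each_client_polls(follows):
    """Every client polls once per day for every artist it follows."""
    return sum(len(artists) for artists in follows.values())

def requests_if_aggregated(follows):
    """A central service polls once per day per distinct artist."""
    distinct_artists = set().union(*follows.values())
    return len(distinct_artists)

print(requests_if_each_client_polls(follows))
print(requests_if_aggregated(follows))
```

In this model the upstream load depends on the number of distinct artists rather than on the number of users, which is the property that makes the centralized approach friendlier to the API.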

1:53 AM
ianmcorvidae

yeah

2013-11-20 32427, 2013

1:54 AM
ianmcorvidae

muspy is something that it wouldn't be unreasonable for us to copy, in a rough sense, for the sort of changed-data API/feed I was talking about

2013-11-20 32407, 2013

1:56 AM
ldmosquera

gotta run for a few hours, but I'll be back

2013-11-20 32416, 2013

1:56 AM
ianmcorvidae

cool, nice talking to you

2013-11-20 32422, 2013

1:56 AM
ianmcorvidae

hopefully you get your issues sorted

2013-11-20 32425, 2013

1:56 AM
ldmosquera

hopefully I can make myself useful :) I'm a developer and sysadmin