the default is seq_page_cost = 1.0 and random_page_cost = 4.0; with an SSD you might be able to squeeze out some performance by lowering random_page_cost a bit
2013-11-20 32429, 2013
ldmosquera
I tried one of those, forgot which, but didn't see much change either; I'll read up more though
2013-11-20 32439, 2013
ldmosquera
random_page_cost I believe
2013-11-20 32445, 2013
ianmcorvidae
I don't necessarily know how much benefit you'll get from that though
2013-11-20 32455, 2013
ldmosquera
thanks a lot! I'll do some tests
2013-11-20 32403, 2013
ianmcorvidae
probably if you want to get more performance you'd need to ensure you know what the real bottleneck is and see what query plans it's getting
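One way to look at the query plans mentioned here is to run the same query under both cost settings and compare the output. The sketch below only builds the SQL text; the function name and the example query are hypothetical, and it assumes the statements are pasted into a psql session by hand.

```python
def plan_comparison_sql(query, costs=(4.0, 1.1)):
    """Return SQL statements that show the plan for `query` under each random_page_cost."""
    statements = []
    for cost in costs:
        # SET only changes the current session; postgresql.conf is left untouched.
        statements.append(f"SET random_page_cost = {cost};")
        # EXPLAIN ANALYZE really executes the query, so use it on reads only.
        statements.append(f"EXPLAIN (ANALYZE, BUFFERS) {query.rstrip(';')};")
    # Put the session back on the configured default afterwards.
    statements.append("RESET random_page_cost;")
    return statements


# Hypothetical query, for illustration only.
statements = plan_comparison_sql("SELECT * FROM some_table WHERE id = 42;")
for statement in statements:
    print(statement)
```

If both plans come out identical, the planner setting is probably not where the time is going.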
2013-11-20 32442, 2013
ldmosquera
I'll just try some general tuning first
2013-11-20 32415, 2013
ldmosquera
with a random_page_cost of 1.1 instead of the default 4.0, tmp_track took 144 secs instead of 156 secs
2013-11-20 32441, 2013
ldmosquera
underwhelmed :P
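For scale, the two timings quoted above work out to a single-digit percentage gain. A quick calculation (the helper name is made up for this example):

```python
def relative_improvement(before_secs, after_secs):
    """Fraction of the original run time that was saved."""
    return (before_secs - after_secs) / before_secs


# 156 s with random_page_cost = 4.0, 144 s with random_page_cost = 1.1
improvement = relative_improvement(156, 144)
print(f"{improvement:.1%}")  # 7.7%
```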
2013-11-20 32447, 2013
ianmcorvidae
heh
2013-11-20 32422, 2013
ianmcorvidae
yeah, I don't really know -- it's possible the bottleneck is elsewhere, too
2013-11-20 32440, 2013
ianmcorvidae
I haven't really played with any of this stuff running on SSDs, so :)
2013-11-20 32418, 2013
derwin
but cmon, you have that 501c(3) cheese
2013-11-20 32426, 2013
derwin
make it rain SSDs
2013-11-20 32402, 2013
ianmcorvidae
heh
2013-11-20 32407, 2013
ldmosquera
maybe (likely) the bottleneck is KVM / virtio
2013-11-20 32416, 2013
ldmosquera
I'll try with different cache modes
2013-11-20 32447, 2013
ianmcorvidae
I know that SSDs were considered for the new DB server we bought in 2011, but ultimately it was decided against, I think because the world of the internet wasn't quite sure how much benefit SSDs would bring
2013-11-20 32404, 2013
ldmosquera
what hardware is MB running on nowadays?
2013-11-20 32438, 2013
ianmcorvidae
we have a half-rack of servers doing various things; one DB server, hot-spare DB server, three machines running the website/webservice code, two machines running search servers, one machine building search indexes, frontend/gateway machines, and a variety of smaller things (e.g. our Xen host, we have VMs for the wiki, forums, and some other things)
we've moved stimpy, dexter, and tails out of the rack and have a couple of new ones, at least one of which isn't just shut down as an ostensible future spare
2013-11-20 32435, 2013
ianmcorvidae
heh, and hobbes is, I believe, currently sitting in our colo's fridge until we have a chance to get over there and open it up to replace some failing disks
2013-11-20 32443, 2013
ldmosquera
the traffic graph is brutal
2013-11-20 32459, 2013
ianmcorvidae
I think that graph hasn't been adjusted for ratelimited traffic, not sure
2013-11-20 32428, 2013
ldmosquera
what happened in mid-2011? Maybe a new client software release?
2013-11-20 32445, 2013
ianmcorvidae
headphones happened
2013-11-20 32459, 2013
ldmosquera
figures :P that's exactly how I got here
2013-11-20 32404, 2013
ianmcorvidae
which is a piece of software that our API is spectacularly bad for
2013-11-20 32417, 2013
ianmcorvidae
so we ratelimit it really severely, which presumably is why you're setting up your own server :)
2013-11-20 32430, 2013
ldmosquera
I recently discovered headphones, then beets through it, then I decided I needed my own MB mirror
are you sure it was me? I haven't registered this nick, maybe someone else named like this (incredibly unlikely)
2013-11-20 32401, 2013
derwin
guess it must have been someone with a similar path to musicbrainz
2013-11-20 32416, 2013
ianmcorvidae
it's a pretty common one lately
2013-11-20 32424, 2013
ianmcorvidae
especially for people setting up servers
2013-11-20 32441, 2013
ldmosquera
I absolutely love MB; I built some scripts to "curate" my collections a few years ago, but it was a heap of manual work
2013-11-20 32405, 2013
ldmosquera
the scripts inferred stuff and made suggestions, but I had to review everything
2013-11-20 32423, 2013
ldmosquera
now I found beets and it manages to do 90% of it without input
2013-11-20 32444, 2013
derwin
yeah, #beets exists btw, and is active, in case you need help :)
2013-11-20 32415, 2013
ldmosquera
not so far; it's gloriously well made and I had no surprises
2013-11-20 32456, 2013
ldmosquera
ianmcorvidae: how do you mean MB's API is bad for Headphones?
2013-11-20 32415, 2013
ianmcorvidae
headphones tends to have to make a lot of requests in order to get the information it wants
2013-11-20 32458, 2013
ldmosquera
one per track or something like that?
2013-11-20 32407, 2013
ianmcorvidae
we don't really have much in the way of tools for synchronizing changes, as it were -- most of our API requires you to specify one entity at a time, and polling is really the only way to watch for changes to the data
2013-11-20 32445, 2013
ldmosquera
oh I see
2013-11-20 32450, 2013
ianmcorvidae
headphones has done some decent work improving that -- for example by using complicated hacks with things like search queries to get around the one-entity limits
2013-11-20 32459, 2013
ianmcorvidae
but it's just still really not good for that
2013-11-20 32420, 2013
ianmcorvidae
the MB API grew up around taggers, and that means that sometimes it's not good for things that don't match that pattern of usage
2013-11-20 32456, 2013
ianmcorvidae
(with a tagger, it makes a lot of sense: you request one release at a time, and updating to account for changes is largely manual, not automated)
2013-11-20 32458, 2013
ianmcorvidae
theoretically headphones could even do something semi-crazy like use replication packets, but that wouldn't help with the bits of headphones that are passing to beets and thus require a copy of the MB API
2013-11-20 32423, 2013
ianmcorvidae
so the usual way of doing things seems to have become "set up a mirror"
2013-11-20 32432, 2013
ianmcorvidae
it's at least gotten us to be better about releasing updated VMs :)
2013-11-20 32409, 2013
ldmosquera
nice job downscaling everything into a single VM!
2013-11-20 32459, 2013
ldmosquera
so basically Headphones operates on the entire collection instead of file by file like a tagger, and so ends up doing many requests for each file, right?
2013-11-20 32422, 2013
ldmosquera
and thus would benefit from some kind of batch-mode API
2013-11-20 32439, 2013
ianmcorvidae
well, a batch-mode API would mean that it could make fewer requests as a matter of polling
2013-11-20 32455, 2013
ianmcorvidae
what would really help is if we had an effective way to push out changes
2013-11-20 32423, 2013
ianmcorvidae
i.e., so headphones can watch something and then only make requests for things that have actually changed, rather than polling to see if there are changes
2013-11-20 32446, 2013
ldmosquera
got it
2013-11-20 32449, 2013
ianmcorvidae
we have a partially-done experiment in that, but it has a lot of weaknesses and we're perpetually short on resources to work on things, so
2013-11-20 32415, 2013
ldmosquera
maybe if you could specify "releases newer than date XXX"
2013-11-20 32442, 2013
ianmcorvidae
what would be fantastic for headphones is if it could just make a request every so often saying "hey, I care about these artist MBIDs, which ones have new releases/release groups?", get back a list of MBIDs, and then request only those
2013-11-20 32453, 2013
ianmcorvidae
(where "new" would be defined in terms of some date, like you say)
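The request described above ("I care about these artist MBIDs, which ones have new releases?") can be sketched as a plain function over in-memory data. This is not an existing MusicBrainz endpoint; the function name, the data layout, and the placeholder IDs are all assumptions made for illustration.

```python
from datetime import date


def artists_with_new_releases(watched, release_index, since):
    """Return the watched artist IDs that have a release dated after `since`."""
    changed = []
    for artist_mbid in watched:
        # Artists the index knows nothing about simply have no releases.
        release_dates = release_index.get(artist_mbid, [])
        if any(released > since for released in release_dates):
            changed.append(artist_mbid)
    return changed


# Placeholder IDs stand in for real artist MBIDs.
release_index = {
    "artist-a": [date(2013, 3, 1), date(2013, 11, 5)],
    "artist-b": [date(2011, 6, 20)],
}
changed = artists_with_new_releases(
    ["artist-a", "artist-b", "artist-c"], release_index, date(2013, 11, 1)
)
print(changed)  # ['artist-a']
```

A client would then make full lookups only for the IDs in the returned list, instead of polling every artist it watches.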
2013-11-20 32410, 2013
ianmcorvidae
our API also allows a lot of different representations/granularities to the data, though
2013-11-20 32409, 2013
ianmcorvidae
which makes it hard; such a changed-entities thing would either have to assume that everyone only cares about one particular resolution/representation, or need a way to specify exactly what things a given client cares about (and then keep track of more data so it can accurately respond to those requests)
and in some cases combining two inc parameters is not just a matter of merging the two, since sometimes one inc parameter will also affect the data returned by another
2013-11-20 32402, 2013
ianmcorvidae
(especially those listed in "inc= arguments which affect subqueries", but)
2013-11-20 32402, 2013
ianmcorvidae
(e.g. for a release, inc=artist-credits will include the release artist credit, inc=recordings will include the tracks on the release, but inc=artist-credits+recordings will include the release artist credit, the tracks, and all of the tracks' artist credits
2013-11-20 32406, 2013
ianmcorvidae
)
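To make the inc example concrete, here is a minimal sketch that builds the three release lookup URLs being compared. It assumes the public web service root is https://musicbrainz.org/ws/2 and uses an all-zero placeholder where a real release MBID would go; the helper name is hypothetical.

```python
WS_ROOT = "https://musicbrainz.org/ws/2"  # assumed web service root

# Placeholder only; substitute a real release MBID.
PLACEHOLDER_MBID = "00000000-0000-0000-0000-000000000000"


def release_lookup_url(mbid, incs=()):
    """Build a release lookup URL, joining inc values with '+'."""
    url = f"{WS_ROOT}/release/{mbid}"
    if incs:
        url += "?inc=" + "+".join(incs)
    return url


urls = [
    release_lookup_url(PLACEHOLDER_MBID, ["artist-credits"]),
    release_lookup_url(PLACEHOLDER_MBID, ["recordings"]),
    release_lookup_url(PLACEHOLDER_MBID, ["artist-credits", "recordings"]),
]
for url in urls:
    print(url)
```

Per the description above, the third response is not simply the first two merged: it also carries the artist credits of every track.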
2013-11-20 32411, 2013
ianmcorvidae
we don't have very good internal caching/tracking of changes to data returned by the WS, either -- for HTTP caching stuff we can basically never avoid doing all the work, database-wise, before knowing if the response has changed
2013-11-20 32427, 2013
ianmcorvidae
which, again, partly-finished experiments exist, but :)
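The HTTP caching problem can be shown with a toy handler that derives an ETag from the response body. This is a generic sketch, not the MusicBrainz server code; `respond` and `build_body` are made-up names, and hashing the body is just one common way to produce an ETag.

```python
import hashlib


def respond(build_body, if_none_match=None):
    """Return (status, etag, body) for a conditional GET."""
    # The expensive part: the body has to be built (all the database work)
    # before there is anything to hash.
    body = build_body()
    etag = '"' + hashlib.sha1(body).hexdigest() + '"'
    if if_none_match == etag:
        # Bandwidth is saved, but the work above was already done.
        return 304, etag, b""
    return 200, etag, body


calls = []


def build_body():
    calls.append(1)  # count how often the expensive step runs
    return b"<release/>"


status1, etag, _ = respond(build_body)
status2, _, body2 = respond(build_body, if_none_match=etag)
print(status1, status2, len(calls))  # 200 304 2
```

Avoiding the second call would need some cheap record of when the underlying data last changed, which is the tracking said to be missing here.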
2013-11-20 32442, 2013
ldmosquera
so the problem would be to make it generic so it could satisfy any client without assuming things like what Headphones needs
2013-11-20 32457, 2013
ianmcorvidae
yeah
2013-11-20 32433, 2013
ianmcorvidae
also getting headphones to use it, which can sometimes be a struggle, but if it were well-made enough I guess we'd hope the benefits were self-evident :)
2013-11-20 32452, 2013
ldmosquera
if Headphones is overwhelmingly more active than other clients, then maybe it'd pay to make just this API endpoint for it
2013-11-20 32437, 2013
ldmosquera
then other clients would probably catch on
2013-11-20 32454, 2013
ldmosquera
I see :)
2013-11-20 32412, 2013
ldmosquera
I also use muspy, which I believe uses MusicBrainz too
2013-11-20 32420, 2013
ldmosquera
how does it fare with the API?
2013-11-20 32430, 2013
ianmcorvidae
muspy does it a bit better, because it essentially functions as an aggregator
2013-11-20 32438, 2013
ldmosquera
right, centralized
2013-11-20 32443, 2013
ianmcorvidae
if 3000 people all follow the same artist on muspy it still only has to make one request to us per day
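The aggregator effect is easy to see in a few lines: collect the distinct artists across all users and poll each one once. This is an illustration of the idea only, not muspy's code; the names and data are invented.

```python
def artists_to_poll(follows):
    """Collapse per-user follow lists into the set of distinct artists."""
    unique = set()
    for artists in follows.values():
        unique.update(artists)
    return unique


# 3000 hypothetical users all following the same artist.
follows = {f"user{i}": ["artist-a"] for i in range(3000)}

naive_requests = sum(len(artists) for artists in follows.values())
aggregated_requests = len(artists_to_poll(follows))
print(naive_requests, aggregated_requests)  # 3000 1
```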
2013-11-20 32446, 2013
ianmcorvidae
yeah
2013-11-20 32427, 2013
ianmcorvidae
muspy is something that it wouldn't be unreasonable for us to copy, in a rough sense, for the sort of changed-data API/feed I was talking about
2013-11-20 32407, 2013
ldmosquera
gotta run for a few hours, but I'll be back
2013-11-20 32416, 2013
ianmcorvidae
cool, nice talking to you
2013-11-20 32422, 2013
ianmcorvidae
hopefully you get your issues sorted
2013-11-20 32425, 2013
ldmosquera
hopefully I can make myself useful :) I'm a developer and sysadmin