the default is for seq_page_cost to be 1.0 and random_page_cost to be 4.0; with an SSD you might be able to squeeze out some performance by kicking random_page_cost down a bit
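you can try it per-session before touching postgresql.conf, too; a minimal sketch, and the 1.1 is just a commonly-suggested SSD value, not gospel:
  SET random_page_cost = 1.1;   -- session-only override of the planner's cost estimate
  -- then re-run the slow query in the same session and compare timings
if it helps, make it permanent with random_page_cost = 1.1 in postgresql.conf and a SELECT pg_reload_conf();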
ldmosquera
I tried one of those, forgot which, but didn't see much change either; I'll read up more though
random_page_cost I believe
ianmcorvidae
I don't necessarily know how much benefit you'll get from that though
ldmosquera
thanks a lot! I'll do some tests
ianmcorvidae
probably if you want more performance you'd need to figure out what the real bottleneck actually is and see what query plans it's getting
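e.g. wrap the slow statement in EXPLAIN; the SELECT below is just a stand-in for whatever your import is actually running:
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT ...;   -- your actual slow query here
lots of time in seq scans means the page costs barely matter; sorts spilling to disk point at work_mem instead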
ldmosquera
I'll just try some general tuning first
with a random_page_cost of 1.1 instead of the default 4.0, tmp_track took 144secs instead of 156secs
underwhelmed :P
ianmcorvidae
heh
yeah, I don't really know -- it's possible the bottleneck is elsewhere, too
I haven't really played with any of this stuff running on SSDs, so :)
derwin
but c'mon, you have that 501(c)(3) cheese
make it rain SSDs
ianmcorvidae
heh
ldmosquera
maybe (likely) the bottleneck is KVM / virtio
I'll try with different cache modes
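i.e. flipping the cache attribute in the libvirt domain XML; cache=none is the usual advice for databases, but I'd rather measure than trust folklore (paths/filenames below are made up):
  <disk type='file' device='disk'>
    <driver name='qemu' type='raw' cache='none' io='native'/>
    <source file='/var/lib/libvirt/images/mbdb.img'/>
    <target dev='vda' bus='virtio'/>
  </disk>
(or cache=none on the qemu -drive option directly)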
ianmcorvidae
I know that SSDs were considered for the new DB server we bought in 2011, but ultimately it was decided against, I think because the world of the internet wasn't quite sure how much benefit SSDs would bring
ldmosquera
what hardware is MB running on nowadays?
ianmcorvidae
we have a half-rack of servers doing various things; one DB server, hot-spare DB server, three machines running the website/webservice code, two machines running search servers, one machine building search indexes, frontend/gateway machines, and a variety of smaller things (e.g. our Xen host, we have VMs for the wiki, forums, and some other things)
we've moved stimpy, dexter, and tails out of the rack and have a couple of new ones, at least one of which is doing real work rather than just sitting shut down as an ostensible future spare
heh, and hobbes is, I believe, currently sitting in our colo's fridge until we have a chance to get over there and open it up to replace some failing disks
ldmosquera
the traffic graph is brutal
ianmcorvidae
I think that graph hasn't been adjusted for ratelimited traffic, not sure
ldmosquera
what happened in mid-2011? Maybe a new client software release?
ianmcorvidae
headphones happened
ldmosquera
figures :P that's exactly how I got here
ianmcorvidae
which is a piece of software that our API is spectacularly bad for
so we ratelimit it really severely, which presumably is why you're setting up your own server :)
ldmosquera
I recently discovered headphones, then beets through it, then I decided I needed my own MB mirror
are you sure it was me? I haven't registered this nick, so maybe it was someone else using the same name (incredibly unlikely)
derwin
guess it must have been someone with a similar path to musicbrainz
ianmcorvidae
it's a pretty common one lately
especially for people setting up servers
ldmosquera
I absolutely love MB; I built some scripts to "curate" my collections a few years ago, but it was a heap of manual work
the scripts inferred stuff and made suggestions, but I had to review everything
now I found beets and it manages to do 90% of it without input
derwin
yeah, #beets exists btw, and is active, in case you need help :)
ldmosquera
not so far; it's gloriously well made and I had no surprises
ianmcorvidae: how do you mean MB's API is bad for Headphones?
ianmcorvidae
headphones tends to have to make a lot of requests in order to get the information it wants
ldmosquera
one per track or something like that?
ianmcorvidae
we don't really have much in the way of tools for synchronizing changes, as it were -- most of our API requires you to specify one entity at a time, and polling is really the only way to watch for changes to the data
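e.g. a typical lookup is strictly one entity per request, something like:
  GET http://musicbrainz.org/ws/2/artist/<artist MBID>?inc=release-groups
so watching, say, 500 artists for new releases means 500 of those every polling pass (and we ratelimit, so that's slow)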
ldmosquera
oh I see
ianmcorvidae
headphones has done some decent work improving that -- for example by using complicated hacks with things like search queries to get around the one-entity limits
but it's still really not good for that
the MB API grew up around taggers, and that means that sometimes it's not good for things that don't match that pattern of usage
(with a tagger, it makes a lot of sense: you request one release at a time, and updating to account for changes is largely manual, not automated)
theoretically headphones could even do something semi-crazy like use replication packets, but that wouldn't help with the parts of headphones that hand off to beets and thus require a copy of the MB API
so the usual way of doing things seems to have become "set up a mirror"
it's at least gotten us to be better about releasing updated VMs :)
ldmosquera
nice job downscaling everything into a single VM!
so basically Headphones operates on the entire collection instead of file by file like a tagger, and so ends up doing many requests for each file, right?
and thus would benefit from some kind of batch-mode API
ianmcorvidae
well, a batch-mode API would just mean it could make fewer requests while polling
what would really help is if we had an effective way to push out changes
i.e., so headphones can watch something and then only make requests for things that have actually changed, rather than polling to see if there are changes
ldmosquera
got it
ianmcorvidae
we have a partially-done experiment in that, but it has a lot of weaknesses and we're perpetually short on resources to work on things, so
ldmosquera
maybe if you could specify "releases newer than date XXX"
ianmcorvidae
what would be fantastic for headphones is if it could just make a request every so often saying "hey, I care about these artist MBIDs, which ones have new releases/release groups?", get back a list of MBIDs, and then request only those
(where "new" would be defined in terms of some date, like you say)
our API also allows a lot of different representations/granularities of the data, though
which makes it hard; such a changed-entities thing would either have to assume that everyone only cares about one particular resolution/representation, or it would need a way for clients to specify exactly what they care about (and then it has to track more data so it can accurately answer those requests)
and in some cases combining two inc parameters is not just a matter of merging the two, since sometimes one inc parameter will also affect the data returned by another
(especially those listed in "inc= arguments which affect subqueries", but)
(e.g. for a release, inc=artist-credits will include the release artist credit, inc=recordings will include the tracks on the release, but inc=artist-credits+recordings will include the release artist credit, the tracks, and all of the tracks' artist credits)
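concretely, as lookup URLs:
  /ws/2/release/<MBID>?inc=artist-credits             -> release artist credit only
  /ws/2/release/<MBID>?inc=recordings                 -> the tracklist
  /ws/2/release/<MBID>?inc=artist-credits+recordings  -> both, plus every track's artist credit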
we also don't have very good internal caching/tracking of changes to the data returned by the WS -- for HTTP caching we can basically never avoid doing all the database work before knowing whether the response has changed
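the ideal there would be answering standard conditional GETs cheaply, roughly:
  GET /ws/2/artist/<MBID>
  If-None-Match: "<etag from the previous response>"
  -> 304 Not Modified (no body, no expensive queries)
but as it stands we'd have to build the whole response just to compute the ETag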
which, again, partly-finished experiments exist, but :)
ldmosquera
so the problem would be making it generic enough to satisfy any client, without baking in assumptions about what Headphones specifically needs
ianmcorvidae
yeah
also getting headphones to use it, which can sometimes be a struggle, but if it were well-made enough I guess we'd hope the benefits were self-evident :)
ldmosquera
if Headphones is overwhelmingly more active than other clients, then maybe it'd pay to build that API endpoint just for it
then other clients would probably catch on
I see :)
I also use muspy, which I believe uses MusicBrainz too
how does it fare with the API?
ianmcorvidae
muspy does it a bit better, because it essentially functions as an aggregator
ldmosquera
right, centralized
ianmcorvidae
if 3000 people all follow the same artist on muspy it still only has to make one request to us per day
yeah
it wouldn't be unreasonable for us to copy muspy, in a rough sense, for the sort of changed-data API/feed I was talking about
ldmosquera
gotta run for a few hours, but I'll be back
ianmcorvidae
cool, nice talking to you
hopefully you get your issues sorted
ldmosquera
hopefully I can make myself useful :) I'm a developer and sysadmin