#musicbrainz-devel

      • ianmcorvidae
        ianmcorvidae has changed the topic to: Self freezing week https://youtu.be/5T68TvdoSbI | http://musicbrainz.org/#devel | Agenda: Allowing murdos' bot to do more WD link edits (Freso), blog audience/dev blogging (ian)
      • 2013-11-20 32455, 2013

      • JonnyJD joined the channel
      • 2013-11-20 32452, 2013

      • ldmosquera joined the channel
      • 2013-11-20 32408, 2013

      • ldmosquera
        hello all; question about the Virtualbox VM
      • 2013-11-20 32436, 2013

      • ldmosquera
        I've downloaded the latest version from 2013-10-14, and followed the instructions to set it up
      • 2013-11-20 32459, 2013

      • ldmosquera
        I'm using KVM instead of Virtualbox, but everything's good
      • 2013-11-20 32408, 2013

      • derwin
        I think that won't actually work.
      • 2013-11-20 32424, 2013

      • derwin
        because there's been a schema change since then, and upgrading is apparently not feasible?
      • 2013-11-20 32438, 2013

      • ianmcorvidae
        no, that's the post-schema-change VM
      • 2013-11-20 32440, 2013

      • ianmcorvidae
        see the date :P
      • 2013-11-20 32453, 2013

      • derwin
        oh, didn't realize that the schema change was so long ago.
      • 2013-11-20 32455, 2013

      • ldmosquera
        in any case my problem is with the reindex script
      • 2013-11-20 32430, 2013

      • ldmosquera
        it always crashes either because of SEGFAULTs or different Java exceptions in different places
      • 2013-11-20 32441, 2013

      • ldmosquera
        always during tmp_track
      • 2013-11-20 32448, 2013

      • ldmosquera
        anyone come across this?
      • 2013-11-20 32455, 2013

      • derwin
        what exception? OOM?
      • 2013-11-20 32457, 2013

      • ianmcorvidae
        are you running replication while reindexing? I believe it's not intended to work while replication is on
      • 2013-11-20 32425, 2013

      • ldmosquera
        didn't run any replication commands, is it on by default?
      • 2013-11-20 32432, 2013

      • ianmcorvidae
        it shouldn't be, no
      • 2013-11-20 32441, 2013

      • ldmosquera
        example: Exception in thread "main" java.lang.IncompatibleClassChangeError at org.apache.lucene.document.Document.add(Document.java:64)
      • 2013-11-20 32412, 2013

      • ldmosquera
        another was a NullPointerException in another place in the code
      • 2013-11-20 32419, 2013

      • ianmcorvidae
        strange
      • 2013-11-20 32420, 2013

      • ldmosquera
        it would seem random
      • 2013-11-20 32425, 2013

      • ianmcorvidae
        I wonder if there's something with java versions going on?
      • 2013-11-20 32441, 2013

      • ianmcorvidae
        ijabz is the search server dev and ruaok is the one who has had the biggest role in setting up the VM images
      • 2013-11-20 32447, 2013

      • ianmcorvidae
        neither of them seem to be around at present
      • 2013-11-20 32448, 2013

      • ldmosquera
        I guess that would affect anyone with the same VM version
      • 2013-11-20 32412, 2013

      • ldmosquera
        btw I'm using 6GB RAM so memory is not a problem
      • 2013-11-20 32431, 2013

      • ldmosquera
        alright, I'll look out for them
      • 2013-11-20 32413, 2013

      • ldmosquera
        also, another question
      • 2013-11-20 32436, 2013

      • ldmosquera
        I'm seeing practically the same level of performance using a desktop harddisk and an SSD
      • 2013-11-20 32450, 2013

      • ldmosquera
        always inside KVM using a raw LVM partition
      • 2013-11-20 32405, 2013

      • ianmcorvidae
        performance for what exactly? website, webservice, search?
      • 2013-11-20 32411, 2013

      • ldmosquera
        the indexing, sorry
      • 2013-11-20 32425, 2013

      • ianmcorvidae
        hm
      • 2013-11-20 32433, 2013

      • ldmosquera
        I guess Postgres is the bottleneck
      • 2013-11-20 32442, 2013

      • ianmcorvidae
        not sure what the bottlenecks there are, but yeah, that'd be my guess
      • 2013-11-20 32400, 2013

      • ianmcorvidae
        it could be memory, I suppose, with postgres
      • 2013-11-20 32428, 2013

      • ldmosquera
        going from 2GB to 6GB for the VM made a bit of different but not as big as I'd expect
      • 2013-11-20 32435, 2013

      • ldmosquera
        *difference
      • 2013-11-20 32443, 2013

      • ianmcorvidae
        search server indexing builds a variety of temporary tables, and the automatic tuning only accounts for memory, not anything like SSD tuning
      • 2013-11-20 32410, 2013

      • ianmcorvidae
        with an SSD you want it to much less sharply penalize random seeks and disk read/write operations, as you'd imagine
      • 2013-11-20 32411, 2013

      • ldmosquera
        I looked around for Postgres tuning tips for SSD, but couldn't find much
      • 2013-11-20 32424, 2013

      • ianmcorvidae
        I don't remember the exact parameters for that though
      • 2013-11-20 32428, 2013

      • ldmosquera
        I'm using the deadline IO scheduler instead of the default CFQ
      • 2013-11-20 32445, 2013

      • ianmcorvidae
        I suspect this is higher, in the postgres query planner
      • 2013-11-20 32407, 2013

      • ianmcorvidae
        e.g. for an SSD you'd want it to consider materializing a temporary table much more often than you would with a spinning disk
      • 2013-11-20 32417, 2013

      • ianmcorvidae
        looks like it's seq_page_cost and random_page_cost
      • 2013-11-20 32427, 2013

      • ianmcorvidae
        default is for seq_page_cost to be 1.0 and random_page_cost to be 4.0, with an SSD you might be able to squeeze some performance by kicking random_page_cost down some
      • 2013-11-20 32429, 2013
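
The two planner settings named above can be changed per session with plain `SET` statements. Below is a minimal sketch that only renders those statements as strings; the helper name `planner_cost_settings` is made up for illustration, and the value 1.1 is just an example for SSD-backed storage, not a recommendation from the channel.

```python
from typing import List


def planner_cost_settings(random_page_cost: float, seq_page_cost: float = 1.0) -> List[str]:
    """Render session-level SET statements for the two planner cost parameters.

    Hypothetical helper: it only builds SQL text and never talks to a database.
    """
    return [
        f"SET seq_page_cost = {seq_page_cost};",
        f"SET random_page_cost = {random_page_cost};",
    ]


# PostgreSQL defaults are seq_page_cost = 1.0 and random_page_cost = 4.0.
# Lowering random_page_cost tells the planner that random reads are cheap.
for statement in planner_cost_settings(random_page_cost=1.1):
    print(statement)
```

A `SET` only lasts for the current session; to make the change permanent, the same parameters would go into `postgresql.conf`.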

      • ldmosquera
        I tried one of those, forgot which, but didn't see much change either; I'll read up more though
      • 2013-11-20 32439, 2013

      • ldmosquera
        random_page_cost I believe
      • 2013-11-20 32445, 2013

      • ianmcorvidae
        I don't necessarily know how much benefit you'll get from that though
      • 2013-11-20 32455, 2013

      • ldmosquera
        thanks a lot! I'll do some tests
      • 2013-11-20 32403, 2013

      • ianmcorvidae
        probably if you want to get more performance you'd need to ensure you know what the real bottleneck is and see what query plans it's getting
      • 2013-11-20 32442, 2013

      • ldmosquera
        I'll just try some general tuning first
      • 2013-11-20 32415, 2013

      • ldmosquera
        with a random_page_cost of 1.1 instead of the default 4.0, tmp_track took 144secs instead of 156secs
      • 2013-11-20 32441, 2013
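
To put the two timings above in proportion, here is a small worked calculation. The function name `relative_saving` is an illustrative choice; the numbers are the ones reported for the tmp_track step.

```python
def relative_saving(baseline_secs: float, tuned_secs: float) -> float:
    """Return the fraction of run time saved relative to the baseline."""
    return (baseline_secs - tuned_secs) / baseline_secs


baseline = 156.0  # tmp_track with random_page_cost = 4.0 (default)
tuned = 144.0     # tmp_track with random_page_cost = 1.1

saving = relative_saving(baseline, tuned)
print(f"saved {baseline - tuned:.0f}s, about {saving:.1%} of the run")
# prints "saved 12s, about 7.7% of the run"
```

A saving of under 8% is consistent with the suspicion that the real bottleneck is somewhere other than the planner's disk cost estimates.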

      • ldmosquera
        underwhelmed :P
      • 2013-11-20 32447, 2013

      • ianmcorvidae
        heh
      • 2013-11-20 32422, 2013

      • ianmcorvidae
        yeah, I don't really know -- it's possible the bottleneck is elsewhere, too
      • 2013-11-20 32440, 2013

      • ianmcorvidae
        I haven't really played with any of this stuff running on SSDs, so :)
      • 2013-11-20 32418, 2013

      • derwin
        but cmon, you have that 501c(3) cheese
      • 2013-11-20 32426, 2013

      • derwin
        make it rain SSDs
      • 2013-11-20 32402, 2013

      • ianmcorvidae
        heh
      • 2013-11-20 32407, 2013

      • ldmosquera
        maybe (likely) the bottleneck is KVM / virtio
      • 2013-11-20 32416, 2013

      • ldmosquera
        I'll try with different cache modes
      • 2013-11-20 32447, 2013

      • ianmcorvidae
        I know that SSDs were considered for the new DB server we bought in 2011, but ultimately it was decided against, I think because the world of the internet wasn't quite sure how much benefit SSDs would bring
      • 2013-11-20 32404, 2013

      • ldmosquera
        what hardware is MB running on nowadays?
      • 2013-11-20 32438, 2013

      • ianmcorvidae
        we have a half-rack of servers doing various things; one DB server, hot-spare DB server, three machines running the website/webservice code, two machines running search servers, one machine building search indexes, frontend/gateway machines, and a variety of smaller things (e.g. our Xen host, we have VMs for the wiki, forums, and some other things)
      • 2013-11-20 32457, 2013

      • derwin
        you could read the 2012 blog post..
      • 2013-11-20 32402, 2013

      • ldmosquera
        nice! thanks
      • 2013-11-20 32415, 2013

      • ianmcorvidae
        we've moved stimpy, dexter, and tails out of the rack and have a couple of new ones, at least one of which isn't just shut down as an ostensible future spare
      • 2013-11-20 32435, 2013

      • ianmcorvidae
        heh, and hobbes is, I believe, currently sitting in our colo's fridge until we have a chance to get over there and open it up to replace some failing disks
      • 2013-11-20 32443, 2013

      • ldmosquera
        the traffic graph is brutal
      • 2013-11-20 32459, 2013

      • ianmcorvidae
        I think that graph hasn't been adjusted for ratelimited traffic, not sure
      • 2013-11-20 32428, 2013

      • ldmosquera
        what happened in mid-2011? Maybe a new client software release?
      • 2013-11-20 32445, 2013

      • ianmcorvidae
        headphones happened
      • 2013-11-20 32459, 2013

      • ldmosquera
        figures :P that's exactly how I got here
      • 2013-11-20 32404, 2013

      • ianmcorvidae
        which is a piece of software that our API is spectacularly bad for
      • 2013-11-20 32417, 2013

      • ianmcorvidae
        so we ratelimit it really severely, which presumably is why you're setting up your own server :)
      • 2013-11-20 32430, 2013

      • ldmosquera
        I recently discovered headphones, then beets through it, then I decided I needed my own MB mirror
      • 2013-11-20 32440, 2013

      • ianmcorvidae
        http://stats.musicbrainz.org/mrtg/drraw/drraw.cgi… -- we refuse 2/3 of requests that come to us, from headphones
      • 2013-11-20 32457, 2013

      • ianmcorvidae
        well, a bit less than that, but we still refuse more than we let through :/
      • 2013-11-20 32441, 2013

      • derwin
        oh ldmosquera I spoke with you last week!
      • 2013-11-20 32459, 2013

      • ianmcorvidae
        http://stats.musicbrainz.org/mrtg/drraw/drraw.cgi… for pre-2012 traffic from headphones (by and large)
      • 2013-11-20 32459, 2013

      • ldmosquera
        here? First time I hop in here
      • 2013-11-20 32408, 2013

      • derwin
        in #musicbrainz..
      • 2013-11-20 32413, 2013

      • derwin
        or #beets :)
      • 2013-11-20 32449, 2013

      • ldmosquera
        are you sure it was me? I haven't registered this nick, maybe someone else named like this (incredibly unlikely)
      • 2013-11-20 32401, 2013

      • derwin
        guess it must have been someone with a similar path to musicbrainz
      • 2013-11-20 32416, 2013

      • ianmcorvidae
        it's a pretty common one lately
      • 2013-11-20 32424, 2013

      • ianmcorvidae
        especially for people setting up servers
      • 2013-11-20 32441, 2013

      • ldmosquera
        I absolutely love MB; I built some scripts to "curate" my collections a few years ago, but it was a heap of manual work
      • 2013-11-20 32405, 2013

      • ldmosquera
        the scripts inferred stuff and made suggestions, but I had to review everything
      • 2013-11-20 32423, 2013

      • ldmosquera
        now I found beets and it manages to do 90% of it without input
      • 2013-11-20 32444, 2013

      • derwin
        yeah, #beets exists btw, and is active, in case you need help :)
      • 2013-11-20 32415, 2013

      • ldmosquera
        not so far; it's gloriously well made and I had no surprises
      • 2013-11-20 32456, 2013

      • ldmosquera
        ianmcorvidae: how do you mean MB's API is bad for Headphones?
      • 2013-11-20 32415, 2013

      • ianmcorvidae
        headphones tends to have to make a lot of requests in order to get the information it wants
      • 2013-11-20 32458, 2013

      • ldmosquera
        one per track or something like that?
      • 2013-11-20 32407, 2013

      • ianmcorvidae
        we don't really have much in the way of tools for synchronizing changes, as it were -- most of our API requires you to specify one entity at a time, and polling is really the only way to watch for changes to the data
      • 2013-11-20 32445, 2013

      • ldmosquera
        oh I see
      • 2013-11-20 32450, 2013

      • ianmcorvidae
        headphones has done some decent work improving that -- for example by using complicated hacks with things like search queries to get around the one-entity limits
      • 2013-11-20 32459, 2013

      • ianmcorvidae
        but it's just still really not good for that
      • 2013-11-20 32420, 2013

      • ianmcorvidae
        the MB API grew up around taggers, and that means that sometimes it's not good for things that don't match that pattern of usage
      • 2013-11-20 32456, 2013

      • ianmcorvidae
        (with a tagger, it makes a lot of sense: you request one release at a time, and updating to account for changes is largely manual, not automated)
      • 2013-11-20 32458, 2013

      • ianmcorvidae
        theoretically headphones could even do something semi-crazy like use replication packets, but that wouldn't help with the bits of headphones that are passing to beets and thus require a copy of the MB API
      • 2013-11-20 32423, 2013

      • ianmcorvidae
        so the usual way of doing things seems to have become "set up a mirror"
      • 2013-11-20 32432, 2013

      • ianmcorvidae
        it's at least gotten us to be better about releasing updated VMs :)
      • 2013-11-20 32409, 2013

      • ldmosquera
        nice job downscaling everything into a single VM!
      • 2013-11-20 32459, 2013

      • ldmosquera
        so basically Headphones operates on the entire collection instead of file by file like a tagger, and so ends up doing many requests for each file, right?
      • 2013-11-20 32422, 2013

      • ldmosquera
        and thus would benefit from some kind of batch-mode API
      • 2013-11-20 32439, 2013

      • ianmcorvidae
        well, a batch-mode API would mean that it could make fewer requests as a matter of polling
      • 2013-11-20 32455, 2013

      • ianmcorvidae
        what would really help is if we had an effective way to push out changes
      • 2013-11-20 32423, 2013

      • ianmcorvidae
        i.e., so headphones can watch something and then only make requests for things that have actually changed, rather than polling to see if there are changes
      • 2013-11-20 32446, 2013

      • ldmosquera
        got it
      • 2013-11-20 32449, 2013

      • ianmcorvidae
        we have a partially-done experiment in that, but it has a lot of weaknesses and we're perpetually short on resources to work on things, so
      • 2013-11-20 32415, 2013

      • ldmosquera
        maybe if you could specify "releases newer than date XXX"
      • 2013-11-20 32442, 2013

      • ianmcorvidae
        what would be fantastic for headphones is if it could just make a request every so often saying "hey, I care about these artist MBIDs, which ones have new releases/release groups?", get back a list of MBIDs, and then request only those
      • 2013-11-20 32453, 2013

      • ianmcorvidae
        (where "new" would be defined in terms of some date, like you say)
      • 2013-11-20 32410, 2013
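
The "which of these artists have something new?" request described above could look roughly like the sketch below. No such endpoint exists in the MusicBrainz API as discussed here; the function name, the in-memory index, and the placeholder MBIDs are all assumptions made for illustration.

```python
from datetime import date
from typing import Dict, Iterable, List

# Hypothetical index: artist MBID -> dates on which a release group was added.
# The keys are made-up placeholders, not real MusicBrainz identifiers.
RELEASE_DATES_BY_ARTIST: Dict[str, List[date]] = {
    "artist-mbid-a": [date(2013, 3, 1), date(2013, 11, 5)],
    "artist-mbid-b": [date(2012, 6, 15)],
}


def artists_with_new_releases(
    watched: Iterable[str],
    since: date,
    index: Dict[str, List[date]] = RELEASE_DATES_BY_ARTIST,
) -> List[str]:
    """Return the watched artist MBIDs that gained a release after `since`."""
    return [
        mbid
        for mbid in watched
        if any(added > since for added in index.get(mbid, []))
    ]


watched = ["artist-mbid-a", "artist-mbid-b", "artist-mbid-c"]
changed = artists_with_new_releases(watched, since=date(2013, 11, 1))
print(changed)  # only these MBIDs would need a follow-up lookup
```

The client would then look up only the returned MBIDs instead of polling every artist it tracks.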

      • ianmcorvidae
        our API also allows a lot of different representations/granularities to the data, though
      • 2013-11-20 32409, 2013

      • ianmcorvidae
        which makes it hard; such a changed-entities thing would either have to assume that everyone only cares about one particular one of those resolutions/representations, or it needs a way to specify exactly what things a given client cares about (and then it needs to keep track of more data so it can accurately respond to those requests)
      • 2013-11-20 32451, 2013

      • ldmosquera
        what are the resolutions, for example?
      • 2013-11-20 32432, 2013

      • ianmcorvidae
        so if you look at http://wiki.musicbrainz.org/XML_Web_Service/Versi… and the following three sections, those are the various so-called 'inc parameters'
      • 2013-11-20 32437, 2013

      • ianmcorvidae
        which specify which pieces of data to include
      • 2013-11-20 32437, 2013

      • ldmosquera
        got it
      • 2013-11-20 32450, 2013

      • ianmcorvidae
        and in some cases combining two inc parameters is not just a matter of merging the two, since sometimes one inc parameter will also affect the data returned by another
      • 2013-11-20 32402, 2013

      • ianmcorvidae
        (especially those listed in "inc= arguments which affect subqueries", but)
      • 2013-11-20 32402, 2013

      • ianmcorvidae
        (e.g. for a release, inc=artist-credits will include the release artist credit, inc=recordings will include the tracks on the release, but inc=artist-credits+recordings will include the release artist credit, the tracks, and all of the tracks' artist credits
      • 2013-11-20 32406, 2013

      • ianmcorvidae
        )
      • 2013-11-20 32411, 2013
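
The three inc combinations from the example above can be written out as lookup URLs. This sketch only builds the URL strings and makes no network request; `release_lookup_url` is an illustrative helper, and `RELEASE-MBID` is a placeholder rather than a real identifier.

```python
from typing import Iterable

WS_ROOT = "http://musicbrainz.org/ws/2"


def release_lookup_url(mbid: str, inc: Iterable[str] = ()) -> str:
    """Build a release lookup URL, joining inc values with '+'."""
    url = f"{WS_ROOT}/release/{mbid}"
    inc_values = list(inc)
    if inc_values:
        url += "?inc=" + "+".join(inc_values)
    return url


# Release artist credit only.
print(release_lookup_url("RELEASE-MBID", ["artist-credits"]))
# Tracks only.
print(release_lookup_url("RELEASE-MBID", ["recordings"]))
# Release artist credit, tracks, and each track's artist credit.
print(release_lookup_url("RELEASE-MBID", ["artist-credits", "recordings"]))
```

The third URL returns more than the union of the first two, which is why a changed-entities feed cannot simply track each inc parameter independently.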

      • ianmcorvidae
        we don't have very good internal caching/tracking of changes to data returned by the WS, too -- for HTTP caching stuff we can basically never avoid doing all the work, database-wise, before knowing if the response has changed
      • 2013-11-20 32427, 2013

      • ianmcorvidae
        which, again, partly-finished experiments exist, but :)
      • 2013-11-20 32442, 2013
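
The caching problem described above can be illustrated with a toy handler. This is a sketch under assumptions, not how the MusicBrainz server is implemented: it assumes the validator is a hash of the finished response body, and `build_release_body` is a stand-in for the database work.

```python
import hashlib
from typing import Optional, Tuple

DB_CALLS = 0  # counts how often the "expensive" work runs


def build_release_body(mbid: str) -> bytes:
    """Stand-in for the database queries needed to render a response."""
    global DB_CALLS
    DB_CALLS += 1
    return f"<release id='{mbid}'/>".encode("utf-8")


def handle_lookup(mbid: str, if_none_match: Optional[str] = None) -> Tuple[int, str, bytes]:
    """Answer a lookup, honouring If-None-Match with a body-derived ETag."""
    body = build_release_body(mbid)  # full work happens before we can compare
    etag = '"' + hashlib.sha1(body).hexdigest() + '"'
    if if_none_match == etag:
        return 304, etag, b""  # saves bandwidth, not database work
    return 200, etag, body


status, etag, _ = handle_lookup("release-mbid-placeholder")
status_again, _, body_again = handle_lookup("release-mbid-placeholder", if_none_match=etag)
print(status, status_again, DB_CALLS)  # prints "200 304 2"
```

The second request gets a 304, yet the body was still built twice. Avoiding that would need some cheaper record of when the underlying data last changed.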

      • ldmosquera
        so the problem would be to make it generic so it could satisfy any client without assuming things like what Headphones needs
      • 2013-11-20 32457, 2013

      • ianmcorvidae
        yeah
      • 2013-11-20 32433, 2013

      • ianmcorvidae
        also getting headphones to use it, which can sometimes be a struggle, but if it were well-made enough I guess we'd hope the benefits were self-evident :)
      • 2013-11-20 32452, 2013

      • ldmosquera
        if Headphones is overwhelmingly more active than other clients, then maybe it'd pay to make just this API endpoint for it
      • 2013-11-20 32437, 2013

      • ldmosquera
        then other clients would probably catch on
      • 2013-11-20 32454, 2013

      • ldmosquera
        I see :)
      • 2013-11-20 32412, 2013

      • ldmosquera
        I also use muspy, which I believe uses MusicBrainz too
      • 2013-11-20 32420, 2013

      • ldmosquera
        how does it fare with the API?
      • 2013-11-20 32430, 2013

      • ianmcorvidae
        muspy does it a bit better, because it essentially functions as an aggregator
      • 2013-11-20 32438, 2013

      • ldmosquera
        right, centralized
      • 2013-11-20 32443, 2013

      • ianmcorvidae
        if 3000 people all follow the same artist on muspy it still only has to make one request to us per day
      • 2013-11-20 32446, 2013
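
The aggregator effect mentioned above comes down to deduplicating artists across users. The sketch below is a rough model, not muspy's actual code; the function name and the user and artist identifiers are invented for illustration.

```python
from typing import Dict, Set


def upstream_requests_per_day(follows: Dict[str, Set[str]]) -> int:
    """Count daily upstream requests when each distinct artist is fetched once."""
    distinct_artists = set().union(*follows.values())
    return len(distinct_artists)


# 3000 users who all follow the same (placeholder) artist.
follows = {f"user-{i}": {"artist-mbid-a"} for i in range(3000)}

# If every user ran their own client, each would poll separately.
per_client = sum(len(artists) for artists in follows.values())
aggregated = upstream_requests_per_day(follows)
print(per_client, aggregated)  # prints "3000 1"
```

With a central service the request count scales with the number of distinct artists rather than the number of users.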

      • ianmcorvidae
        yeah
      • 2013-11-20 32427, 2013

      • ianmcorvidae
        muspy is something that it wouldn't be unreasonable for us to copy, in a rough sense, for the sort of changed-data API/feed I was talking about
      • 2013-11-20 32407, 2013

      • ldmosquera
        gotta run for a few hours, but I'll be back
      • 2013-11-20 32416, 2013

      • ianmcorvidae
        cool, nice talking to you
      • 2013-11-20 32422, 2013

      • ianmcorvidae
        hopefully you get your issues sorted
      • 2013-11-20 32425, 2013

      • ldmosquera
        hopefully I can make myself useful :) I'm a developer and sysadmin