#musicbrainz-devel

/

      • ruaok wonders if a vacuum is effectively in its own transaction
      • 2014-01-06 00607, 2014

      • ianmcorvidae
        everything is effectively in its own transaction, but it does take locks
      • 2014-01-06 00622, 2014

      • ianmcorvidae
        however, artist edit histories already weren't working so what's there to block? :P
      • 2014-01-06 00636, 2014

      • ruaok
        srsly
      • 2014-01-06 00604, 2014

      • nikki
        Freso: pinging doesn't work if I'm not at the computer :P
      • 2014-01-06 00633, 2014

      • ruaok
        feck.
      • 2014-01-06 00646, 2014

      • ruaok
        my session to the vacuum died. :(
      • 2014-01-06 00657, 2014

      • ruaok
        did it finish? or did it not?
      • 2014-01-06 00600, 2014

      • ruaok
        grrr.
      • 2014-01-06 00637, 2014

      • ianmcorvidae
        I don't see it in pg_stat_activity
      • 2014-01-06 00600, 2014

      • ianmcorvidae
        so it's not running any more, in any case :P
      • 2014-01-06 00610, 2014

      • ruaok
        agreed.
      • 2014-01-06 00618, 2014

      • ruaok
        starting screen.
      • 2014-01-06 00627, 2014

      • ruaok
        doing it again, just to ensure it finished 100%
      • 2014-01-06 00633, 2014

      • Freso
        nikki: I need to ping harder then!
      • 2014-01-06 00646, 2014

      • chirlu`
        re: that warning from vacuuming the edit table, it was probably mostly harmless.
      • 2014-01-06 00649, 2014

      • chirlu`
      • 2014-01-06 00652, 2014

      • ruaok
        120MB/s!!
      • 2014-01-06 00654, 2014

      • ruaok
        holy shit.
      • 2014-01-06 00612, 2014

      • ruaok
        blam vacuum done.
      • 2014-01-06 00620, 2014

      • ruaok
        I guess it did finish before. ;)
      • 2014-01-06 00643, 2014

      • chirlu`
        "These could be pages that were allocated to put new tuples into, but the crash happened before the inserting transaction committed"
      • 2014-01-06 00618, 2014

      • ianmcorvidae
        oh
      • 2014-01-06 00626, 2014

      • chirlu`
        one page per backend
      • 2014-01-06 00630, 2014

      • ianmcorvidae
        the edit_artist explain analyze that I had to kill manually a billion times last night seems to be working
      • 2014-01-06 00637, 2014

      • ianmcorvidae
        returned in about 3s
      • 2014-01-06 00645, 2014

      • ianmcorvidae
        so that's certainly an improvement if nothing else
      • 2014-01-06 00602, 2014

      • ruaok
        yes, explains were not happening earlier too
      • 2014-01-06 00625, 2014

      • ianmcorvidae
        were you guys seeing anything from something that wasn't an artist edit history?
      • 2014-01-06 00631, 2014

      • ruaok
        OMFG.
      • 2014-01-06 00634, 2014

      • ruaok
        the /edits page loaded.
      • 2014-01-06 00635, 2014

      • ianmcorvidae
        (incl. people who were seeing 504s, e.g. reo -- anything that wasn't an artist?)
      • 2014-01-06 00654, 2014

      • ruaok gets cheeky
      • 2014-01-06 00658, 2014

      • nikki
        an artist or the artist edit history?
      • 2014-01-06 00605, 2014

      • ruaok
        both
      • 2014-01-06 00605, 2014

      • nikki
        I definitely saw them outside the edit history
      • 2014-01-06 00608, 2014

      • ianmcorvidae
        artist edit history, I mean
      • 2014-01-06 00614, 2014

      • nikki
        but I don't remember if they were artist pages or not
      • 2014-01-06 00619, 2014

      • ianmcorvidae
        hm
      • 2014-01-06 00631, 2014

      • nikki
        but it could just be that the load was so high in general, I guess?
      • 2014-01-06 00639, 2014

      • ianmcorvidae
        the artist edit history was the particular query that was in the broken transactions
      • 2014-01-06 00640, 2014

      • ruaok
        yes.
      • 2014-01-06 00649, 2014

      • ianmcorvidae
        so yeah, could just be that i/o was at 100% and load was at 80
      • 2014-01-06 00608, 2014

      • ianmcorvidae
        which, let me mention here that I'm amazed how well our site was working with our database server thrashing its disk and at load 80 :P
      • 2014-01-06 00614, 2014

      • ruaok
        the u2 artist edit history is looking to time out.
      • 2014-01-06 00616, 2014

      • ruaok
        yep.
      • 2014-01-06 00649, 2014

      • chirlu`
        May be a different issue.
      • 2014-01-06 00653, 2014

      • ianmcorvidae
        if it hasn't actually 502/504'd yet then there's no worries yet
      • 2014-01-06 00603, 2014

      • ruaok
        it 504ed
      • 2014-01-06 00621, 2014

      • chirlu`
        Editing histories with many/big tracklists time out often.
      • 2014-01-06 00629, 2014

      • ianmcorvidae
        okay, but it did also disappear from pg_stat_activity
      • 2014-01-06 00634, 2014

      • ianmcorvidae
        rather than staying there forever like it was
      • 2014-01-06 00638, 2014

      • ianmcorvidae
        (I'm watching it)
      • 2014-01-06 00645, 2014

      • ruaok tries with a smaller artist
      • 2014-01-06 00654, 2014

      • ruaok
        works for portishead
      • 2014-01-06 00658, 2014

      • ianmcorvidae
        in short: things look better as far as I can tell
      • 2014-01-06 00602, 2014

      • ruaok
        so, we improved/fixed *seomthing*.
      • 2014-01-06 00610, 2014

      • ruaok
        question is what exactly was it?
      • 2014-01-06 00617, 2014

      • ruaok
        vaccum edit_artist?
      • 2014-01-06 00623, 2014

      • ianmcorvidae
        I guess the edit_artist table got fucked up somehow
      • 2014-01-06 00624, 2014

      • ruaok
        effective cache_size ?
      • 2014-01-06 00641, 2014

      • ruaok
        makes me want to vacuum analyze everything.
      • 2014-01-06 00652, 2014

      • ianmcorvidae
        given that it was entirely things touching it that were thrashing disk and failing to finish, at 3am
      • 2014-01-06 00636, 2014

      • ruaok
        I'm going to add timestamps to my killer script.
      • 2014-01-06 00642, 2014

      • ruaok
        and logging, so we can see if its still doing stuff
      • 2014-01-06 00654, 2014

      • ianmcorvidae
        okay, cool
      • 2014-01-06 00627, 2014

      • ianmcorvidae
        how often does that run/how old are the processes it kills?
      • 2014-01-06 00615, 2014

      • ruaok
        it was running every 30 seconds and it killed the WITH X AS queries that were stuck more than 2 minutes.
      • 2014-01-06 00643, 2014

      • ruaok
        astro:musicbrainz/musicrbrainz-server/admin/KillStuckQueries.pl
      • 2014-01-06 00658, 2014

      • ianmcorvidae
        k
      • 2014-01-06 00652, 2014

      • ruaok
        in the same dir, you can watch kill-stuck-queries.log
      • 2014-01-06 00605, 2014

      • ruaok
        the timestamp is not pretty, but it should do.
      • 2014-01-06 00617, 2014

      • ruaok
        `date`;
      • 2014-01-06 00618, 2014

      • ruaok
        heh.
      • 2014-01-06 00643, 2014

      • ruaok
        interesting.
      • 2014-01-06 00652, 2014

      • ruaok
        I did a vacuum analyze of edit_release
      • 2014-01-06 00657, 2014

      • ruaok
        and it finished in seconds.
      • 2014-01-06 00604, 2014

      • ruaok
        whereas the artist one took minutes
      • 2014-01-06 00614, 2014

      • ruaok
        release_group took less than 10 seconds.
      • 2014-01-06 00625, 2014

      • ruaok
        it seems that some things need vacuuming.
      • 2014-01-06 00635, 2014

      • nikki
        the artist history includes pretty much everything touching the artist, so it's bigger in general
      • 2014-01-06 00649, 2014

      • ruaok
        in theory a db wide one wouldn't take too long and ensure that we hit everything.
      • 2014-01-06 00652, 2014

      • ruaok
        ianmcorvidae: thoughts?
      • 2014-01-06 00624, 2014

      • ianmcorvidae
        I'd say we may as well while we're at it, I guess
      • 2014-01-06 00638, 2014

      • ianmcorvidae
        the worst thing it does is make a few things time out while it's running
      • 2014-01-06 00602, 2014

      • ruaok
        nothing new for today.
      • 2014-01-06 00617, 2014

      • ruaok proceeds
      • 2014-01-06 00612, 2014

      • Freso
        Maybe set up a monthly cron job vacuuming the db instead of letting the dust/cruft pile up?
      • 2014-01-06 00621, 2014

      • ianmcorvidae
        there's an autovacuum process anyway
      • 2014-01-06 00625, 2014

      • ruaok
        you're not supposed to need to do that.
      • 2014-01-06 00625, 2014

      • ianmcorvidae
        this is just cleanup from the dist-upgrade
      • 2014-01-06 00615, 2014

      • ruaok
        I dont quite understand how dist-upgrade would impact this though.
      • 2014-01-06 00648, 2014

      • Freso
        Oh.
      • 2014-01-06 00603, 2014

      • ruaok is reminded as to why those disks are so expensive
      • 2014-01-06 00635, 2014

      • ruaok
        down to load 6 and 30-50M/s disk io while the vacuum is running.
      • 2014-01-06 00654, 2014

      • ruaok
        that is promising.
      • 2014-01-06 00610, 2014

      • ruaok
        and now load 18 and 8MB/s.
      • 2014-01-06 00613, 2014

      • ruaok shrugs
      • 2014-01-06 00628, 2014

      • ianmcorvidae
        dunno
      • 2014-01-06 00636, 2014

      • ianmcorvidae
        by the by, we did not end up updating the collate extension
      • 2014-01-06 00642, 2014

      • ianmcorvidae
        I don't know if that will need a database restart or not
      • 2014-01-06 00623, 2014

      • ruaok
        at least that removes that as part of the equation for why things are slow? or does it make a greater possibility that its playing a role?
      • 2014-01-06 00648, 2014

      • ianmcorvidae
        well, we'd want to replace the extension and then reindex anything that uses musicbrainz_collate
      • 2014-01-06 00606, 2014

      • ruaok
        understood.
      • 2014-01-06 00608, 2014

      • ianmcorvidae
        I don't know how a potentially corrupted index affects things
      • 2014-01-06 00626, 2014

      • ianmcorvidae
        our issues haven't generally been with things using collate, I guess, but maybe it's behind the overall higher load somehow
      • 2014-01-06 00650, 2014

      • ruaok
        shall we get one with it and remove all possible causes of trouble we know of?
      • 2014-01-06 00611, 2014

      • ruaok
        if we take downtime, the indexes will rebuild in no time flat.
      • 2014-01-06 00618, 2014

      • ruaok
        I bet we could do it in under 5 minutes.
      • 2014-01-06 00631, 2014

      • ianmcorvidae
        how's your vacuum doing at present?
      • 2014-01-06 00649, 2014

      • ruaok
        we should let that finish first.
      • 2014-01-06 00653, 2014

      • ruaok
        but, its still going.
      • 2014-01-06 00655, 2014

      • ianmcorvidae
        k
      • 2014-01-06 00602, 2014

      • ruaok
        in its ever completely opaque way
      • 2014-01-06 00624, 2014

      • ianmcorvidae
        I'll see if I can find something to tell me about extension upgrades
      • 2014-01-06 00634, 2014

      • ruaok
        k
      • 2014-01-06 00656, 2014

      • ianmcorvidae
        I think we don't need a reboot of postgresql, just update the package and then run what's in https://github.com/metabrainz/postgresql-musicbra…
      • 2014-01-06 00659, 2014

      • ruaok
        nice
      • 2014-01-06 00620, 2014

      • ruaok
        too bad the concurrent reindex feature won't come until pg 9.2
      • 2014-01-06 00636, 2014

      • ruaok
        but, we can do the index rebuild in a transaction
      • 2014-01-06 00635, 2014

      • ianmcorvidae
        neat
      • 2014-01-06 00648, 2014

      • ruaok
        keeps getting easier.
      • 2014-01-06 00603, 2014

      • ianmcorvidae
        so that's installed, we should rebuild indexes once the vacuum's done
      • 2014-01-06 00612, 2014

      • ianmcorvidae
        rebuild musicbrainz-collate indexes*
      • 2014-01-06 00615, 2014

      • ruaok
        it just finished.
      • 2014-01-06 00642, 2014

      • ruaok
        do you wanna or should I?
      • 2014-01-06 00647, 2014

      • ianmcorvidae
        I'll do it
      • 2014-01-06 00650, 2014

      • ruaok
        k
      • 2014-01-06 00651, 2014

      • ianmcorvidae
        finding a list of indexes first
      • 2014-01-06 00607, 2014

      • ruaok
        see the bottom of CreateIndexes.sql
      • 2014-01-06 00623, 2014

      • ianmcorvidae
        yes, I know
      • 2014-01-06 00649, 2014

      • ruaok
        the vacuum didn't seem to have done much past what we'd already gotten done.
      • 2014-01-06 00611, 2014

      • ruaok
        that is to say that we've not really improved the state of totoro much.
      • 2014-01-06 00626, 2014

      • ruaok
        the site is doing better, but only slightly.
      • 2014-01-06 00636, 2014

      • ianmcorvidae
        reindexing starting
      • 2014-01-06 00646, 2014

      • ruaok
        wow.
      • 2014-01-06 00651, 2014

      • ruaok
        that blocked EVERYTHING.
      • 2014-01-06 00617, 2014

      • ruaok
        and yep, the site tipped over.
      • 2014-01-06 00624, 2014

      • ruaok
        should I 503 it?
      • 2014-01-06 00642, 2014

      • ianmcorvidae
        uh, perhaps. it's on index 4 of 9
      • 2014-01-06 00658, 2014

      • ianmcorvidae
        5
      • 2014-01-06 00616, 2014

      • ruaok
        by the time I get the 503 in place it will be done
      • 2014-01-06 00618, 2014

      • ruaok
        done.
      • 2014-01-06 00629, 2014

      • ruaok
        what a dramatic effect.
      • 2014-01-06 00630, 2014

      • ianmcorvidae
        bah, it rolled back
      • 2014-01-06 00640, 2014

      • ruaok
        ugh
      • 2014-01-06 00642, 2014

      • ianmcorvidae
        because it got deadlocked trying to do the track index
      • 2014-01-06 00648, 2014

      • ruaok
        lets 503
      • 2014-01-06 00652, 2014

      • ianmcorvidae
        I'm doing each one in its own transaction, I think, but yes
      • 2014-01-06 00600, 2014

      • ianmcorvidae
        tell me when ready