-
ruaok wonders if a vacuum is effectively in its own transaction
2014-01-06 00607, 2014
-
ianmcorvidae
everything is effectively in its own transaction, but it does take locks
2014-01-06 00622, 2014
-
ianmcorvidae
however, artist edit histories already weren't working so what's there to block? :P
2014-01-06 00636, 2014
-
ruaok
srsly
2014-01-06 00604, 2014
-
nikki
Freso: pinging doesn't work if I'm not at the computer :P
2014-01-06 00633, 2014
-
ruaok
feck.
2014-01-06 00646, 2014
-
ruaok
my session to the vacuum died. :(
2014-01-06 00657, 2014
-
ruaok
did it finish? or did it not?
2014-01-06 00600, 2014
-
ruaok
grrr.
2014-01-06 00637, 2014
-
ianmcorvidae
I don't see it in pg_stat_activity
2014-01-06 00600, 2014
-
ianmcorvidae
so it's not running any more, in any case :P
2014-01-06 00610, 2014
-
ruaok
agreed.
2014-01-06 00618, 2014
-
ruaok
starting screen.
2014-01-06 00627, 2014
-
ruaok
doing it again, just to ensure it finished 100%
2014-01-06 00633, 2014
-
Freso
nikki: I need to ping harder then!
2014-01-06 00646, 2014
-
chirlu`
re: that warning from vacuuming the edit table, it was probably mostly harmless.
2014-01-06 00649, 2014
-
chirlu`
2014-01-06 00652, 2014
-
ruaok
120MB/s!!
2014-01-06 00654, 2014
-
ruaok
holy shit.
2014-01-06 00612, 2014
-
ruaok
blam vacuum done.
2014-01-06 00620, 2014
-
ruaok
I guess it did finish before. ;)
2014-01-06 00643, 2014
-
chirlu`
"These could be pages that were allocated to put new tuples into, but the crash happened before the inserting transaction committed"
2014-01-06 00618, 2014
-
ianmcorvidae
oh
2014-01-06 00626, 2014
-
chirlu`
one page per backend
2014-01-06 00630, 2014
-
ianmcorvidae
the edit_artist explain analyze that I had to kill manually a billion times last night seems to be working
2014-01-06 00637, 2014
-
ianmcorvidae
returned in about 3s
2014-01-06 00645, 2014
-
ianmcorvidae
so that's certainly an improvement if nothing else
2014-01-06 00602, 2014
-
ruaok
yes, explains were not happening earlier too
2014-01-06 00625, 2014
-
ianmcorvidae
were you guys seeing anything from something that wasn't an artist edit history?
2014-01-06 00631, 2014
-
ruaok
OMFG.
2014-01-06 00634, 2014
-
ruaok
the /edits page loaded.
2014-01-06 00635, 2014
-
ianmcorvidae
(incl. people who were seeing 504s, e.g. reo -- anything that wasn't an artist?)
2014-01-06 00654, 2014
-
ruaok gets cheeky
2014-01-06 00658, 2014
-
nikki
an artist or the artist edit history?
2014-01-06 00605, 2014
-
ruaok
both
2014-01-06 00605, 2014
-
nikki
I definitely saw them outside the edit history
2014-01-06 00608, 2014
-
ianmcorvidae
artist edit history, I mean
2014-01-06 00614, 2014
-
nikki
but I don't remember if they were artist pages or not
2014-01-06 00619, 2014
-
ianmcorvidae
hm
2014-01-06 00631, 2014
-
nikki
but it could just be that the load was so high in general, I guess?
2014-01-06 00639, 2014
-
ianmcorvidae
the artist edit history was the particular query that was in the broken transactions
2014-01-06 00640, 2014
-
ruaok
yes.
2014-01-06 00649, 2014
-
ianmcorvidae
so yeah, could just be that i/o was at 100% and load was at 80
2014-01-06 00608, 2014
-
ianmcorvidae
which, let me mention here that I'm amazed how well our site was working with our database server thrashing its disk and at load 80 :P
2014-01-06 00614, 2014
-
ruaok
the u2 artist edit history is looking to time out.
2014-01-06 00616, 2014
-
ruaok
yep.
2014-01-06 00649, 2014
-
chirlu`
May be a different issue.
2014-01-06 00653, 2014
-
ianmcorvidae
if it hasn't actually 502/504'd yet then there's no worries yet
2014-01-06 00603, 2014
-
ruaok
it 504ed
2014-01-06 00621, 2014
-
chirlu`
Editing histories with many/big tracklists time out often.
2014-01-06 00629, 2014
-
ianmcorvidae
okay, but it did also disappear from pg_stat_activity
2014-01-06 00634, 2014
-
ianmcorvidae
rather than staying there forever like it was
2014-01-06 00638, 2014
-
ianmcorvidae
(I'm watching it)
2014-01-06 00645, 2014
-
ruaok tries with a smaller artist
2014-01-06 00654, 2014
-
ruaok
works for portishead
2014-01-06 00658, 2014
-
ianmcorvidae
in short: things look better as far as I can tell
2014-01-06 00602, 2014
-
ruaok
so, we improved/fixed *seomthing*.
2014-01-06 00610, 2014
-
ruaok
question is what exactly was it?
2014-01-06 00617, 2014
-
ruaok
vaccum edit_artist?
2014-01-06 00623, 2014
-
ianmcorvidae
I guess the edit_artist table got fucked up somehow
2014-01-06 00624, 2014
-
ruaok
effective cache_size ?
2014-01-06 00641, 2014
-
ruaok
makes me want to vacuum analyze everything.
2014-01-06 00652, 2014
-
ianmcorvidae
given that it was entirely things touching it that were thrashing disk and failing to finish, at 3am
2014-01-06 00636, 2014
-
ruaok
I'm going to add timestamps to my killer script.
2014-01-06 00642, 2014
-
ruaok
and logging, so we can see if its still doing stuff
2014-01-06 00654, 2014
-
ianmcorvidae
okay, cool
2014-01-06 00627, 2014
-
ianmcorvidae
how often does that run/how old are the processes it kills?
2014-01-06 00615, 2014
-
ruaok
it was running every 30 seconds and it killed the WITH X AS queries that were stuck more than 2 minutes.
2014-01-06 00643, 2014
-
ruaok
astro:musicbrainz/musicrbrainz-server/admin/KillStuckQueries.pl
2014-01-06 00658, 2014
-
ianmcorvidae
k
2014-01-06 00652, 2014
-
ruaok
in the same dir, you can watch kill-stuck-queries.log
2014-01-06 00605, 2014
-
ruaok
the timestamp is not pretty, but it should do.
2014-01-06 00617, 2014
-
ruaok
`date`;
2014-01-06 00618, 2014
-
ruaok
heh.
2014-01-06 00643, 2014
-
ruaok
interesting.
2014-01-06 00652, 2014
-
ruaok
I did a vacuum analyze of edit_release
2014-01-06 00657, 2014
-
ruaok
and it finished in seconds.
2014-01-06 00604, 2014
-
ruaok
whereas the artist one took minutes
2014-01-06 00614, 2014
-
ruaok
release_group took less than 10 seconds.
2014-01-06 00625, 2014
-
ruaok
it seems that some things need vacuuming.
2014-01-06 00635, 2014
-
nikki
the artist history includes pretty much everything touching the artist, so it's bigger in general
2014-01-06 00649, 2014
-
ruaok
in theory a db wide one wouldn't take too long and ensure that we hit everything.
2014-01-06 00652, 2014
-
ruaok
ianmcorvidae: thoughts?
2014-01-06 00624, 2014
-
ianmcorvidae
I'd say we may as well while we're at it, I guess
2014-01-06 00638, 2014
-
ianmcorvidae
the worst thing it does is make a few things time out while it's running
2014-01-06 00602, 2014
-
ruaok
nothing new for today.
2014-01-06 00617, 2014
-
ruaok proceeds
2014-01-06 00612, 2014
-
Freso
Maybe set up a monthly cron job vacuuming the db instead of letting the dust/cruft pile up?
2014-01-06 00621, 2014
-
ianmcorvidae
there's an autovacuum process anyway
2014-01-06 00625, 2014
-
ruaok
you're not supposed to need to do that.
2014-01-06 00625, 2014
-
ianmcorvidae
this is just cleanup from the dist-upgrade
2014-01-06 00615, 2014
-
ruaok
I dont quite understand how dist-upgrade would impact this though.
2014-01-06 00648, 2014
-
Freso
Oh.
2014-01-06 00603, 2014
-
ruaok is reminded as to why those disks are so expensive
2014-01-06 00635, 2014
-
ruaok
down to load 6 and 30-50M/s disk io while the vacuum is running.
2014-01-06 00654, 2014
-
ruaok
that is promising.
2014-01-06 00610, 2014
-
ruaok
and now load 18 and 8MB/s.
2014-01-06 00613, 2014
-
ruaok shrugs
2014-01-06 00628, 2014
-
ianmcorvidae
dunno
2014-01-06 00636, 2014
-
ianmcorvidae
by the by, we did not end up updating the collate extension
2014-01-06 00642, 2014
-
ianmcorvidae
I don't know if that will need a database restart or not
2014-01-06 00623, 2014
-
ruaok
at least that removes that as part of the equation for why things are slow? or does it make a greater possibility that its playing a role?
2014-01-06 00648, 2014
-
ianmcorvidae
well, we'd want to replace the extension and then reindex anything that uses musicbrainz_collate
2014-01-06 00606, 2014
-
ruaok
understood.
2014-01-06 00608, 2014
-
ianmcorvidae
I don't know how a potentially corrupted index affects things
2014-01-06 00626, 2014
-
ianmcorvidae
our issues haven't generally been with things using collate, I guess, but maybe it's behind the overall higher load somehow
2014-01-06 00650, 2014
-
ruaok
shall we get one with it and remove all possible causes of trouble we know of?
2014-01-06 00611, 2014
-
ruaok
if we take downtime, the indexes will rebuild in no time flat.
2014-01-06 00618, 2014
-
ruaok
I bet we could do it in under 5 minutes.
2014-01-06 00631, 2014
-
ianmcorvidae
how's your vacuum doing at present?
2014-01-06 00649, 2014
-
ruaok
we should let that finish first.
2014-01-06 00653, 2014
-
ruaok
but, its still going.
2014-01-06 00655, 2014
-
ianmcorvidae
k
2014-01-06 00602, 2014
-
ruaok
in its ever completely opaque way
2014-01-06 00624, 2014
-
ianmcorvidae
I'll see if I can find something to tell me about extension upgrades
2014-01-06 00634, 2014
-
ruaok
k
2014-01-06 00656, 2014
-
ianmcorvidae
2014-01-06 00659, 2014
-
ruaok
nice
2014-01-06 00620, 2014
-
ruaok
too bad the concurrent reindex feature won't come until pg 9.2
2014-01-06 00636, 2014
-
ruaok
but, we can do the index rebuild in a transaction
2014-01-06 00635, 2014
-
ianmcorvidae
neat
2014-01-06 00648, 2014
-
ruaok
keeps getting easier.
2014-01-06 00603, 2014
-
ianmcorvidae
so that's installed, we should rebuild indexes once the vacuum's done
2014-01-06 00612, 2014
-
ianmcorvidae
rebuild musicbrainz-collate indexes*
2014-01-06 00615, 2014
-
ruaok
it just finished.
2014-01-06 00642, 2014
-
ruaok
do you wanna or should I?
2014-01-06 00647, 2014
-
ianmcorvidae
I'll do it
2014-01-06 00650, 2014
-
ruaok
k
2014-01-06 00651, 2014
-
ianmcorvidae
finding a list of indexes first
2014-01-06 00607, 2014
-
ruaok
see the bottom of CreateIndexes.sql
2014-01-06 00623, 2014
-
ianmcorvidae
yes, I know
2014-01-06 00649, 2014
-
ruaok
the vacuum didn't seem to have done much past what we'd already gotten done.
2014-01-06 00611, 2014
-
ruaok
that is to say that we've not really improved the state of totoro much.
2014-01-06 00626, 2014
-
ruaok
the site is doing better, but only slightly.
2014-01-06 00636, 2014
-
ianmcorvidae
reindexing starting
2014-01-06 00646, 2014
-
ruaok
wow.
2014-01-06 00651, 2014
-
ruaok
that blocked EVERYTHING.
2014-01-06 00617, 2014
-
ruaok
and yep, the site tipped over.
2014-01-06 00624, 2014
-
ruaok
should I 503 it?
2014-01-06 00642, 2014
-
ianmcorvidae
uh, perhaps. it's on index 4 of 9
2014-01-06 00658, 2014
-
ianmcorvidae
5
2014-01-06 00616, 2014
-
ruaok
by the time I get the 503 in place it will be done
2014-01-06 00618, 2014
-
ruaok
done.
2014-01-06 00629, 2014
-
ruaok
what a dramatic effect.
2014-01-06 00630, 2014
-
ianmcorvidae
bah, it rolled back
2014-01-06 00640, 2014
-
ruaok
ugh
2014-01-06 00642, 2014
-
ianmcorvidae
because it got deadlocked trying to do the track index
2014-01-06 00648, 2014
-
ruaok
lets 503
2014-01-06 00652, 2014
-
ianmcorvidae
I'm doing each one in its own transaction, I think, but yes
2014-01-06 00600, 2014
-
ianmcorvidae
tell me when ready