9:30 AM
Hadora
has been
2014-01-06 00648, 2014
9:30 AM
Hadora
in the past
2014-01-06 00654, 2014
9:30 AM
ianmcorvidae
since it started
2014-01-06 00656, 2014
9:30 AM
ianmcorvidae
it's unplanned downtime
2014-01-06 00605, 2014
9:33 AM
Hadora
I know, but I thought someone could knew when the db was switch to read-only, w/e
2014-01-06 00625, 2014
9:34 AM
uk_
at least 40 minutes
2014-01-06 00651, 2014
9:38 AM
uk__ joined the channel
2014-01-06 00604, 2014
9:44 AM
Hadora
ah ok thanks i thought it was for the whole week-end
2014-01-06 00629, 2014
9:44 AM
Hadora
thanks
2014-01-06 00613, 2014
10:06 AM
uk__
So what was the problem?
2014-01-06 00605, 2014
10:27 AM
travis-ci joined the channel
2014-01-06 00606, 2014
10:27 AM
travis-ci has left the channel
2014-01-06 00622, 2014
10:33 AM
Jozo
How about adding banner 'cos 504?
2014-01-06 00609, 2014
11:03 AM
reosarevok joined the channel
2014-01-06 00633, 2014
11:10 AM
reosarevok
:(
2014-01-06 00642, 2014
11:10 AM
reosarevok
I see the whole site is still fucked up
2014-01-06 00640, 2014
11:11 AM
kuno
wb reo, I think ian was/is looking into it.
2014-01-06 00613, 2014
11:15 AM
ruaok joined the channel
2014-01-06 00648, 2014
11:50 AM
ruaok
ocharles: ping
2014-01-06 00645, 2014
12:07 PM
uk__ has left the channel
2014-01-06 00645, 2014
12:59 PM
reosarevok joined the channel
2014-01-06 00650, 2014
13:17 PM
reosarevok joined the channel
2014-01-06 00625, 2014
13:25 PM
ruaok starts writing a script to replace himself
2014-01-06 00638, 2014
13:25 PM
ruaok
can't be more than a few lines, really. :)
2014-01-06 00658, 2014
13:25 PM
Mineo
since you're able to write that script, would that script be able to write a script that replaces itself again?:)
2014-01-06 00658, 2014
13:28 PM
ruaok
I think you're giving the original script writer way too much credit. :)
2014-01-06 00653, 2014
13:37 PM
adhawkins
Site still having issues?
2014-01-06 00659, 2014
13:37 PM
adhawkins
504 Gateway time-out...
2014-01-06 00605, 2014
13:38 PM
ruaok
yep.
2014-01-06 00618, 2014
13:38 PM
adhawkins
Okey dokey, I'll leave you to it then. No biggy :)
2014-01-06 00623, 2014
13:38 PM
adhawkins
Happy New Year all by the way :D
2014-01-06 00600, 2014
13:39 PM
ruaok
happy new year. :)
2014-01-06 00633, 2014
13:39 PM
adhawkins waves
2014-01-06 00610, 2014
13:43 PM
Freso joined the channel
2014-01-06 00647, 2014
13:44 PM
demonimin joined the channel
2014-01-06 00608, 2014
13:51 PM
Nyanko-sensei joined the channel
2014-01-06 00654, 2014
13:52 PM
ocharles
ruaok: pong
2014-01-06 00611, 2014
13:53 PM
ruaok
heya. got a few minutes to help assess an unhappy totoro?
2014-01-06 00616, 2014
13:53 PM
ruaok
both Ian and I are stuck.
2014-01-06 00629, 2014
13:53 PM
ocharles
Sure, let me just finish what I'm working on
2014-01-06 00629, 2014
13:53 PM
ruaok
for an intro, see ian's last email.
2014-01-06 00633, 2014
13:53 PM
ruaok
thx
2014-01-06 00623, 2014
13:57 PM
ocharles
2014-01-06 00644, 2014
13:59 PM
ruaok
yep.
2014-01-06 00634, 2014
14:00 PM
ruaok
from what I can tell, everything is really happy.
2014-01-06 00640, 2014
14:00 PM
ruaok
except for the disk io.
2014-01-06 00652, 2014
14:00 PM
ruaok
indexes are happy.
2014-01-06 00615, 2014
14:01 PM
ruaok
but occasionally we get some stuck queries. and always the same query.
2014-01-06 00633, 2014
14:01 PM
ruaok
I wonder if this is a symptom, rather than the cause.
2014-01-06 00626, 2014
14:04 PM
ocharles
I dunno, IO is back down to much saner levels right now
2014-01-06 00637, 2014
14:04 PM
ruaok
I've been killing stuck procs.
2014-01-06 00639, 2014
14:04 PM
ocharles
but a few minutes ago it was at 100% for a single backend, so I would think that query is what does the damage
2014-01-06 00655, 2014
14:04 PM
ocharles
ok, hold off on that for a moment so I can observe them
2014-01-06 00657, 2014
14:04 PM
ruaok
I've stopped the script that kills stuck procs.
2014-01-06 00600, 2014
14:05 PM
ruaok nods
2014-01-06 00602, 2014
14:05 PM
ocharles
ta
2014-01-06 00629, 2014
14:05 PM
ruaok
so, while we wait, lets summarize what I've learned.
2014-01-06 00635, 2014
14:05 PM
ruaok
lots of reads, almost no writes.
2014-01-06 00613, 2014
14:06 PM
ruaok
this query is interesting: select * from pg_statio_all_tables order by heap_blks_read desc;
2014-01-06 00631, 2014
14:06 PM
ruaok
loads and loads of heap blocks read on edit table.
2014-01-06 00659, 2014
14:06 PM
ruaok
wow. load 4. thats the lowest I've seen in ages. :)
2014-01-06 00606, 2014
14:07 PM
ocharles
:)
2014-01-06 00614, 2014
14:07 PM
ocharles
What I find odd is that Ian says there are queries running for 6 hours
2014-01-06 00624, 2014
14:07 PM
ruaok
I have the sneaky suspicion that we're out of shared_buffers.
2014-01-06 00626, 2014
14:07 PM
ocharles
but we set the statement_timeout to 60 seconds on connection
2014-01-06 00631, 2014
14:07 PM
ruaok
yes, those are the ones I've been killing.
2014-01-06 00643, 2014
14:07 PM
ruaok
yes and his email also acks that those do not seem to be working.
2014-01-06 00657, 2014
14:07 PM
ruaok
and that pg_bouncer is as he eloquently puts it "farting on things" :)
2014-01-06 00605, 2014
14:08 PM
ocharles
yes, but this isn't anything to do with pgbouncer
2014-01-06 00612, 2014
14:08 PM
ruaok
agreed.
2014-01-06 00612, 2014
14:08 PM
ocharles
this is an option for postgresql itself
2014-01-06 00622, 2014
14:08 PM
ocharles
at least, that would be my understanding...
2014-01-06 00622, 2014
14:08 PM
ruaok
there are a lot of disparate symptoms
2014-01-06 00646, 2014
14:08 PM
ruaok
my current theory, and a weak one at that, is that we've run out of shared_buffers.
2014-01-06 00653, 2014
14:08 PM
ruaok
and that is driving up the load accessing the edit table.
2014-01-06 00601, 2014
14:09 PM
ruaok
and that other issues are knock on effects.
2014-01-06 00651, 2014
14:09 PM
ruaok loads http://musicbrainz.org/artist/a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432/edits
2014-01-06 00603, 2014
14:10 PM
ruaok
which 504s with good consistency
2014-01-06 00621, 2014
14:10 PM
ruaok
and immediately you can see one stuck query
2014-01-06 00628, 2014
14:10 PM
ruaok
select (extract(epoch from now()) - extract(epoch from xact_start))::integer as secs, procpid from pg_stat_activity where current_query ilike '%with x as%' order by xact_start;
2014-01-06 00638, 2014
14:10 PM
ruaok
proc 987
2014-01-06 00654, 2014
14:10 PM
ruaok
if you click that link, we'll get another stuck proc.
2014-01-06 00607, 2014
14:11 PM
ruaok
and even with one our disk io is neark 10MB/s
2014-01-06 00652, 2014
14:11 PM
ruaok
I got my 504.
2014-01-06 00602, 2014
14:12 PM
ruaok
and the query is still running.
2014-01-06 00624, 2014
14:12 PM
ruaok
done now
2014-01-06 00615, 2014
14:13 PM
ruaok
now that link loads, since the results are in cache.
2014-01-06 00609, 2014
14:14 PM
ruaok does it again with pink floyd, proc 925
2014-01-06 00628, 2014
14:14 PM
ocharles
Somewhat strange that there are <IDLE> in transaction things
2014-01-06 00645, 2014
14:14 PM
ocharles
oh, I think that was just random chance actually, ignore that
2014-01-06 00619, 2014
14:15 PM
ruaok
yeah, its hard to nail any one thing down.
2014-01-06 00625, 2014
14:15 PM
ruaok
nothing really seems amiss.
2014-01-06 00637, 2014
14:15 PM
ruaok
nothing specific, that is
2014-01-06 00607, 2014
14:17 PM
ocharles
Things look ok for the moment. Keep the script off and I'll keep watching things
2014-01-06 00653, 2014
14:19 PM
reosarevok
"ok" seems strange when I can't see artist's edit histories tbh :p
2014-01-06 00620, 2014
14:20 PM
ocharles
Oh, hmm
2014-01-06 00641, 2014
14:20 PM
ruaok
ocharles: I've just been trying to load some artist edit histories for your entertainment. :)
2014-01-06 00609, 2014
14:21 PM
ruaok
its *very* predictable and consistent.
2014-01-06 00651, 2014
14:22 PM
ocharles
4 minutes not hitting a timeout is very weird though, hrm
2014-01-06 00610, 2014
14:24 PM
ocharles
I just manually ran one of those queries and the timeout definitely works
2014-01-06 00614, 2014
14:24 PM
ocharles
Let me just try it through pgbouncer
2014-01-06 00619, 2014
14:24 PM
ruaok
k
2014-01-06 00600, 2014
14:25 PM
ocharles
yea, the timeout works fine there. hrm
2014-01-06 00630, 2014
14:26 PM
ocharles
ruaok: we could run with your hypothesis that shared_buffers needs to go up
2014-01-06 00651, 2014
14:26 PM
ruaok
its worth a try.
2014-01-06 00615, 2014
14:27 PM
ruaok
and w're on 64 bit, built for 64 bit.
2014-01-06 00624, 2014
14:27 PM
ruaok
we should be fine going above 8GB from what I read.
2014-01-06 00640, 2014
14:27 PM
ocharles
most people don't suggest going much above 8Gb though
2014-01-06 00644, 2014
14:27 PM
ruaok
so, I'd be tempted to go to 1/3 ram, from 1/4ram.
2014-01-06 00651, 2014
14:27 PM
ocharles
when you start going higher it becomes detrimental
2014-01-06 00608, 2014
14:28 PM
ocharles
because now you have a bigger buffer to manage
2014-01-06 00617, 2014
14:28 PM
ruaok
I read an article that disputed that.
2014-01-06 00619, 2014
14:28 PM
ruaok
let me find it.
2014-01-06 00647, 2014
14:29 PM
ruaok
2014-01-06 00609, 2014
14:30 PM
ocharles
fair enough
2014-01-06 00633, 2014
14:30 PM
ruaok
so we need to up the kernel.shmmax value.
2014-01-06 00644, 2014
14:30 PM
ruaok
and then restart.
2014-01-06 00654, 2014
14:30 PM
ruaok
but I never know what to set it to a priori.
2014-01-06 00608, 2014
14:31 PM
ruaok
upping the limit, starting, seeing the error, set kernel value
2014-01-06 00614, 2014
14:31 PM
ruaok
that is my usual process.
2014-01-06 00632, 2014
14:31 PM
ocharles
you could just try doubling both numbers :)
2014-01-06 00639, 2014
14:31 PM
ocharles
hrm
2014-01-06 00645, 2014
14:31 PM
ruaok
thats a bit drastic, no?
2014-01-06 00602, 2014
14:32 PM
ruaok
maybe go to 10GB, from 8GB
2014-01-06 00626, 2014
14:32 PM
ocharles
has anyone tried vacuuming?
2014-01-06 00634, 2014
14:32 PM
ruaok
I haven't
2014-01-06 00640, 2014
14:32 PM
ocharles
Maybe we should try vacuuming edit first
2014-01-06 00613, 2014
14:33 PM
ocharles
Index Scan using edit_pkey on edit (cost=0.00..2689861.26 rows=23225855 width=419) (actual time=0.005..405.894 rows=1308565 loops=1)
2014-01-06 00619, 2014
14:33 PM
ocharles
because that feels really high
2014-01-06 00626, 2014
14:33 PM
Freso
I need to vacuum the apartment. But I also need to read up on anatomy for exam/test tomorrow. :|
2014-01-06 00640, 2014
14:33 PM
Freso re-lurks
2014-01-06 00641, 2014
14:33 PM
ruaok
ocharles: go for it.
2014-01-06 00617, 2014
14:34 PM
ruaok
ocharles: should I kill old locks first?
2014-01-06 00619, 2014
14:34 PM
ruaok
otherwise we
2014-01-06 00625, 2014
14:34 PM
ruaok
're doing to DOS ourselves. :)
2014-01-06 00638, 2014
14:34 PM
ocharles
not sure what you mean by dosing yourself
2014-01-06 00638, 2014
14:34 PM
Jozo
How big are postgresql indexes compared to available memory?
2014-01-06 00650, 2014
14:34 PM
ocharles
vacuum analyze will block writes, but reads should carry on, iirc
2014-01-06 00609, 2014
14:35 PM
ruaok
too much io, a lot of it for queries that will fail.
2014-01-06 00615, 2014
14:35 PM
ruaok
so it will take much longer than needed.
2014-01-06 00620, 2014
14:35 PM
ruaok goes to kill old queries
2014-01-06 00624, 2014
14:35 PM
ruaok
have you started the vacuum?
2014-01-06 00654, 2014
14:35 PM
ruaok
Jozo: how do you query that?
2014-01-06 00604, 2014
14:36 PM
ocharles
I've issued the command, but it might be waiting
2014-01-06 00632, 2014
14:36 PM
ruaok
vacuum analyze?
2014-01-06 00634, 2014
14:36 PM
ocharles
yea
2014-01-06 00637, 2014
14:36 PM
ruaok
k.
2014-01-06 00642, 2014
14:36 PM
ruaok
the io went through the roof.
2014-01-06 00651, 2014
14:36 PM
Jozo
ruaok: No idea (just guessing what may be wrong)
2014-01-06 00651, 2014
14:36 PM
ruaok
the load will soon follow