ianmcorvidae: you wouldnt happen to be near a computer, would you?
2013-01-01 00133, 2013
kepstin joined the channel
2013-01-01 00142, 2013
ianmcorvidae
ruaok: I am now, heh
2013-01-01 00104, 2013
ianmcorvidae
oh, geez, hm
2013-01-01 00121, 2013
ianmcorvidae
ruaok: clearly you should have tried to get bcnftw.cat :)
2013-01-01 00128, 2013
ianmcorvidae
harder to pull off though, I suppose
2013-01-01 00128, 2013
ianweller
there should be a non-profit that can purchase cctld domains for you by proxy by following whatever weird rules are required
2013-01-01 00103, 2013
ianmcorvidae
.cat is quite restricted as I understand it
2013-01-01 00153, 2013
ianweller
iirc, you either have to show that it will have content relating to the catalan language, or you have to know a guy
2013-01-01 00100, 2013
ianmcorvidae
ah
2013-01-01 00152, 2013
ianweller
hence why nyan.cat has had a catalan language option
2013-01-01 00130, 2013
ianmcorvidae
yeah, have content in catalan published online already, access to a special code, "develop activities (in any language) to promote the Catalan culture and language" or are endorsed by 3 people who already have .cat domain names
2013-01-01 00134, 2013
ianmcorvidae
hah
2013-01-01 00155, 2013
ianmcorvidae wonders how crypto.cat got away with it, probably the same way
2013-01-01 00108, 2013
ianmcorvidae
heh, yeah, catalan is the second language, right under english :P
2013-01-01 00149, 2013
Freso joined the channel
2013-01-01 00107, 2013
kepstin
so as long as ruoak finds someone to translate his blog into catalan, he's probably good :)
2013-01-01 00122, 2013
kepstin
i mean, the content's definitely relevant :)
2013-01-01 00153, 2013
kepstin
(as a side note, 'bcn' always makes me thing 'bacon'. I guess I'm too used to unix command names dropping vowels)
it's quite embarrassing that we have libdiscid fixes committed in 2009 and never released :/
2013-01-01 00107, 2013
ianmcorvidae
heh
2013-01-01 00126, 2013
ianmcorvidae
I don't know that anyone in particular has been keeping tabs on that project, I guess that suggests nobody was :)
2013-01-01 00158, 2013
ocharles
Explosions eh
2013-01-01 00140, 2013
nikki
are you here to fix our replication?
2013-01-01 00155, 2013
ocharles
not entirely
2013-01-01 00112, 2013
ocharles
maybe if i catch up enough to understand the problem better
2013-01-01 00136, 2013
ianmcorvidae
problem isn't really well-understood generally, I think right now we're just going for getting replication back on track
2013-01-01 00111, 2013
ianmcorvidae
(though potentially still paused -- just have correct packets up to the current replication sequence on production)
2013-01-01 00127, 2013
ianmcorvidae
but until we know what's actually causing the problem we'll have the potential for getting in trouble again :/
2013-01-01 00147, 2013
ocharles
i see more scare noise about locks though
2013-01-01 00157, 2013
ocharles
1500 locks is nothing to be alarmed or happy about - it's just a number
2013-01-01 00104, 2013
ocharles
do we know what type of locks are held, and where?
2013-01-01 00111, 2013
ianmcorvidae
no
2013-01-01 00119, 2013
ianmcorvidae
which is what I want to test for next time statistics run
2013-01-01 00128, 2013
ianmcorvidae
(which is when this happened as well -- I think it may be related)
2013-01-01 00144, 2013
ianmcorvidae
(people have been getting 502s trying to edit... anything, when stats are running)
2013-01-01 00101, 2013
ocharles
yea, it does sound related
2013-01-01 00116, 2013
ocharles
are our 5xx graphs shining any light on correlation?
2013-01-01 00120, 2013
ianmcorvidae
(which makes no sense, it has almost no locks other than a bunch of access share on various tables and an exclusive lock on the stats table (but one that allows access share))
2013-01-01 00141, 2013
ianmcorvidae
for the statistics problem, it's definitely correlated -- we moved stats an hour later to test exactly this and the problem moved with it
2013-01-01 00113, 2013
ianmcorvidae
I don't know if this is related or how, but it did happen at the right time
2013-01-01 00123, 2013
ocharles
hmm
2013-01-01 00125, 2013
ianmcorvidae
basically I just want more information about the problem I know the most about that looks like it might be related :)
2013-01-01 00101, 2013
ocharles
same
2013-01-01 00119, 2013
ianmcorvidae
of course, that can't happen until twelve hours from now
2013-01-01 00103, 2013
ocharles
unless you collect stats again and throw away current stats
2013-01-01 00112, 2013
ocharles
(well, back them up, run, and then restore)
2013-01-01 00137, 2013
ianmcorvidae
yeah, we could dump and then delete today's stats, run it, then delete and reimport
2013-01-01 00104, 2013
ocharles
right
2013-01-01 00107, 2013
ianmcorvidae
my plan while running it was to trigger what would be a 502 -- i.e. try to submit and edit -- and just dump all of pg_locks to a file while it's timing out
2013-01-01 00113, 2013
ianmcorvidae
an*
2013-01-01 00148, 2013
ianmcorvidae
and then it's "just" a matter of looking at everything that's waiting for a lock to figure out *why* it's waiting when statistics shouldn't need such a thing
2013-01-01 00133, 2013
ocharles
we could also change the time out killer thingy to log the query that was executing at timeout
2013-01-01 00136, 2013
ocharles
but that should be in the serverlog
2013-01-01 00138, 2013
ocharles
(pg)
2013-01-01 00151, 2013
ianmcorvidae
hm
2013-01-01 00154, 2013
ianmcorvidae
I may not know where that log is
2013-01-01 00124, 2013
ocharles
/var/log/postgres/serverlog
2013-01-01 00141, 2013
ocharles
i need to shut this 'unexpected eof' thing up
2013-01-01 00144, 2013
ocharles
in fact, that sorta implies that queries aren't getting aborted and are running for long periods of time
2013-01-01 00144, 2013
ianmcorvidae
hm
2013-01-01 00154, 2013
ianmcorvidae
yeah
2013-01-01 00121, 2013
ianmcorvidae
rob was theorizing that something was causing locks -- by which he may have meant transactions holding locks -- to remain open
2013-01-01 00128, 2013
ianmcorvidae
DBDefs, perhaps?
2013-01-01 00140, 2013
ianmcorvidae
did the timeout make it through the DBDefs changes
2013-01-01 00102, 2013
ocharles
we still don't know if it's a locking problem, really
2013-01-01 00131, 2013
ocharles
iirc, postgresql is setup to log if stuff takes ages to acquire a lock
2013-01-01 00134, 2013
ocharles
and i'm not seeing those messages
2013-01-01 00115, 2013
ianmcorvidae
just brainstorming things to check :)
2013-01-01 00110, 2013
ocharles
2013-01-01 00114, 2013
ocharles
somewhat interesting
2013-01-01 00136, 2013
ianmcorvidae
I was wondering if it was that
2013-01-01 00157, 2013
ianmcorvidae
that's the only explicit lock we're getting (the select from editor for update) that looked probable
2013-01-01 00100, 2013
ocharles
they do always crop up at ~1:30
2013-01-01 00107, 2013
ianmcorvidae
yeah, that's the statistics time
2013-01-01 00117, 2013
ianmcorvidae
(since it got moved an hour later for diagnosing this)
2013-01-01 00131, 2013
ianmcorvidae
what I don't understand is why statistics would have a lock on an editor table that conflicts there
2013-01-01 00156, 2013
ocharles
i'm not sure it does, i just wonder if the amount of writes it does causes stuff to slow down
2013-01-01 00101, 2013
ocharles
but that's a whopping slow down
2013-01-01 00128, 2013
ianmcorvidae
yeah
2013-01-01 00102, 2013
ocharles
how is replication broke?
2013-01-01 00114, 2013
ianmcorvidae
aborted in the middle of doing a packet
2013-01-01 00134, 2013
ocharles
is there a log i can see?
2013-01-01 00149, 2013
ianmcorvidae
probably, it'd be in email
2013-01-01 00155, 2013
ianmcorvidae
did you read through rob's email?
2013-01-01 00115, 2013
ianmcorvidae looks for the relevant email, anyway
2013-01-01 00115, 2013
ocharles
yea
2013-01-01 00122, 2013
ocharles
i'm not finding what i want in emails
2013-01-01 00132, 2013
ianmcorvidae
I'm not really sure why it aborted, rob seems to have an idea why
2013-01-01 00155, 2013
ocharles
the only abort i see is that the next hour rolled around and an existing job was running
2013-01-01 00117, 2013
ianmcorvidae
yeah, that's all I'm seeing in email
2013-01-01 00149, 2013
ocharles
i'm going to guess that rob killed that job with SIGTERM/SIGKILL
2013-01-01 00150, 2013
ianmcorvidae
however, as rob outlines, it stopped dumping one packet and the next hour (whichever hour that was) included some of the same sequence IDs
2013-01-01 00117, 2013
ianmcorvidae
possibly
2013-01-01 00128, 2013
ianmcorvidae
hoping he can provide insight on this topic in a few hours
2013-01-01 00134, 2013
ocharles
mmmm
2013-01-01 00145, 2013
ocharles
well, i need to get out of bed, have a shower and get some breakfast then
2013-01-01 00149, 2013
ocharles
it's a bit of a lazy day :P
2013-01-01 00151, 2013
ianmcorvidae
seems reasonable
2013-01-01 00144, 2013
ianmcorvidae
if that editor lock is in fact the thing that's failing, btw, we might limit the grabbing of that lock to autoedits, which is the only place where it *should* be required (but we're doing it for every edit)
2013-01-01 00104, 2013
ianmcorvidae
I think that's not the main issue though
2013-01-01 00132, 2013
ocharles
i mostly think that might be a symptomn, but not the problem
2013-01-01 00139, 2013
ianmcorvidae
yeah, agreed
2013-01-01 00103, 2013
ocharles
ok, gonna et up then, bbiab
2013-01-01 00152, 2013
Freso
ianmcorvidae: Just replied to your comment on CR.
2013-01-01 00135, 2013
Freso
And, uh, sorrry for being a blurb of text. I just get up and couldn't manage figuring out a good place to insert linebreaks. :|
2013-01-01 00157, 2013
ianmcorvidae
you need to publish the comment
2013-01-01 00100, 2013
ianmcorvidae
it's not there :P
2013-01-01 00119, 2013
Freso
Oh, right.
2013-01-01 00122, 2013
Freso
Silly CR.
2013-01-01 00124, 2013
Freso
Done.
2013-01-01 00115, 2013
LordSputnik joined the channel
2013-01-01 00120, 2013
reosarevok joined the channel
2013-01-01 00121, 2013
reosarevok joined the channel
2013-01-01 00153, 2013
sezuan joined the channel
2013-01-01 00135, 2013
voiceinsideyou joined the channel
2013-01-01 00121, 2013
voiceinsideyou1 joined the channel
2013-01-01 00137, 2013
kepstin-laptop joined the channel
2013-01-01 00158, 2013
ruaok joined the channel
2013-01-01 00141, 2013
ruaok
ianmcorvidae: ping?
2013-01-01 00114, 2013
nikki
I imagine he's still asleep. he didn't go to bed until late
2013-01-01 00135, 2013
nikki
not even four hours ago :P
2013-01-01 00136, 2013
ruaok
I figured that. :) I got the last email from him about 5 hours ago.
2013-01-01 00153, 2013
ruaok
I'll wait for him to wake to try and patch things back up.
2013-01-01 00107, 2013
ruaok
I will, however, get the search indexes building again.
2013-01-01 00113, 2013
ruaok
damn nagios. not sending me emails.
2013-01-01 00119, 2013
reosarevok
Supposedly he did that?
2013-01-01 00154, 2013
ruaok
who did what?
2013-01-01 00103, 2013
nikki
ian apparently got search indexes updating again
2013-01-01 00110, 2013
reosarevok
that
2013-01-01 00129, 2013
ruaok
oh, whoops.
2013-01-01 00133, 2013
ruaok
he didn't mail me about that.
2013-01-01 00113, 2013
ruaok
ah looks like we got one set out and I killed the next run thats been going for about an hour