ianmcorvidae: you wouldn't happen to be near a computer, would you?
kepstin joined the channel
ianmcorvidae
ruaok: I am now, heh
oh, geez, hm
ruaok: clearly you should have tried to get bcnftw.cat :)
harder to pull off though, I suppose
ianweller
there should be a non-profit that can purchase cctld domains for you by proxy by following whatever weird rules are required
ianmcorvidae
.cat is quite restricted as I understand it
ianweller
iirc, you either have to show that it will have content relating to the catalan language, or you have to know a guy
ianmcorvidae
ah
ianweller
hence why nyan.cat has had a catalan language option
ianmcorvidae
yeah, have content in catalan published online already, access to a special code, "develop activities (in any language) to promote the Catalan culture and language" or are endorsed by 3 people who already have .cat domain names
hah
ianmcorvidae wonders how crypto.cat got away with it, probably the same way
heh, yeah, catalan is the second language, right under english :P
Freso joined the channel
kepstin
so as long as ruaok finds someone to translate his blog into catalan, he's probably good :)
i mean, the content's definitely relevant :)
(as a side note, 'bcn' always makes me think 'bacon'. I guess I'm too used to unix command names dropping vowels)
it's quite embarrassing that we have libdiscid fixes committed in 2009 and never released :/
ianmcorvidae
heh
I don't know that anyone in particular has been keeping tabs on that project, I guess that suggests nobody was :)
ocharles
Explosions eh
nikki
are you here to fix our replication?
ocharles
not entirely
maybe if i catch up enough to understand the problem better
ianmcorvidae
problem isn't really well-understood generally, I think right now we're just going for getting replication back on track
(though potentially still paused -- just have correct packets up to the current replication sequence on production)
but until we know what's actually causing the problem we'll have the potential for getting in trouble again :/
ocharles
i see more scare noise about locks though
1500 locks is nothing to be alarmed or happy about - it's just a number
do we know what type of locks are held, and where?
ianmcorvidae
no
which is what I want to test for next time statistics run
(which is when this happened as well -- I think it may be related)
(people have been getting 502s trying to edit... anything, when stats are running)
ocharles
yea, it does sound related
are our 5xx graphs shining any light on correlation?
ianmcorvidae
(which makes no sense, it has almost no locks other than a bunch of access share on various tables and an exclusive lock on the stats table (but one that allows access share))
for the statistics problem, it's definitely correlated -- we moved stats an hour later to test exactly this and the problem moved with it
I don't know if this is related or how, but it did happen at the right time
ocharles
hmm
ianmcorvidae
basically I just want more information about the problem I know the most about that looks like it might be related :)
ocharles
same
ianmcorvidae
of course, that can't happen until twelve hours from now
ocharles
unless you collect stats again and throw away current stats
(well, back them up, run, and then restore)
ianmcorvidae
yeah, we could dump and then delete today's stats, run it, then delete and reimport
ocharles
right
ianmcorvidae
my plan while running it was to trigger what would be a 502 -- i.e. try to submit an edit -- and just dump all of pg_locks to a file while it's timing out
and then it's "just" a matter of looking at everything that's waiting for a lock to figure out *why* it's waiting when statistics shouldn't need such a thing
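A minimal sketch of that kind of pg_locks dump, in Python with psycopg2; the DSN and output path are placeholders, and on PostgreSQL older than 9.2 pg_stat_activity exposes procpid/current_query instead of pid/query:

    # Dump pg_locks joined to pg_stat_activity while an edit request is
    # timing out, so each waiting lock can be matched to the query that
    # holds or requests it.  DSN and output path are placeholders.
    import datetime
    import psycopg2

    conn = psycopg2.connect("dbname=musicbrainz_db")  # placeholder DSN
    cur = conn.cursor()
    # Note: before PostgreSQL 9.2 these columns are procpid/current_query.
    cur.execute("""
        SELECT l.locktype, l.relation::regclass, l.mode, l.granted,
               a.pid, a.query_start, a.query
          FROM pg_locks l
          JOIN pg_stat_activity a ON a.pid = l.pid
         ORDER BY l.granted, a.query_start
    """)
    path = "/tmp/pg_locks-%s.txt" % datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    with open(path, "w") as out:
        for row in cur.fetchall():
            out.write("\t".join(str(col) for col in row) + "\n")
    print("lock snapshot written to", path)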
ocharles
we could also change the time out killer thingy to log the query that was executing at timeout
but that should be in the serverlog
(pg)
ianmcorvidae
hm
I may not know where that log is
ocharles
/var/log/postgres/serverlog
i need to shut this 'unexpected eof' thing up
in fact, that sorta implies that queries aren't getting aborted and are running for long periods of time
ianmcorvidae
hm
yeah
rob was theorizing that something was causing locks -- by which he may have meant transactions holding locks -- to remain open
DBDefs, perhaps?
did the timeout make it through the DBDefs changes?
ocharles
we still don't know if it's a locking problem, really
iirc, postgresql is set up to log if stuff takes ages to acquire a lock
and i'm not seeing those messages
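For reference, a quick sketch (same assumptions: psycopg2 and a placeholder DSN) that checks the settings governing what the serverlog would report -- log_lock_waits only fires once a wait exceeds deadlock_timeout, and statement_timeout / log_min_duration_statement speak to the "queries running for ages without being aborted" worry above:

    # Check the logging/timeout settings that decide what the serverlog
    # will actually show.  log_lock_waits only logs a message once a wait
    # exceeds deadlock_timeout.  Placeholder DSN again.
    import psycopg2

    conn = psycopg2.connect("dbname=musicbrainz_db")  # placeholder DSN
    cur = conn.cursor()
    for guc in ("log_lock_waits", "deadlock_timeout",
                "statement_timeout", "log_min_duration_statement"):
        cur.execute("SHOW " + guc)
        print(guc, "=", cur.fetchone()[0])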
ianmcorvidae
just brainstorming things to check :)
ocharles
somewhat interesting
ianmcorvidae
I was wondering if it was that
that's the only explicit lock we're getting (the select from editor for update) that looked probable
ocharles
they do always crop up at ~1:30
ianmcorvidae
yeah, that's the statistics time
(since it got moved an hour later for diagnosing this)
what I don't understand is why statistics would have a lock on an editor table that conflicts there
ocharles
i'm not sure it does, i just wonder if the amount of writes it does causes stuff to slow down
but that's a whopping slow down
ianmcorvidae
yeah
ocharles
how is replication broken?
ianmcorvidae
aborted in the middle of doing a packet
ocharles
is there a log i can see?
ianmcorvidae
probably, it'd be in email
did you read through rob's email?
ianmcorvidae looks for the relevant email, anyway
ocharles
yea
i'm not finding what i want in emails
ianmcorvidae
I'm not really sure why it aborted, rob seems to have an idea why
ocharles
the only abort i see is that the next hour rolled around and an existing job was running
ianmcorvidae
yeah, that's all I'm seeing in email
ocharles
i'm going to guess that rob killed that job with SIGTERM/SIGKILL
ianmcorvidae
however, as rob outlines, it stopped dumping one packet and the next hour (whichever hour that was) included some of the same sequence IDs
possibly
hoping he can provide insight on this topic in a few hours
ocharles
mmmm
well, i need to get out of bed, have a shower and get some breakfast then
it's a bit of a lazy day :P
ianmcorvidae
seems reasonable
if that editor lock is in fact the thing that's failing, btw, we might limit the grabbing of that lock to autoedits, which is the only place where it *should* be required (but we're doing it for every edit)
I think that's not the main issue though
ocharles
i mostly think that might be a symptom, but not the problem
ianmcorvidae
yeah, agreed
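Purely illustrative of the autoedit idea above (the real server code is Perl, and the table and column names here are simplified guesses, not the actual MusicBrainz schema): take the editor row lock only on the autoedit path, where the editor's counters are updated in the same transaction, rather than on every edit submission:

    # Hypothetical names throughout -- this is not the MusicBrainz server
    # code, just the shape of the idea.
    def insert_edit(cur, editor_id, edit_data, is_autoedit):
        if is_autoedit:
            # Only the autoedit path touches the editor row right away,
            # so only it needs the row lock.
            cur.execute("SELECT id FROM editor WHERE id = %s FOR UPDATE",
                        (editor_id,))
            cur.execute("UPDATE editor SET edits_accepted = edits_accepted + 1"
                        " WHERE id = %s", (editor_id,))
        cur.execute("INSERT INTO edit (editor, data) VALUES (%s, %s)",
                    (editor_id, edit_data))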
ocharles
ok, gonna get up then, bbiab
Freso
ianmcorvidae: Just replied to your comment on CR.
And, uh, sorry for being a blurb of text. I just got up and couldn't manage figuring out a good place to insert linebreaks. :|
ianmcorvidae
you need to publish the comment
it's not there :P
Freso
Oh, right.
Silly CR.
Done.
LordSputnik joined the channel
reosarevok joined the channel
sezuan joined the channel
voiceinsideyou joined the channel
voiceinsideyou1 joined the channel
kepstin-laptop joined the channel
ruaok joined the channel
ruaok
ianmcorvidae: ping?
nikki
I imagine he's still asleep. he didn't go to bed until late
not even four hours ago :P
ruaok
I figured that. :) I got the last email from him about 5 hours ago.
I'll wait for him to wake to try and patch things back up.
I will, however, get the search indexes building again.
damn nagios. not sending me emails.
reosarevok
Supposedly he did that?
ruaok
who did what?
nikki
ian apparently got search indexes updating again
reosarevok
that
ruaok
oh, whoops.
he didn't mail me about that.
ah, looks like we got one set out and I killed the next run that's been going for about an hour