ianmcorvidae: you wouldn't happen to be near a computer, would you?
kepstin joined the channel
ruaok: I am now, heh
oh, geez, hm
ruaok: clearly you should have tried to get bcnftw.cat :)
harder to pull off though, I suppose
there should be a non-profit that can purchase cctld domains for you by proxy by following whatever weird rules are required
.cat is quite restricted as I understand it
iirc, you either have to show that it will have content relating to the catalan language, or you have to know a guy
hence why nyan.cat has had a catalan language option
yeah, have content in catalan published online already, access to a special code, "develop activities (in any language) to promote the Catalan culture and language" or are endorsed by 3 people who already have .cat domain names
ianmcorvidae wonders how crypto.cat got away with it, probably the same way
heh, yeah, catalan is the second language, right under english :P
Freso joined the channel
so as long as ruaok finds someone to translate his blog into catalan, he's probably good :)
i mean, the content's definitely relevant :)
(as a side note, 'bcn' always makes me think 'bacon'. I guess I'm too used to unix command names dropping vowels)
it's quite embarrassing that we have libdiscid fixes committed in 2009 and never released :/
I don't know that anyone in particular has been keeping tabs on that project, I guess that suggests nobody was :)
are you here to fix our replication?
maybe if i catch up enough to understand the problem better
problem isn't really well-understood generally, I think right now we're just going for getting replication back on track
(though potentially still paused -- just have correct packets up to the current replication sequence on production)
but until we know what's actually causing the problem we'll have the potential for getting in trouble again :/
i see more scare noise about locks though
1500 locks is nothing to be alarmed or happy about - it's just a number
do we know what type of locks are held, and where?
which is what I want to test for next time statistics run
(which is when this happened as well -- I think it may be related)
(people have been getting 502s trying to edit... anything, when stats are running)
yea, it does sound related
are our 5xx graphs shining any light on correlation?
(which makes no sense, it has almost no locks other than a bunch of access share on various tables and an exclusive lock on the stats table (but one that allows access share))
for the statistics problem, it's definitely correlated -- we moved stats an hour later to test exactly this and the problem moved with it
I don't know if this is related or how, but it did happen at the right time
basically I just want more information about the problem I know the most about that looks like it might be related :)
of course, that can't happen until twelve hours from now
unless you collect stats again and throw away current stats
(well, back them up, run, and then restore)
yeah, we could dump and then delete today's stats, run it, then delete and reimport
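That dump/delete/rerun/restore idea could be sketched roughly like this; the table, column, and database names are all guesses for illustration, not the real MusicBrainz schema:

```shell
# Sketch only: table, column, and database names below are assumptions.
DB=musicbrainz_db
TBL=statistics.statistic

# 1. save today's rows somewhere safe
psql "$DB" -c "\copy (SELECT * FROM $TBL WHERE date_collected = current_date) TO 'stats_today.csv' CSV"

# 2. delete them so the stats job can run fresh
psql "$DB" -c "DELETE FROM $TBL WHERE date_collected = current_date"

# 3. ... run the statistics job here and watch pg_locks while it runs ...

# 4. throw away the test run and put the originals back
psql "$DB" -c "DELETE FROM $TBL WHERE date_collected = current_date"
psql "$DB" -c "\copy $TBL FROM 'stats_today.csv' CSV"
```

Using `\copy` rather than a plain `pg_dump` keeps the backup to just today's rows, so the restore can't duplicate the older rows that were never deleted.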
my plan while running it was to trigger what would be a 502 -- i.e. try to submit an edit -- and just dump all of pg_locks to a file while it's timing out
and then it's "just" a matter of looking at everything that's waiting for a lock to figure out *why* it's waiting when statistics shouldn't need such a thing
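A pg_locks dump is more useful if waiters are paired with holders up front; the column names here are from the stock PostgreSQL `pg_locks` view, but the join itself is just an illustrative sketch:

```sql
-- Pair each ungranted lock request with whoever currently holds a
-- lock on the same relation (illustrative sketch, not production code).
SELECT waiter.pid                AS waiting_pid,
       waiter.mode               AS wanted_mode,
       waiter.relation::regclass AS relation,
       holder.pid                AS holding_pid,
       holder.mode               AS held_mode
  FROM pg_locks waiter
  JOIN pg_locks holder
    ON holder.locktype = waiter.locktype
   AND holder.relation IS NOT DISTINCT FROM waiter.relation
   AND holder.granted
 WHERE NOT waiter.granted
   AND holder.pid <> waiter.pid;
```

Running that via `psql -o locks.txt` while an edit submission is hanging would capture exactly the "who is waiting on whom" snapshot described above.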
we could also change the timeout-killer thingy to log the query that was executing at timeout
but that should be in the serverlog
I may not know where that log is
i need to shut this 'unexpected eof' thing up
in fact, that sorta implies that queries aren't getting aborted and are running for long periods of time
rob was theorizing that something was causing locks -- by which he may have meant transactions holding locks -- to remain open
did the timeout make it through the DBDefs changes?
we still don't know if it's a locking problem, really
iirc, postgresql is set up to log if stuff takes ages to acquire a lock
and i'm not seeing those messages
just brainstorming things to check :)
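For reference, the lock-wait logging being described is driven by a couple of `postgresql.conf` settings; whether they are actually enabled on production is an assumption to verify, not a fact:

```
# postgresql.conf fragment -- values illustrative, check the live config
log_lock_waits = on       # emit a log line when a session waits on a lock
deadlock_timeout = 1s     # ...for longer than this threshold
log_min_duration_statement = 5000   # also log any statement slower than 5s
```

If `log_lock_waits` were on and no messages appear, that would be real evidence against a simple lock-contention story, which is why the absence of those lines matters here.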
I was wondering if it was that
that's the only explicit lock we're getting (the select from editor for update) that looked probable
they do always crop up at ~1:30
yeah, that's the statistics time
(since it got moved an hour later for diagnosing this)
what I don't understand is why statistics would have a lock on an editor table that conflicts there
i'm not sure it does, i just wonder if the amount of writes it does causes stuff to slow down
but that's a whopping slow down
how is replication broken?
aborted in the middle of doing a packet
is there a log i can see?
probably, it'd be in email
did you read through rob's email?
ianmcorvidae looks for the relevant email, anyway
i'm not finding what i want in emails
I'm not really sure why it aborted, rob seems to have an idea why
the only abort i see is that the next hour rolled around and an existing job was running
yeah, that's all I'm seeing in email
i'm going to guess that rob killed that job with SIGTERM/SIGKILL
however, as rob outlines, it stopped dumping one packet and the next hour (whichever hour that was) included some of the same sequence IDs
hoping he can provide insight on this topic in a few hours
well, i need to get out of bed, have a shower and get some breakfast then
it's a bit of a lazy day :P
if that editor lock is in fact the thing that's failing, btw, we might limit the grabbing of that lock to autoedits, which is the only place where it *should* be required (but we're doing it for every edit)
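A sketch of what "only autoedits take the lock" might look like at the SQL level; the `editor` table comes from context above, but everything else here is a guess at the surrounding code, not the actual implementation:

```sql
-- What every edit reportedly does today (serializes all edits per editor):
SELECT * FROM editor WHERE id = ? FOR UPDATE;

-- The suggested split: autoedits, which update the editor row inline,
-- keep the row lock; normal edits only need a plain read.
SELECT * FROM editor WHERE id = ?;             -- normal edit path
SELECT * FROM editor WHERE id = ? FOR UPDATE;  -- autoedit path only
```

Since `SELECT ... FOR UPDATE` blocks any concurrent `FOR UPDATE` on the same row until commit, dropping it from the common path would stop ordinary edits from queueing behind one another on the editor row.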
I think that's not the main issue though
i mostly think that might be a symptom, but not the problem
ok, gonna get up then, bbiab
ianmcorvidae: Just replied to your comment on CR.
And, uh, sorry for being a blurb of text. I just got up and couldn't manage figuring out a good place to insert linebreaks. :|
you need to publish the comment
it's not there :P
LordSputnik joined the channel
reosarevok joined the channel
reosarevok joined the channel
sezuan joined the channel
voiceinsideyou joined the channel
voiceinsideyou1 joined the channel
kepstin-laptop joined the channel
ruaok joined the channel
I imagine he's still asleep. he didn't go to bed until late
not even four hours ago :P
I figured that. :) I got the last email from him about 5 hours ago.
I'll wait for him to wake to try and patch things back up.
I will, however, get the search indexes building again.
damn nagios. not sending me emails.
Supposedly he did that?
who did what?
ian apparently got search indexes updating again
he didn't mail me about that.
ah, looks like we got one set out and I killed the next run that's been going for about an hour