#musicbrainz-devel

0:27 AM
jessew joined the channel

2013-01-01 00145, 2013

2:38 AM
ruaok will just leave this here http://bcnftw.es/

2013-01-01 00153, 2013

2:40 AM
kepstin-laptop

... ruaok*any* excuse to have a party, eh :)

2013-01-01 00103, 2013

2:41 AM
kepstin-laptop

... that lost some words.

2013-01-01 00146, 2013

2:41 AM
ruaok

life is a party, no? :)

2013-01-01 00134, 2013

3:07 AM
Freso joined the channel

2013-01-01 00121, 2013

3:44 AM
ruaok

ianmcorvidae: you wouldn't happen to be near a computer, would you?

2013-01-01 00133, 2013

3:44 AM
kepstin joined the channel

2013-01-01 00142, 2013

4:52 AM
ianmcorvidae

ruaok: I am now, heh

2013-01-01 00104, 2013

4:56 AM
ianmcorvidae

oh, geez, hm

2013-01-01 00121, 2013

5:17 AM
ianmcorvidae

ruaok: clearly you should have tried to get bcnftw.cat :)

2013-01-01 00128, 2013

5:17 AM
ianmcorvidae

harder to pull off though, I suppose

2013-01-01 00128, 2013

5:18 AM
ianweller

there should be a non-profit that can purchase cctld domains for you by proxy by following whatever weird rules are required

2013-01-01 00103, 2013

5:19 AM
ianmcorvidae

.cat is quite restricted as I understand it

2013-01-01 00153, 2013

5:19 AM
ianweller

iirc, you either have to show that it will have content relating to the catalan language, or you have to know a guy

2013-01-01 00100, 2013

5:20 AM
ianmcorvidae

ah

2013-01-01 00152, 2013

5:20 AM
ianweller

hence why nyan.cat has had a catalan language option

2013-01-01 00130, 2013

5:21 AM
ianmcorvidae

yeah, have content in catalan published online already, access to a special code, "develop activities (in any language) to promote the Catalan culture and language" or are endorsed by 3 people who already have .cat domain names

2013-01-01 00134, 2013

5:21 AM
ianmcorvidae

hah

2013-01-01 00155, 2013

5:21 AM
ianmcorvidae wonders how crypto.cat got away with it, probably the same way

2013-01-01 00108, 2013

5:22 AM
ianmcorvidae

heh, yeah, catalan is the second language, right under english :P

2013-01-01 00149, 2013

5:35 AM
Freso joined the channel

2013-01-01 00107, 2013

6:54 AM
kepstin

so as long as ruaok finds someone to translate his blog into catalan, he's probably good :)

2013-01-01 00122, 2013

6:54 AM
kepstin

i mean, the content's definitely relevant :)

2013-01-01 00153, 2013

6:55 AM
kepstin

(as a side note, 'bcn' always makes me think 'bacon'. I guess I'm too used to unix command names dropping vowels)

2013-01-01 00149, 2013

7:20 AM
night199uk joined the channel

2013-01-01 00144, 2013

7:30 AM
night199uk joined the channel

2013-01-01 00143, 2013

7:40 AM
night199uk joined the channel

2013-01-01 00147, 2013

8:51 AM
Leftmost joined the channel

2013-01-01 00142, 2013

9:22 AM
luks

can somebody please create https://github.com/metabrainz/libdiscid and give me access to it?

2013-01-01 00112, 2013

9:23 AM
jessew joined the channel

2013-01-01 00129, 2013

9:23 AM
ianmcorvidae

luks: just an empty repo?

2013-01-01 00134, 2013

9:23 AM
luks

yes

2013-01-01 00102, 2013

9:28 AM
ianmcorvidae

okay, should be there and you should have access

2013-01-01 00112, 2013

9:29 AM
luks

thanks

2013-01-01 00143, 2013

9:33 AM
luks

it's quite embarrassing that we have libdiscid fixes committed in 2009 and never released :/

2013-01-01 00107, 2013

9:34 AM
ianmcorvidae

heh

2013-01-01 00126, 2013

9:34 AM
ianmcorvidae

I don't know that anyone in particular has been keeping tabs on that project, I guess that suggests nobody was :)

2013-01-01 00158, 2013

12:36 PM
ocharles

Explosions eh

2013-01-01 00140, 2013

12:39 PM
nikki

are you here to fix our replication?

2013-01-01 00155, 2013

12:52 PM
ocharles

not entirely

2013-01-01 00112, 2013

12:53 PM
ocharles

maybe if i catch up enough to understand the problem better

2013-01-01 00136, 2013

12:54 PM
ianmcorvidae

problem isn't really well-understood generally, I think right now we're just going for getting replication back on track

2013-01-01 00111, 2013

12:55 PM
ianmcorvidae

(though potentially still paused -- just have correct packets up to the current replication sequence on production)

2013-01-01 00127, 2013

12:56 PM
ianmcorvidae

but until we know what's actually causing the problem we'll have the potential for getting in trouble again :/

2013-01-01 00147, 2013

13:00 PM
ocharles

i see more scare noise about locks though

2013-01-01 00157, 2013

13:00 PM
ocharles

1500 locks is nothing to be alarmed or happy about - it's just a number

2013-01-01 00104, 2013

13:01 PM
ocharles

do we know what type of locks are held, and where?

2013-01-01 00111, 2013

13:01 PM
ianmcorvidae

no

2013-01-01 00119, 2013

13:01 PM
ianmcorvidae

which is what I want to test for next time statistics run

2013-01-01 00128, 2013

13:01 PM
ianmcorvidae

(which is when this happened as well -- I think it may be related)

2013-01-01 00144, 2013

13:01 PM
ianmcorvidae

(people have been getting 502s trying to edit... anything, when stats are running)

2013-01-01 00101, 2013

13:02 PM
ocharles

yea, it does sound related

2013-01-01 00116, 2013

13:02 PM
ocharles

are our 5xx graphs shining any light on correlation?

2013-01-01 00120, 2013

13:02 PM
ianmcorvidae

(which makes no sense, it has almost no locks other than a bunch of access share on various tables and an exclusive lock on the stats table (but one that allows access share))

2013-01-01 00141, 2013
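
The point about the stats lock can be checked against PostgreSQL's documented table-level lock conflicts. The sketch below assumes the "exclusive lock ... that allows access share" refers to the EXCLUSIVE table lock mode; the helper name `conflicts` is made up for illustration, and the table only covers table-level modes, not row locks.

```python
# Table-level lock conflicts as listed in the PostgreSQL "Explicit Locking" docs.
# Assumption: the stats job's lock is the EXCLUSIVE table lock mode.
CONFLICTS = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "ROW SHARE": {"EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE": {"SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                      "ACCESS EXCLUSIVE"},
    "SHARE UPDATE EXCLUSIVE": {"SHARE UPDATE EXCLUSIVE", "SHARE",
                               "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                               "ACCESS EXCLUSIVE"},
    "SHARE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
              "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE ROW EXCLUSIVE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE",
                            "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                            "ACCESS EXCLUSIVE"},
    "EXCLUSIVE": {"ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
                  "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                  "ACCESS EXCLUSIVE"},
    "ACCESS EXCLUSIVE": {"ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE",
                         "SHARE UPDATE EXCLUSIVE", "SHARE",
                         "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                         "ACCESS EXCLUSIVE"},
}


def conflicts(held: str, requested: str) -> bool:
    """Return True if a request for `requested` must wait while `held` is granted."""
    return requested in CONFLICTS[held]
```

Under that assumption, plain reads of the stats table (ACCESS SHARE) are not blocked, while writes to that same table (ROW EXCLUSIVE) would be. Locks on other tables, such as editor, are unaffected by it.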

13:02 PM
ianmcorvidae

for the statistics problem, it's definitely correlated -- we moved stats an hour later to test exactly this and the problem moved with it

2013-01-01 00113, 2013

13:03 PM
ianmcorvidae

I don't know if this is related or how, but it did happen at the right time

2013-01-01 00123, 2013

13:03 PM
ocharles

hmm

2013-01-01 00125, 2013

13:03 PM
ianmcorvidae

basically I just want more information about the problem I know the most about that looks like it might be related :)

2013-01-01 00101, 2013

13:04 PM
ocharles

same

2013-01-01 00119, 2013

13:04 PM
ianmcorvidae

of course, that can't happen until twelve hours from now

2013-01-01 00103, 2013

13:05 PM
ocharles

unless you collect stats again and throw away current stats

2013-01-01 00112, 2013

13:05 PM
ocharles

(well, back them up, run, and then restore)

2013-01-01 00137, 2013

13:05 PM
ianmcorvidae

yeah, we could dump and then delete today's stats, run it, then delete and reimport

2013-01-01 00104, 2013

13:06 PM
ocharles

right

2013-01-01 00107, 2013

13:06 PM
ianmcorvidae

my plan while running it was to trigger what would be a 502 -- i.e. try to submit and edit -- and just dump all of pg_locks to a file while it's timing out

2013-01-01 00113, 2013

13:06 PM
ianmcorvidae

an*

2013-01-01 00148, 2013

13:06 PM
ianmcorvidae

and then it's "just" a matter of looking at everything that's waiting for a lock to figure out *why* it's waiting when statistics shouldn't need such a thing

2013-01-01 00133, 2013
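
The plan above (dump pg_locks during a hanging edit, then look at what is waiting) could be sketched roughly as follows. `pg_locks` and the columns named in the query are real PostgreSQL catalog names; the function name and the matching rule (same locktype, relation and transactionid, different pid) are simplifying assumptions, so this is a first pass at reading the dump rather than a complete blocking analysis.

```python
# Query to run while the request is timing out; rows are expected as dicts
# keyed by these column names (how they are fetched is left open here).
PG_LOCKS_SQL = (
    "SELECT locktype, relation::regclass AS relation, transactionid, "
    "pid, mode, granted FROM pg_locks"
)


def blocked_requests(rows):
    """Pair each ungranted lock with granted locks other backends hold on the same object."""
    report = []
    for waiter in rows:
        if waiter["granted"]:
            continue
        # Hypothetical matching rule: identical lock target, different backend.
        holders = [
            h for h in rows
            if h["granted"]
            and h["pid"] != waiter["pid"]
            and h["locktype"] == waiter["locktype"]
            and h["relation"] == waiter["relation"]
            and h["transactionid"] == waiter["transactionid"]
        ]
        report.append((waiter, holders))
    return report
```

A waiter blocked on a row lock typically shows up as an ungranted lock of type `transactionid`, so the pid of the holder is the backend whose query is worth looking up next.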

13:07 PM
ocharles

we could also change the time out killer thingy to log the query that was executing at timeout

2013-01-01 00136, 2013

13:07 PM
ocharles

but that should be in the serverlog

2013-01-01 00138, 2013

13:07 PM
ocharles

(pg)

2013-01-01 00151, 2013

13:07 PM
ianmcorvidae

hm

2013-01-01 00154, 2013

13:07 PM
ianmcorvidae

I may not know where that log is

2013-01-01 00124, 2013

13:08 PM
ocharles

/var/log/postgres/serverlog

2013-01-01 00141, 2013

13:08 PM
ocharles

i need to shut this 'unexpected eof' thing up

2013-01-01 00144, 2013

13:09 PM
ocharles

in fact, that sorta implies that queries aren't getting aborted and are running for long periods of time

2013-01-01 00144, 2013

13:09 PM
ianmcorvidae

hm

2013-01-01 00154, 2013

13:09 PM
ianmcorvidae

yeah

2013-01-01 00121, 2013

13:10 PM
ianmcorvidae

rob was theorizing that something was causing locks -- by which he may have meant transactions holding locks -- to remain open

2013-01-01 00128, 2013

13:10 PM
ianmcorvidae

DBDefs, perhaps?

2013-01-01 00140, 2013

13:10 PM
ianmcorvidae

did the timeout make it through the DBDefs changes

2013-01-01 00102, 2013

13:11 PM
ocharles

we still don't know if it's a locking problem, really

2013-01-01 00131, 2013

13:11 PM
ocharles

iirc, postgresql is set up to log if stuff takes ages to acquire a lock

2013-01-01 00134, 2013

13:11 PM
ocharles

and i'm not seeing those messages

2013-01-01 00115, 2013

13:13 PM
ianmcorvidae

just brainstorming things to check :)

2013-01-01 00110, 2013

13:15 PM
ocharles

2013-01-01 00114, 2013

13:15 PM
ocharles

somewhat interesting

2013-01-01 00136, 2013

13:15 PM
ianmcorvidae

I was wondering if it was that

2013-01-01 00157, 2013

13:15 PM
ianmcorvidae

that's the only explicit lock we're getting (the select from editor for update) that looked probable

2013-01-01 00100, 2013

13:17 PM
ocharles

they do always crop up at ~1:30

2013-01-01 00107, 2013

13:17 PM
ianmcorvidae

yeah, that's the statistics time

2013-01-01 00117, 2013

13:17 PM
ianmcorvidae

(since it got moved an hour later for diagnosing this)

2013-01-01 00131, 2013

13:17 PM
ianmcorvidae

what I don't understand is why statistics would have a lock on an editor table that conflicts there

2013-01-01 00156, 2013

13:17 PM
ocharles

i'm not sure it does, i just wonder if the amount of writes it does causes stuff to slow down

2013-01-01 00101, 2013

13:18 PM
ocharles

but that's a whopping slow down

2013-01-01 00128, 2013

13:18 PM
ianmcorvidae

yeah

2013-01-01 00102, 2013

13:19 PM
ocharles

how is replication broke?

2013-01-01 00114, 2013

13:19 PM
ianmcorvidae

aborted in the middle of doing a packet

2013-01-01 00134, 2013

13:19 PM
ocharles

is there a log i can see?

2013-01-01 00149, 2013

13:19 PM
ianmcorvidae

probably, it'd be in email

2013-01-01 00155, 2013

13:19 PM
ianmcorvidae

did you read through rob's email?

2013-01-01 00115, 2013

13:20 PM
ianmcorvidae looks for the relevant email, anyway

2013-01-01 00115, 2013

13:20 PM
ocharles

yea

2013-01-01 00122, 2013

13:20 PM
ocharles

i'm not finding what i want in emails

2013-01-01 00132, 2013

13:20 PM
ianmcorvidae

I'm not really sure why it aborted, rob seems to have an idea why

2013-01-01 00155, 2013

13:20 PM
ocharles

the only abort i see is that the next hour rolled around and an existing job was running

2013-01-01 00117, 2013

13:22 PM
ianmcorvidae

yeah, that's all I'm seeing in email

2013-01-01 00149, 2013

13:22 PM
ocharles

i'm going to guess that rob killed that job with SIGTERM/SIGKILL

2013-01-01 00150, 2013

13:22 PM
ianmcorvidae

however, as rob outlines, it stopped dumping one packet and the next hour (whichever hour that was) included some of the same sequence IDs

2013-01-01 00117, 2013

13:24 PM
ianmcorvidae

possibly

2013-01-01 00128, 2013

13:24 PM
ianmcorvidae

hoping he can provide insight on this topic in a few hours

2013-01-01 00134, 2013

13:24 PM
ocharles

mmmm

2013-01-01 00145, 2013

13:24 PM
ocharles

well, i need to get out of bed, have a shower and get some breakfast then

2013-01-01 00149, 2013

13:24 PM
ocharles

it's a bit of a lazy day :P

2013-01-01 00151, 2013

13:24 PM
ianmcorvidae

seems reasonable

2013-01-01 00144, 2013

13:25 PM
ianmcorvidae

if that editor lock is in fact the thing that's failing, btw, we might limit the grabbing of that lock to autoedits, which is the only place where it *should* be required (but we're doing it for every edit)

2013-01-01 00104, 2013
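
The idea of taking the editor lock only for autoedits might look something like this. It is a minimal sketch, not the server's actual code: the function name, the `is_autoedit` flag, and the `id` column are hypothetical, and only the "select from editor for update" shape comes from the discussion.

```python
def editor_select_sql(is_autoedit: bool) -> str:
    """Build the editor lookup, adding the row lock only for autoedits."""
    # Hypothetical query shape; the real column list and filter may differ.
    sql = "SELECT * FROM editor WHERE id = %s"
    if is_autoedit:
        # FOR UPDATE locks the selected editor row until the transaction ends.
        sql += " FOR UPDATE"
    return sql
```

Ordinary edits would then read the editor row without locking it, so they could not queue up behind one another on that row.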

13:26 PM
ianmcorvidae

I think that's not the main issue though

2013-01-01 00132, 2013

13:26 PM
ocharles

i mostly think that might be a symptom, but not the problem

2013-01-01 00139, 2013

13:26 PM
ianmcorvidae

yeah, agreed

2013-01-01 00103, 2013

13:27 PM
ocharles

ok, gonna get up then, bbiab

2013-01-01 00152, 2013

13:28 PM
Freso

ianmcorvidae: Just replied to your comment on CR.

2013-01-01 00135, 2013

13:29 PM
Freso

And, uh, sorry for it being a blurb of text. I just got up and couldn't manage figuring out a good place to insert linebreaks. :|

2013-01-01 00157, 2013

13:29 PM
ianmcorvidae

you need to publish the comment

2013-01-01 00100, 2013

13:30 PM
ianmcorvidae

it's not there :P

2013-01-01 00119, 2013

13:31 PM
Freso

Oh, right.

2013-01-01 00122, 2013

13:31 PM
Freso

Silly CR.

2013-01-01 00124, 2013

13:31 PM
Freso

Done.

2013-01-01 00115, 2013

13:53 PM
LordSputnik joined the channel

2013-01-01 00120, 2013

14:00 PM
reosarevok joined the channel

2013-01-01 00121, 2013

14:00 PM
reosarevok joined the channel

2013-01-01 00153, 2013

15:29 PM
sezuan joined the channel

2013-01-01 00135, 2013

16:56 PM
voiceinsideyou joined the channel

2013-01-01 00121, 2013

17:00 PM
voiceinsideyou1 joined the channel

2013-01-01 00137, 2013

18:54 PM
kepstin-laptop joined the channel

2013-01-01 00158, 2013

19:08 PM
ruaok joined the channel

2013-01-01 00141, 2013

19:10 PM
ruaok

ianmcorvidae: ping?

2013-01-01 00114, 2013

19:11 PM
nikki

I imagine he's still asleep. he didn't go to bed until late

2013-01-01 00135, 2013

19:11 PM
nikki

not even four hours ago :P

2013-01-01 00136, 2013

19:11 PM
ruaok

I figured that. :) I got the last email from him about 5 hours ago.

2013-01-01 00153, 2013

19:11 PM
ruaok

I'll wait for him to wake to try and patch things back up.

2013-01-01 00107, 2013

19:12 PM
ruaok

I will, however, get the search indexes building again.

2013-01-01 00113, 2013

19:12 PM
ruaok

damn nagios. not sending me emails.

2013-01-01 00119, 2013

19:13 PM
reosarevok

Supposedly he did that?

2013-01-01 00154, 2013

19:13 PM
ruaok

who did what?

2013-01-01 00103, 2013

19:14 PM
nikki

ian apparently got search indexes updating again

2013-01-01 00110, 2013

19:14 PM
reosarevok

that

2013-01-01 00129, 2013

19:14 PM
ruaok

oh, whoops.

2013-01-01 00133, 2013

19:14 PM
ruaok

he didn't mail me about that.

2013-01-01 00113, 2013

19:15 PM
ruaok

ah looks like we got one set out and I killed the next run that's been going for about an hour

2013-01-01 00145, 2013

19:16 PM
LordSputnik has left the channel