in #musicbrainz-devel

0:34 AM
ruaok

lol @ xkcd
0:36 AM
warp

lol!
1:20 AM
Prophet5 joined the channel
1:37 AM
reoafk joined the channel
2:05 AM
Ben\Sput has left the channel
2:06 AM
Ben\Sput joined the channel
2:06 AM
Ben\Sput

50* errors :(
2:09 AM
Ben\Sput has left the channel
2:51 AM
j-b_ joined the channel
2:51 AM
navap joined the channel
2:53 AM
DWSR2 joined the channel
2:55 AM
ocharles- joined the channel
5:33 AM
Prophet5 joined the channel
6:47 AM
andreypopp joined the channel
7:22 AM
andreypopp joined the channel
7:49 AM
Leftmost

ocharles-, warp, ianmcorvidae, getting 502s for just about everything.
8:02 AM
andreypopp joined the channel
8:24 AM
reosarevok joined the channel
8:25 AM
reosarevok

Anyone know why the hell we're having 50x on every single page?
8:30 AM
Leftmost

No, and no one with access seems to be around to figure it out.
8:32 AM
reosarevok

Well
8:32 AM
Seems to be back for now
8:32 AM
Leftmost

It's been in and out for me for a while.
8:33 AM
reosarevok

Yeah, ok
8:33 AM
Gone again now
8:33 AM
*grumbles*
8:38 AM
andreypopp joined the channel
9:11 AM
petesake joined the channel
9:26 AM
DremoraLV joined the channel
10:21 AM
warp

hello!
10:39 AM
bandtrace joined the channel
10:45 AM
bandtrace joined the channel
10:52 AM
Leftmost joined the channel
11:23 AM
nikki_ joined the channel
11:29 AM
ruaok joined the channel
11:29 AM
ruaok

warp: PING
11:35 AM
warp

ack
11:39 AM
zas joined the channel
11:39 AM
nikki_ wakes up to a pile of ISEs about problems reading from the redis server
11:40 AM
nikki_: yep, we're aware of it. and it's even worse now apparently.
11:41 AM
nikki_

I thought the redis stuff was supposed to stop it from ISEing like that :/
11:43 AM
warp

site is back.
11:43 AM
nikki_: this is a different ISE
11:44 AM
nikki_

well, there's still a bunch of these "Can't use an undefined value as a HASH reference" ones
11:44 AM
warp

nikki_: the theory was that either: 1. memcached would lose sessions (it's a cache, not a datastore). 2. if connection to memcached was lost a new session was created, so still losing the session.
11:45 AM
nikki_: which is why we switched to redis, because redis is a datastore, and the connection handling is better
11:46 AM
but redis ran out of filehandles (memcached has stuff to deal with this, and redis as well, but our super old version of redis doesn't)
11:47 AM
nikki_: and of course there are many other things broken in the release editor which can make it ISE.
11:47 AM
nikki_

so I've noticed
11:48 AM
the majority of the ISEs are the release editor crashing or people submitting cd stubs without a tracklist :(
11:48 AM
oh, or the random search ones
11:48 AM
Leftmost

Is it okay if I feel a great sense of satisfaction when I add a disc ID that kills a CD stub?
11:50 AM
warp

Leftmost: yes.
11:50 AM
nikki_

we can't exactly stop you :P
11:51 AM
Leftmost

Just because you can't stop me doesn't mean it's okay. :-P
11:56 AM
zas

Hmmm, i cannot log in to acoustid.org with my usual credentials, is this related to the issue MB just had?
11:57 AM
luks

zas: might be
11:58 AM
the MB auth requests are timing out
11:58 AM
zas

ohoho, 502 again on MB
12:07 PM
reosarevok joined the channel
12:18 PM
ocharles- joined the channel
12:19 PM
ocharles wakes up
12:19 PM
ocharles

warp: ping
12:20 PM
warp

ocharles: hello!
12:20 PM
ocharles

still file handle troubles?
12:20 PM
warp

ocharles: we've got redis at 100% cpu for no apparent reason
12:21 PM
I don't think it's file handles.
12:21 PM
Mineo joined the channel
12:21 PM
ocharles

what server?
12:22 PM
warp

roobarb
12:22 PM
ocharles

lets have a looksie
12:22 PM
warp

max open files is set correctly when I check /proc/19501/limits
12:23 PM
connect clients hovers between 200 and 300 when I can connect. (redis-cli info)
12:23 PM
connected
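The two checks warp describes (the open-file limit in /proc/<pid>/limits and the client count from redis-cli info) can be compared with a small script. This is a minimal sketch, not what was run on roobarb: the function names and the sample text below are hypothetical, and it only parses text that you capture yourself.

```python
def _to_int(value):
    """Convert a limits field to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Return (soft, hard) for the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Row layout: Max open files <soft> <hard> files
            return (_to_int(fields[3]), _to_int(fields[4]))
    return None


def parse_connected_clients(info_text):
    """Return connected_clients from the output of `redis-cli info`."""
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "connected_clients":
            return int(value)
    return None


# Illustrative sample text (assumed values, not taken from the log).
SAMPLE_LIMITS = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max open files            10240                10240                files\n"
)
SAMPLE_INFO = "redis_version:1.2.0\r\nconnected_clients:250\r\n"

if __name__ == "__main__":
    soft, hard = parse_max_open_files(SAMPLE_LIMITS)
    clients = parse_connected_clients(SAMPLE_INFO)
    print("soft limit:", soft, "hard limit:", hard, "clients:", clients)
```

On a live machine the limits text would come from reading /proc/<pid>/limits for the redis-server pid, and the info text from redis-cli info. A client count far below the soft limit supports warp's reading that file handles are not the problem this time.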
12:23 PM
ocharles

we have a 16 core machine so a load of 2 doesn't seem the end of the world, I guess?
12:24 PM
warp

yeah, that should be fine.
12:25 PM
ocharles

can I have sudo on that machine?
12:25 PM
warp

sure
12:25 PM
ocharles: done.
12:26 PM
ocharles

thanks
12:32 PM
warp

hrm. now there's two redis-servers running.
12:33 PM
ocharles

redis seems to be writing 30MB/s
12:33 PM
so I imagine that cpu usage is almost entirely io dominated
12:33 PM
warp

ok
12:33 PM
so it needs to flush/save less.
12:33 PM
ocharles

but according to /var/log/redis it isn't flushing that often
12:34 PM
warp

at most it should flush once every minute. but only if 10000 keys have changed since the last save.
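The rule warp describes matches how a redis.conf `save <seconds> <changes>` line works: a background save is triggered only once both the time and the change count have been reached. A minimal sketch of that check, assuming a single rule of 60 seconds and 10000 changes (the actual redis.conf on roobarb is not shown in the log, and the function name is hypothetical):

```python
def should_background_save(seconds_since_save, changes_since_save, save_rules):
    """Return True if any (seconds, changes) rule has both thresholds met."""
    return any(
        seconds_since_save >= seconds and changes_since_save >= changes
        for seconds, changes in save_rules
    )


# Assumed rule set, equivalent to a config line `save 60 10000`.
SAVE_RULES = [(60, 10000)]

if __name__ == "__main__":
    for elapsed, changes in [(61, 10000), (61, 9999), (30, 50000)]:
        print(elapsed, changes, should_background_save(elapsed, changes, SAVE_RULES))
```

Under this rule a busy server saves at most about once a minute, and an idle one does not save at all.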
12:34 PM
ocharles

hum, maybe it's not that, atop isn't showing much activity for the disk actually
12:37 PM
warp

:(
12:39 PM
andreypopp joined the channel
12:43 PM
ocharles: what are you currently doing?
12:43 PM
(there's still two redis-servers running, which cannot be good, I'd like to kill one
12:43 PM
)
12:44 PM
ocharles

i'm doing nothing
12:44 PM
go ahead
12:44 PM
ocharles does not see two servers
12:44 PM
warp

redis 19501 64.5 9.2 1533664 1527524 ? Ss 12:05 24:17 /usr/bin/redis-server /etc/redis/redis.conf
12:45 PM
redis 21088 25.8 9.2 1534108 1527768 ? R 12:38 1:17 /usr/bin/redis-server /etc/redis/redis.conf
12:45 PM
ocharles

root 18846 0.0 0.0 8292 720 pts/4 S+ 11:41 0:00 tail -f redis-server.log
12:45 PM
acid2 21241 0.0 0.0 7628 1020 pts/6 S+ 12:44 0:00 grep --color=auto redis
12:45 PM
oddly, I saw no servers
12:45 PM
which did you kill?
12:46 PM
warp

19501, which also seems to have killed the other one. so perhaps it's normal.
12:46 PM
ocharles

the logs say:
12:46 PM
30 Mar 12:04:55 * Opening TCP port: bind: Address already in use
12:46 PM
warp

but if one is an intentional/internal fork of the other, I would have expected their start time to be the same.
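The start-time comparison can be made mechanical by parsing the two ps lines pasted above. This is a minimal sketch, assuming the standard `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); the function name is hypothetical, and the script does not establish whether the second process is a child of the first.

```python
PS_COLUMNS = ["user", "pid", "cpu", "mem", "vsz", "rss",
              "tty", "stat", "start", "time", "command"]


def parse_ps_line(line):
    """Split one `ps aux` row into a dict keyed by column name."""
    # maxsplit=10 keeps the full command, including its arguments, in one field.
    fields = line.split(None, 10)
    return dict(zip(PS_COLUMNS, fields))


# The two rows as pasted in the channel.
PS_LINES = [
    "redis 19501 64.5 9.2 1533664 1527524 ? Ss 12:05 24:17 /usr/bin/redis-server /etc/redis/redis.conf",
    "redis 21088 25.8 9.2 1534108 1527768 ? R 12:38 1:17 /usr/bin/redis-server /etc/redis/redis.conf",
]

if __name__ == "__main__":
    procs = [parse_ps_line(line) for line in PS_LINES]
    for proc in procs:
        print(proc["pid"], proc["stat"], proc["start"], proc["command"])
    same_start = len({proc["start"] for proc in procs}) == 1
    print("same start time:", same_start)
```

Running `ps -o pid,ppid,stat,lstart,cmd` for the two pids would show the parent pid directly, which answers the fork question more reliably than comparing start times.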
12:46 PM
ocharles

so I don't think it's normal
12:46 PM
also
12:46 PM
30 Mar 12:44:47 - The server is now ready to accept connections on port 6379
12:47 PM
30 Mar 11:20:14 * WARNING overcommit_memory is set to 0! Background save may fail under low condition memory. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
12:47 PM
but that's fail, not spin
12:47 PM
warp

the server has 4GB free memory, so that message doesn't seem relevant to our current trouble.
12:50 PM
ocharles

this connection is too weak to do anything useful :/
12:50 PM
warp

ocharles: anyway, so I don't understand why it's being this finicky. it should have enough file handles, enough CPU, enough memory.
12:50 PM
ocharles

acid2 [at roobarb]:~$ ps aux | grep strc
12:50 PM
a^C^[[A^C^C^C^C^Z
12:50 PM
for example
12:50 PM
:)
12:51 PM
warp

ecuador cable internet ftw!
12:52 PM
ocharles: I'm inclined to install redis 2.x either on roobarb or a new hoser vm. 1.2 seems old. though upgrading and hoping that magically fixes a problem is in general not the best strategy.
12:53 PM
ocharles

i'm ok with that
12:54 PM
warp

ok
12:58 PM
ocharles

i really can't do anything other than advise i'm afraid
12:59 PM
i'm seeing java take 400% of the cpu though atm
13:00 PM
warp

ocharles: yep, roobarb and dora are the search servers.
13:01 PM
ocharles

i know
13:01 PM
but i mean the load doesn't seem to be coming from redis right now
13:01 PM
warp nods.
13:02 PM
warp

and redis is working fine when it's not spiking at 100%
13:02 PM
ocharles

it seems to have only broken today though
13:02 PM
and ruaok did an upgrade on the search servers yesterday, so i wonder if the events are correlated?
13:03 PM
http://stats.musicbrainz.org/webstats/nginx-rrd... shows it kicked in around 1am
13:04 PM
though that upgrade looks to have finished around 3 hours before
13:04 PM
warp

ocharles: and 3 hours is the interval at which we deploy search indexes? :)
13:04 PM
ocharles

i thought we did that in a loop now
13:06 PM
warp

then the loop takes 3 hours? or I'm just misremembering.
13:06 PM
ocharles

3 hours is what it says on the /search page
13:06 PM
Search Results
13:06 PM
Last updated: 2013-03-30 08:34 GMT
13:07 PM
that does seem to be quite a while ago
13:09 PM
warp

that is certainly more than 3 hours.
13:09 PM
ocharles

yea
13:09 PM
warp

ok, I've got a redis 2.6 on dora.
13:09 PM
ocharles

cool
13:11 PM
warp

shall I just switch things over, or should I make some attempt at preserving the sessions?
13:11 PM
ocharles

uff, roobarb at load 13 again
13:11 PM
and no, just switch them
13:11 PM
warp

alright.
13:11 PM
ocharles

it seems to be java that's really being problematic here
13:12 PM
warp

then switching to dora isn't going to help. it currently has java at 400% and load 4.something.