I've not looked at machines other than astro -- does this happen on pingu/asterix?
2015-04-25 11528, 2015
ianmcorvidae
yes, same thing on pingu at least
2015-04-25 11539, 2015
ruaok
da fuq?
2015-04-25 11506, 2015
ruaok
how can a local unix domain socket start acting up on three servers at the same time?
2015-04-25 11516, 2015
ianmcorvidae
enough requests that it's overloading the number of processes on all of them
2015-04-25 11544, 2015
ianmcorvidae
but that's 150, across the three of them, so I dunno
2015-04-25 11511, 2015
ruaok
so, the theory is that we have three servers stuck in some bizarre state.
2015-04-25 11527, 2015
ruaok
we could take the site down and reboot all three of them
2015-04-25 11538, 2015
ruaok
let me make sure I understand this...
2015-04-25 11549, 2015
ianmcorvidae
I don't really know if I understand it, to be fair
2015-04-25 11502, 2015
ruaok
this is nginx trying to make a unix domain socket call to mb-server and it fails, but only for search queries.
2015-04-25 11502, 2015
ianmcorvidae also notes I need to leave nearly right now
2015-04-25 11510, 2015
ianmcorvidae
it's not only for searches
2015-04-25 11520, 2015
ianmcorvidae
we just have so many more search requests that the others get lost
2015-04-25 11530, 2015
ianmcorvidae
if you tail the error log and grep -v 'query' you'll see the others, same sort of thing
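A minimal sketch of the filter described above, assuming the search-related entries all contain the string `query` as the message implies. The function name `non_search_errors` and the log path in the usage comment are hypothetical; the chat does not say where the nginx error log lives.

```shell
# Hypothetical helper: read log lines on stdin and drop every line that
# mentions 'query', so the non-search failures become visible.
non_search_errors() {
  grep -v 'query'
}

# Usage (the log path is an assumption, not taken from the chat):
#   tail -f /var/log/nginx/error.log | non_search_errors
```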
2015-04-25 11538, 2015
ruaok
ok, will do.
2015-04-25 11551, 2015
ruaok
bitmap: when we restarted the front end, what exactly did you do?
2015-04-25 11508, 2015
ruaok
we may just need to repeat the exercise.
2015-04-25 11522, 2015
ruaok
but this time stop both nginx and mb-server.
2015-04-25 11524, 2015
ianmcorvidae
follow the release process, it's in syswiki
2015-04-25 11528, 2015
ruaok
then remove the unix domain socket.
2015-04-25 11534, 2015
ianmcorvidae
and that *should* involve restarting nginx as well, but
2015-04-25 11556, 2015
ruaok
ah, we didn't fully do that. there was something amiss that was causing the git pull to hand.
2015-04-25 11557, 2015
ruaok
hang
2015-04-25 11523, 2015
ianmcorvidae
well, the pull doesn't matter really
2015-04-25 11533, 2015
bitmap
yeah, I just restarted the apps with svc -t /etc/service/musicbrainz-server
2015-04-25 11534, 2015
ruaok
clearly, but it halted the process.
2015-04-25 11539, 2015
ruaok
ah!
2015-04-25 11546, 2015
ruaok
no nginx restart.
2015-04-25 11555, 2015
ruaok
bitmap: let's do this process again.
2015-04-25 11500, 2015
bitmap
ok
2015-04-25 11509, 2015
ianmcorvidae
sudo -i, cd server-configs; ./provision.sh; sudo svc -t /etc/service/musicbrainz-server; tail -f /etc/service/musicbrainz-server/log/main/current until you see the "Binding to ..." etc. and then you can bring it back in
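The "watch the log until you see Binding to" step could be scripted roughly as follows. This is only a sketch: `wait_for_binding` is a hypothetical helper, and the skip-lines argument is an assumption added so that "Binding to" lines from an earlier start are not mistaken for the new one.

```shell
# Hypothetical helper: poll a log file until a line containing "Binding to"
# appears after the first SKIP lines, or give up after TIMEOUT seconds.
#   $1 = log file, $2 = lines to skip (default 0), $3 = timeout (default 60)
wait_for_binding() {
  log_file=$1
  skip=${2:-0}
  timeout=${3:-60}
  elapsed=0
  # tail -n +N starts output at line N, so only lines after SKIP are searched.
  until tail -n "+$((skip + 1))" "$log_file" 2>/dev/null | grep -q 'Binding to'; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Usage, following the steps in the message above:
#   log=/etc/service/musicbrainz-server/log/main/current
#   before=$(wc -l < "$log")
#   sudo svc -t /etc/service/musicbrainz-server
#   wait_for_binding "$log" "$before" 120 && echo "ready to bring back in"
```

If the log is rotated between the line count and the restart, the skip count will be stale, so treat a timeout as "go and look by hand" rather than as proof of failure.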
2015-04-25 11511, 2015
ruaok
this time use the usual process and watch that it does restart nginx
2015-04-25 11536, 2015
ianmcorvidae
provision may do a git pull for mbserver, of course, but hopefully that doesn't keep failing
2015-04-25 11550, 2015
bitmap
yeah, the provision.sh is what was hanging
2015-04-25 11551, 2015
ruaok
it better not now. :)
2015-04-25 11517, 2015
ianmcorvidae
if it's failing, a manual nginx restart is the main thing we wanted out of that anyway, but
2015-04-25 11525, 2015
bitmap nods
2015-04-25 11537, 2015
ianmcorvidae notes I did do a HUP to nginx on astro a bit ago, because I added a quick bit of logging for the search endpoint, which a provision would overwrite (but that's fine)
2015-04-25 11539, 2015
ruaok
astro out
2015-04-25 11528, 2015
bitmap
looks like ./provision.sh still hangs, I'll try svc -t
2015-04-25 11541, 2015
ianmcorvidae
nginx is system-installed, not daemontools. /etc/init.d
2015-04-25 11500, 2015
bitmap
ah, right
2015-04-25 11502, 2015
ianmcorvidae
(and the musicbrainz-server svc -t isn't done by provision anyway, so that needs to be done separately in any case)
2015-04-25 11546, 2015
bitmap
ok, astro done
2015-04-25 11522, 2015
ruaok
astro in pingu out
2015-04-25 11557, 2015
ruaok
getting resource busy errors on astro. though not as fast as before
2015-04-25 11547, 2015
ianmcorvidae
resource temporarily unavailable as before, or a new 'resource busy' error?
2015-04-25 11521, 2015
ruaok
the former.
2015-04-25 11522, 2015
ruaok
as before.
2015-04-25 11537, 2015
ruaok
1 every few seconds.
2015-04-25 11500, 2015
ianmcorvidae
1 every few seconds is way better than before, anyway
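To put a number on "1 every few seconds", one could count the matching entries in a saved slice of the error log. A rough sketch: `count_unavailable` is a hypothetical name, and the match is case-insensitive because the chat quotes the error in lower case.

```shell
# Hypothetical helper: print how many lines of the given log file mention
# the "resource temporarily unavailable" error (case-insensitive).
count_unavailable() {
  grep -ci 'resource temporarily unavailable' "$1"
}

# Usage (the log path is an assumption, not taken from the chat):
#   tail -n 1000 /var/log/nginx/error.log > /tmp/slice.log
#   count_unavailable /tmp/slice.log
```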
2015-04-25 11510, 2015
ruaok
I wonder if this started happening before today and my tinkering kicked it into high gear.
2015-04-25 11511, 2015
ruaok
for sure.
2015-04-25 11526, 2015
bitmap
pingu done
2015-04-25 11543, 2015
ruaok
pingu in, asterix out
2015-04-25 11544, 2015
ianmcorvidae
it has been happening some, this is what causes the instant 502s
2015-04-25 11516, 2015
ianmcorvidae
and I'm out. I'll have my phone if I'm really urgently needed but I'm out basically the whole evening :( and then free of this particular thing until the fall, so it's not all bad :P