I suspect the git issue might be firewall related.
2015-04-25 11558, 2015
ruaok
doesn't seem like this is helping.
2015-04-25 11504, 2015
ruaok
but let's finish.
2015-04-25 11522, 2015
ruaok
after that I can try to bring the old search load balancer back
2015-04-25 11532, 2015
bitmap
asterix is done
2015-04-25 11509, 2015
ruaok
thx.
2015-04-25 11515, 2015
bitmap
yeah, still seeing 502s on the homepage
2015-04-25 11517, 2015
ruaok
the rate has dropped, but is still very high.
2015-04-25 11512, 2015
Leo_Verto
hmm, reboot all the things? :P
2015-04-25 11551, 2015
ruaok
trying.
2015-04-25 11501, 2015
ruaok
kicking search servers now.
2015-04-25 11529, 2015
ruaok
20% 502s now. :)
2015-04-25 11534, 2015
ruaok
down from 50%
2015-04-25 11513, 2015
ruaok
bitmap: can you look into the access log on astro and examine the 502 errors?
2015-04-25 11521, 2015
ruaok
tell me what you think.
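A rough way to get the 502 share out of an access log, as a sketch only. It assumes astro's nginx writes the stock "combined" log format, where the status code is the 9th whitespace-separated field; the function name `rate_502` and the log path in the usage note are hypothetical and not taken from the conversation.

```shell
#!/usr/bin/env bash
# Print "<502 count>/<total requests> (<percent>)" for an access log on stdin.
# Assumption: nginx "combined" format, so field 9 is the HTTP status code.
rate_502() {
  awk '
    $9 == 502 { bad++ }
    END {
      if (NR > 0) {
        printf "%d/%d (%.1f%%)\n", bad, NR, 100 * bad / NR
      } else {
        print "no requests"
      }
    }
  '
}
```

Something like `tail -n 10000 /var/log/nginx/access.log | rate_502` (path assumed) would give the rate over the most recent requests, which is easier to compare against the "20%, down from 50%" figures than a whole-day total.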
2015-04-25 11545, 2015
ruaok
ok, I'm calling this test failed. I'll bring the traffic back to carl. :(
2015-04-25 11536, 2015
moufl joined the channel
2015-04-25 11552, 2015
mb-chat-logger joined the channel
2015-04-25 11511, 2015
MBJenkins joined the channel
2015-04-25 11538, 2015
Muz_ joined the channel
2015-04-25 11521, 2015
bitmap
the only hints I can find are that there's too many backlogged sockets on the frontends hitting net.core.somaxconn, but I don't get why switching to ernie would saturate that
2015-04-25 11555, 2015
ruaok
maybe that is something that needs to be tuned.
2015-04-25 11502, 2015
ruaok
how did you spy this?
2015-04-25 11505, 2015
bitmap
I googled the 'Resource temporarily unavailable' thing and saw that somaxconn was still the default on astro
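For reference, a minimal sketch of the check bitmap describes. It reads the accept-queue cap from procfs and flags it when it is still at or below 128, which was the kernel default at the time. The function name `check_somaxconn` is hypothetical, and the file path is passed as an argument so the check can be pointed at a test file.

```shell
#!/usr/bin/env bash
# Report net.core.somaxconn and flag it if it is at or below the old
# kernel default of 128. The path argument is normally
# /proc/sys/net/core/somaxconn.
check_somaxconn() {
  local value
  value=$(cat "$1") || return 2
  if [ "$value" -le 128 ]; then
    echo "somaxconn=$value (at or below the old default of 128)"
  else
    echo "somaxconn=$value"
  fi
}
```

On a frontend this would be run as `check_somaxconn /proc/sys/net/core/somaxconn`. Raising the value is done with `sysctl -w net.core.somaxconn=<n>`, but the listening service also has to request a larger backlog before the new cap has any effect.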
bitmap: while I go look at pgbouncer, can you please see what queries astro sends to the search load balancer?
2015-04-25 11506, 2015
chirlu-mobile joined the channel
2015-04-25 11521, 2015
chirlu-mobile
Could the WS search queries thing be related to the nginx accel-redirect?
2015-04-25 11544, 2015
ruaok
I was wondering about that.
2015-04-25 11554, 2015
ruaok
ianmcorvidae: you know more about that than I do.
2015-04-25 11519, 2015
chirlu-mobile
re: psql, the number of connections has always been close to the limit since yesterday, so increasing max_connections in Postgres may be the solution.
2015-04-25 11541, 2015
ruaok
chirlu-mobile: what is interesting is that the number of waiting connections is now at ~120.
2015-04-25 11544, 2015
ruaok
constantly.
2015-04-25 11515, 2015
ruaok
I think I am going to move the ips that I took from carl and send them back to carl.
2015-04-25 11530, 2015
ruaok
and hopefully that will get things to go back to normal
2015-04-25 11557, 2015
chirlu-mobile
Long queue could also be caused simply by higher load.
2015-04-25 11514, 2015
ruaok
the load is very low right now
2015-04-25 11519, 2015
chirlu-mobile
Then it's not that. :)
2015-04-25 11553, 2015
michiwend joined the channel
2015-04-25 11541, 2015
ruaok
can anyone ping 72.29.167.148 ?
2015-04-25 11527, 2015
bitmap
not I
2015-04-25 11524, 2015
ruaok
carl is refusing to take the IP back.
2015-04-25 11511, 2015
ianmcorvidae
bitmap/ruaok: if you want to do psql from somewhere like astro, use port 6899 so you go through pgbouncer
2015-04-25 11523, 2015
ianmcorvidae
(or use ./admin/psql READWRITE, which of course does that for you)
2015-04-25 11536, 2015
ruaok
yes, and if you get that error, using -U postgres will get you in too.
2015-04-25 11558, 2015
ianmcorvidae
yes. though probably not from astro :)
2015-04-25 11533, 2015
ianmcorvidae
and the reason I was asking about the search-private thing was the accel-redirect thing, but it appears to be correctly configured
2015-04-25 11548, 2015
ianmcorvidae
unless there's differences in a toplevel nginx config somehow, but I don't know how that'd be since you just copied it from carl
2015-04-25 11555, 2015
ruaok
I'm starting to think that the iptables has some goof in it.
2015-04-25 11519, 2015
kepstin-laptop joined the channel
2015-04-25 11523, 2015
reosarevok joined the channel
2015-04-25 11524, 2015
kepstin-laptop
ruaok, pong
2015-04-25 11534, 2015
ruaok
hey, been a rough day here. :(
2015-04-25 11558, 2015
ruaok
I'm in the process of migrating everything back to the old gateway. I'm burnt.
2015-04-25 11506, 2015
ruaok
let me finish that first.
2015-04-25 11537, 2015
ianmcorvidae
only difference in iptables is ernie has stuff in INPUT for .150 and for gtest.musicbrainz.org
2015-04-25 11506, 2015
ruaok
150 was my dns test ip.
2015-04-25 11540, 2015
ruaok
ianmcorvidae: do you see all the differences between em2:0 and em2?
2015-04-25 11504, 2015
ruaok
any time I wanted to use a rule with em2:0 it would not work. using em2 would.
2015-04-25 11532, 2015
ianmcorvidae
iptables -L isn't showing me anything with that.
2015-04-25 11502, 2015
ruaok
iptables-save does.
2015-04-25 11509, 2015
kepstin-laptop
iptables -L doesn't show all the tables, you have to use -t to ask for a specific one
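To illustrate kepstin-laptop's point: `iptables -L` with no `-t` only lists the filter table, so rules in nat, mangle, or raw stay hidden unless each table is requested by name. A minimal sketch follows; the function name `dump_tables` is hypothetical, and the optional first argument exists only so a stand-in command can be substituted for the real `iptables`, which needs root.

```shell
#!/usr/bin/env bash
# Print the rules of each common table in iptables-save style (-S).
# $1: command to run, defaults to iptables.
dump_tables() {
  local run=${1:-iptables}
  local t
  for t in filter nat mangle raw; do
    echo "== $t =="
    "$run" -t "$t" -S
  done
}
```

Running `dump_tables` on both gateways and diffing the output would be one way to compare carl and ernie table by table; `iptables-save` gives much the same information in a single dump.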
2015-04-25 11519, 2015
ianmcorvidae
yeah, looking at nat now
2015-04-25 11546, 2015
ianmcorvidae
the raw table has something that seems amiss as far as connection-tracking, but that's the only thing I can see
2015-04-25 11519, 2015
ianmcorvidae
(carl has target NOTRACK, as should be expected, for the couple of UDP things we turn that off for, ernie has target CT, and NOTRACK is in the destination stuff for some reason)
2015-04-25 11529, 2015
ruaok
kepstin-laptop: what was the arping invocation you were suggesting?
2015-04-25 11544, 2015
ruaok
I've got one external ip that refused to go back to carl.
2015-04-25 11503, 2015
ruaok
that is the last thing to undo everything I've done today.
2015-04-25 11526, 2015
kepstin-laptop
arping -U -I ethdevice ip.add.re.ss
2015-04-25 11557, 2015
ruaok
from which machine?
2015-04-25 11504, 2015
ruaok
the one that lost or received the ip?
2015-04-25 11510, 2015
kepstin-laptop
the one that you want to have the ip
2015-04-25 11526, 2015
kepstin-laptop
that will send a fake arp reply from that machine to update the arp table in the switch
2015-04-25 11546, 2015
ruaok
no replies.
2015-04-25 11551, 2015
ruaok
it just sits there.
2015-04-25 11501, 2015
kepstin-laptop
there aren't supposed to be any replies, it's the one sending replies
2015-04-25 11532, 2015
ruaok
worked this time. never did before. :)
2015-04-25 11521, 2015
ruaok
ok, everything is undone. the site is as I found it this morning.
2015-04-25 11503, 2015
ianmcorvidae
search queries still not working though
2015-04-25 11544, 2015
ruaok
exactly.
2015-04-25 11548, 2015
ruaok
da fuq?
2015-04-25 11550, 2015
ianmcorvidae
huh
2015-04-25 11552, 2015
ianmcorvidae
though one just did
2015-04-25 11504, 2015
ruaok
30% - 40% are failing.
2015-04-25 11505, 2015
ianmcorvidae
yup, now they're back reliably
2015-04-25 11507, 2015
ianmcorvidae
wacky
2015-04-25 11512, 2015
ianmcorvidae
maybe one server's still not figured it out
2015-04-25 11533, 2015
ruaok
lots and lots still persist.
2015-04-25 11537, 2015
ruaok
ok, food just arrived
2015-04-25 11541, 2015
ruaok
back after noms.
2015-04-25 11548, 2015
ianmcorvidae confirms, search stuff to the backend thing is working, something's failing higher-up than that
2015-04-25 11528, 2015
ianmcorvidae
and it's not all queries, that's just because there's so many more of them than other stuff
2015-04-25 11546, 2015
ianmcorvidae
right, and all the stats are still broken >_<
2015-04-25 11511, 2015
ruaok
yeah, once the gateway stuff is done, I was going to pick rvedotrc's brain about that