ijabz: so what I'm wondering is if we can change the way the search server behaves when it gets that
2013-06-10 16116, 2013
ianmcorvidae
specifically, it should finish up any pending requests at that time, and queue up any it gets from that time until it's been re-initialized with the new indexes
2013-06-10 16145, 2013
ianmcorvidae
because at present when it gets that it apparently cuts off whatever it's doing right there -- so invalid JSON is getting sent upstream to the website search, which then causes an ISE
2013-06-10 16126, 2013
ruaok
eek. that is a royal pain to deal with.
2013-06-10 16129, 2013
ianmcorvidae
(I don't know if this also affects the webservice -- quite possibly it does)
2013-06-10 16134, 2013
ruaok
I wish that we could remote control nginx.
2013-06-10 16147, 2013
ruaok
nginx, please take dora out.
2013-06-10 16151, 2013
ruaok
restart dora,
2013-06-10 16153, 2013
ijabz
hmm, but searchservlet is stateless so when it receives that request it is unaware of any previous requests that have not been completed
2013-06-10 16154, 2013
ianmcorvidae
I have a quasi-hack that just does a retry from MBS, but ollie (correctly) wants us to do it right, so :)
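[Editor's sketch, not the actual MBS code: the retry workaround mentioned above might look roughly like this. MBS itself is written in Perl; this Python version only illustrates the idea. The `fetch` callable, the attempt count, and the delay are all assumptions.]

```python
import json
import time


def fetch_json_with_retry(fetch, attempts=3, delay=0.5):
    """Call fetch() until its body parses as JSON, or attempts run out.

    `fetch` is a hypothetical zero-argument callable that returns the raw
    response body from the search server as a string.
    """
    last_error = None
    for attempt in range(attempts):
        body = fetch()
        try:
            return json.loads(body)
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            # Truncated JSON (e.g. the server was cut off mid-response).
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error
```

[This only hides the truncated response from the website; it does not stop the search server from cutting responses off, which is why it counts as a hack rather than a fix.]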
2013-06-10 16155, 2013
ruaok
nginx, dora back in.
2013-06-10 16159, 2013
ruaok
that would be the best way to do this.
2013-06-10 16110, 2013
ianmcorvidae
ruaok: I agree, but you're right that it's hard
2013-06-10 16138, 2013
ianmcorvidae
ruaok: the way we could do that is to split out the upstream definition bits and write a bit of a shuffle-symlinks-then-HUP script
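[Editor's sketch of the shuffle-symlinks-then-HUP idea, under assumptions: the upstream definitions live in separate files, nginx includes one of them through a symlink, and the pid file path below is a guess. nginx does reload its configuration on SIGHUP; everything else here is hypothetical.]

```python
import os
import signal


def swap_symlink(link_path, new_target):
    """Point link_path at new_target without a window where it is missing."""
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(new_target, tmp)
    # rename(2) replaces the old symlink atomically on POSIX systems.
    os.replace(tmp, link_path)


def reload_nginx(pid_file="/var/run/nginx.pid"):
    """Send SIGHUP to the nginx master so it re-reads its configuration.

    The pid file location is an assumption; adjust for the real host.
    """
    with open(pid_file) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGHUP)
```

[Usage would be something like swap_symlink("upstream.conf", "upstream-without-dora.conf") followed by reload_nginx(), then the reverse once the search server is back. Both file names are made up for illustration.]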
2013-06-10 16140, 2013
ijabz
yep, rotating the two servers one by one makes more sense
2013-06-10 16153, 2013
warp
warp has changed the topic to: Ponyta week / Agenda: reviews, ws barcode submissions (reotab)
2013-06-10 16107, 2013
ruaok
or...
2013-06-10 16121, 2013
ocharles
warp: we released
2013-06-10 16123, 2013
ruaok
we could send the search server a signal to stop accepting new connections.
2013-06-10 16128, 2013
ocharles
inline with what was in jira
2013-06-10 16141, 2013
ruaok
which will cause nginx to send any requests to the other server.
2013-06-10 16155, 2013
ianmcorvidae
ruaok: tell it to stop, wait 10 seconds, then re-init, you mean?
2013-06-10 16159, 2013
ruaok
then after a second to let connections die down, kill -9 the server for a restart.
2013-06-10 16101, 2013
ianmcorvidae
(or whatever seconds)
2013-06-10 16107, 2013
ruaok
yes.
2013-06-10 16117, 2013
ruaok
except due to memory issues, we just kill -9 it
2013-06-10 16125, 2013
ruaok
otherwise the memory does not get reclaimed.
2013-06-10 16127, 2013
ocharles
ianmcorvidae: it's expecting to be merged to mbs-357
ruaok: well, either way you're talking about some sort of global setting toggle
2013-06-10 16120, 2013
ocharles
so can't the init=mmap handler take out a connection 'lock', wait for it to be granted, do its work, and then release the lock?
2013-06-10 16133, 2013
ocharles
acquiring a lock would require all open requests to finish
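[Editor's sketch of the connection 'lock' idea: ordinary requests hold the lock in shared mode, and the re-init handler takes it exclusively, which waits for open requests to finish and makes new ones queue. The real search server is a Java servlet; this Python class and its method names are hypothetical and only show the locking pattern.]

```python
import threading


class ReinitLock:
    """Shared/exclusive lock: many requests, or one re-initialisation."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0       # requests currently being served
        self._reinit = False   # True while a re-init holds or awaits the lock

    def begin_request(self):
        with self._cond:
            while self._reinit:        # queue behind a pending re-init
                self._cond.wait()
            self._active += 1

    def end_request(self):
        with self._cond:
            self._active -= 1
            self._cond.notify_all()

    def begin_reinit(self):
        with self._cond:
            while self._reinit:        # only one re-init at a time
                self._cond.wait()
            self._reinit = True        # new requests start queueing now
            while self._active > 0:    # wait for open requests to finish
                self._cond.wait()

    def end_reinit(self):
        with self._cond:
            self._reinit = False
            self._cond.notify_all()    # release the queued requests
```

[Requests that arrive during the re-init simply block in begin_request(), so this is the "queue" behaviour; how long they wait depends on how long the re-init takes, and the upstream timeouts still apply.]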
2013-06-10 16133, 2013
ianmcorvidae
ruaok: in my 'suggestion' I'm saying you do a toggle to a "queue" state, in yours to a "refuse" state
2013-06-10 16156, 2013
ruaok
queue state is more troublesome to me.
2013-06-10 16101, 2013
ijabz
The connections are handled by the servlet container itself (jetty/tomcat), not the code
2013-06-10 16106, 2013
ruaok
now you're behind on requests that you could have another server deal with.
2013-06-10 16111, 2013
ocharles
there are in fact already third party binaries that do exactly this
2013-06-10 16120, 2013
ianmcorvidae
ruaok: that wasn't the problem ijabz mentioned though :P
2013-06-10 16135, 2013
ocharles
ruaok: sure, but upstream should have timeouts
2013-06-10 16138, 2013
ruaok
ianmcorvidae: that's why I mentioned it. :)
2013-06-10 16100, 2013
ianmcorvidae
okay, well, you did it by saying that ijabz's problem stopped existing :P
2013-06-10 16101, 2013
ocharles
it's not the search server's responsibility to be terminating connections if it can ultimately serve them
2013-06-10 16109, 2013
djce joined the channel
2013-06-10 16135, 2013
ianmcorvidae
I think that we're getting into pointless weeds here and I should just bite the bullet and figure out making the loadbalancer do this how we want
2013-06-10 16136, 2013
ruaok
this makes very little sense to me.
2013-06-10 16138, 2013
ijabz
but in the code itself I could have it respond to a new command, so that all subsequent commands simply cause the code to return an HttpError, if that's what you want
2013-06-10 16103, 2013
ocharles
how long does init=mmap take?
2013-06-10 16107, 2013
ocharles
ballpark figure
2013-06-10 16109, 2013
ianmcorvidae
hm, I wonder what an actual error would do (vs. failing connections)
2013-06-10 16111, 2013
ruaok
a while.
2013-06-10 16123, 2013
ianmcorvidae
(at the frontend, I mean)
2013-06-10 16124, 2013
ruaok
ocharles: the load will spike to 15 for a few minutes while it loads all new data.
2013-06-10 16129, 2013
ruaok
assuming that this is a new index.
2013-06-10 16135, 2013
ocharles
if we can't queue connections, we should take it out of rotation and let the other server handle it
2013-06-10 16136, 2013
ruaok
with a new index the caches are cold.
2013-06-10 16152, 2013
ocharles
and if that's the case, then we need a way to coordinate nginx, as ruaok outlined earlier
2013-06-10 16155, 2013
ruaok
queueing is very troublesome, when considered with why we do restarts.
2013-06-10 16156, 2013
ianmcorvidae
really, it's that long? I guess it just refuses connections after a short period of time
2013-06-10 16104, 2013
ianmcorvidae
we don't do restarts?
2013-06-10 16108, 2013
ruaok
nope.
2013-06-10 16113, 2013
ruaok
kill -9 is all we ever do.
2013-06-10 16117, 2013
ianmcorvidae
no, I mean, you're asserting we do that
2013-06-10 16121, 2013
ianmcorvidae
we don't, as far as I can see
2013-06-10 16128, 2013
ianmcorvidae
we only do wget to ?init=mmap
2013-06-10 16141, 2013
ianmcorvidae
(and then to ?rate=true, but)
2013-06-10 16155, 2013
ianmcorvidae
I don't see a kill or a restart anywhere -- maybe it's hidden somewhere I hadn't found, and *that's* the real problem
2013-06-10 16123, 2013
ianmcorvidae
a kill -9 would be pretty consistent with the whole "the JSON data stops randomly in the middle" phenomenon
2013-06-10 16114, 2013
ianmcorvidae
... bah, that isn't actually a git repository, and the search servers have a different receive script than cartman's version
maybe a script to generate all the appropriate source files if I want to get real fancy :)
2013-06-10 16110, 2013
ocharles
the more that can be automated, the better
2013-06-10 16154, 2013
ocharles
warp: don't forget to set your tickets to in beta and set the fix version if necessary
2013-06-10 16126, 2013
nikkiphone2 joined the channel
2013-06-10 16150, 2013
djce joined the channel
2013-06-10 16100, 2013
ianmcorvidae idly wonders if I can also make it do something for MBS like disable one server and return its name in some format fabric can use, then when called again move to the next, etc.
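[Editor's sketch of the one-server-at-a-time idea: take each server out of rotation, act on it, then put it back before moving to the next. The three callables are placeholders for whatever the loadbalancer script and fabric tasks end up being; none of them exist in the discussion above.]

```python
def rolling(servers, disable, act, enable):
    """Run act() on each server in turn while it is out of rotation.

    disable/act/enable are hypothetical callables that each take a
    server name (or any host string the deploy tool accepts).
    """
    for name in servers:
        disable(name)
        try:
            act(name)
        finally:
            # Put the server back even if the action failed, so a broken
            # deploy never leaves more than one server out of rotation.
            enable(name)
```

[Because only one server is ever disabled, the remaining ones keep serving requests throughout.]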
* warp: MBS-6101, guard c.session.tport appropriately (see previous commit).
2013-06-10 16116, 2013
MBJenkins
* warp: MBS-6101, make Redis connection lazy.
2013-06-10 16117, 2013
MBJenkins
* warp: MBS-6101, verify that Redis->new() selects the specified database.
2013-06-10 16118, 2013
ocharles
ianmcorvidae: yea, moving to one command deploy is a definite goal
2013-06-10 16156, 2013
ianmcorvidae
I should still add all our servers to /etc/hosts or something too :/ I guess that's the bit I still need to do production deployments
2013-06-10 16107, 2013
ianmcorvidae
(or maybe to .ssh/config. not quite sure how that works)
2013-06-10 16125, 2013
ocharles
production deployments needs .ssh/config
2013-06-10 16132, 2013
ocharles
at least to do fab production -Hastro
2013-06-10 16142, 2013
ianmcorvidae
ah, okay
2013-06-10 16143, 2013
ianmcorvidae
cool
2013-06-10 16101, 2013
ianmcorvidae
it's too bad SSH seems to lack an include mechanism for host files
2013-06-10 16120, 2013
ianmcorvidae
or we could distribute one to people who have a VPN, to include in their own
2013-06-10 16114, 2013
ocharles
mm
2013-06-10 16152, 2013
ianmcorvidae
though I suppose fabric can probably take IP addresses as the host string (hopefully?) and then with my ostensible plan for a one-command deployment it can just return that instead of the name
2013-06-10 16150, 2013
reosarevok joined the channel
2013-06-10 16147, 2013
ocharles
yes, it can take anything ssh can take
2013-06-10 16154, 2013
ocharles
within reason
2013-06-10 16142, 2013
warp
DBD::Pg::st execute failed: ERROR: null value in column "ha1" violates not-null constraint at lib/Sql.pm line 107, <FILE> line 1.
2013-06-10 16150, 2013
warp
that's related to the bcrypt changes I assume?
2013-06-10 16157, 2013
ianmcorvidae
yeah
2013-06-10 16142, 2013
warp
hrm. Unauthorized
2013-06-10 16138, 2013
ijabz joined the channel
2013-06-10 16114, 2013
warp hopes that fixes beta.
2013-06-10 16114, 2013
outsidecontext joined the channel
2013-06-10 16133, 2013
MBJenkins
warp: Insert editor into editor table with updated bcrypt columns (in WS::2::Collection test).