ijabz: so what I'm wondering is if we can change the way the search server behaves when it gets that
specifically, it should finish up any pending requests at that time, and queue up any it gets from that time until it's been re-initialized with the new indexes
because at present when it gets that it apparently cuts off whatever it's doing right there -- so invalid JSON is getting sent upstream to the website search, which then causes an ISE
ruaok
eek. that is a royal pain to deal with.
ianmcorvidae
(I don't know if this also affects the webservice -- quite possibly it does)
ruaok
I wish that we could remote control nginx.
nginx, please take dora out.
restart dora,
ijabz
hmm, but searchservlet is stateless, so when it receives that request it is unaware of any previous requests that have not been completed
ianmcorvidae
I have a quasi-hack that just does a retry from MBS, but ollie (correctly) wants us to do it right, so :)
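A retry of the kind described above might look roughly like the following. This is only a sketch: `fetch_json_with_retry` and its `fetch` argument are hypothetical names, written in Python for illustration rather than taken from MBS, and it assumes the truncated response shows up as a JSON parse error.

```python
import json
import time


def fetch_json_with_retry(fetch, attempts=3, delay=0.5):
    """Call fetch() until its body parses as JSON, up to `attempts` times.

    `fetch` is a hypothetical zero-argument callable returning the raw
    response body as a string; it stands in for the real search request.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return json.loads(fetch())
        except json.JSONDecodeError as exc:
            # A body cut off mid-response fails to parse and lands here.
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    # Every attempt returned invalid JSON, so surface the last parse error.
    raise last_error
```

This only hides the symptom at the caller; the search server still drops responses mid-stream, which is why a fix on the server or load balancer side is preferred in the discussion.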
ruaok
nginx, dora back in.
that would be the best way to do this.
ianmcorvidae
ruaok: I agree, but you're right that it's hard
ruaok: the way we could do that is to split out the upstream definition bits and write a bit of a shuffle-symlinks-then-HUP script
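The shuffle-symlinks-then-HUP idea could be sketched as below. The assumptions are loud ones: the nginx config would include a symlinked file holding the upstream definition, the pid file path is a guess, and the function names are made up for illustration. The one fact relied on is that nginx re-reads its configuration when its master process receives SIGHUP.

```python
import os
import signal


def switch_upstream(link_path, target_path):
    """Atomically repoint the symlink at link_path to target_path."""
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target_path, tmp_link)
    # os.replace is an atomic rename on POSIX, so the link never goes missing.
    os.replace(tmp_link, link_path)


def reload_nginx(pid_file="/var/run/nginx.pid"):
    """Send SIGHUP to the nginx master process so it re-reads its config.

    The default pid file location is an assumption; adjust it per host.
    """
    with open(pid_file) as handle:
        pid = int(handle.read().strip())
    os.kill(pid, signal.SIGHUP)
```

Taking dora out would then be `switch_upstream(...)` to an upstream file that omits dora, followed by `reload_nginx()`, with the reverse pair of calls once dora has been restarted.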
ijabz
yep, rotating the two servers one by one makes more sense
warp
warp has changed the topic to: Ponyta week / Agenda: reviews, ws barcode submissions (reotab)
ruaok
or...
ocharles
warp: we released
ruaok
we could send the search server a signal to stop accepting new connections.
ocharles
in line with what was in jira
ruaok
which will cause nginx to send any requests to the other server.
ianmcorvidae
ruaok: tell it to stop, wait 10 seconds, then re-init, you mean?
ruaok
then after a second to let connections die down, kill -9 the server for a restart.
ianmcorvidae
(or whatever seconds)
ruaok
yes.
except due to memory issues, we just kill -9 it
otherwise the memory does not get reclaimed.
ocharles
ianmcorvidae: it's expecting to be merged to mbs-357
ruaok: well, either way you're talking about some sort of global setting toggle
ocharles
so can't the init=mmap handler take out a connection 'lock', wait for it to be granted, do its work, and then release the lock?
acquiring a lock would require all open requests to finish
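The connection 'lock' described here behaves like a reader-writer lock: requests hold it shared, and the re-init holds it exclusively. A minimal sketch, written in Python for illustration (the search server itself is a Java servlet) and assuming the handler code is able to wrap every request; `DrainLock` and `load_indexes` are hypothetical names.

```python
import threading


class DrainLock:
    """Requests hold the lock shared; re-initialization holds it exclusively."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0         # requests currently being served
        self._reloading = False  # True while a re-init is pending or running

    def begin_request(self):
        with self._cond:
            # New requests queue here while a re-init is pending or running.
            while self._reloading:
                self._cond.wait()
            self._active += 1

    def end_request(self):
        with self._cond:
            self._active -= 1
            self._cond.notify_all()

    def reinitialize(self, load_indexes):
        with self._cond:
            # Only one re-init at a time.
            while self._reloading:
                self._cond.wait()
            self._reloading = True
            # Let every in-flight request finish before touching the indexes.
            while self._active > 0:
                self._cond.wait()
        try:
            load_indexes()
        finally:
            with self._cond:
                self._reloading = False
                self._cond.notify_all()
```

This is the "queue" behaviour: requests that arrive during the reload wait rather than fail, so the upstream timeouts still decide how long a caller is willing to wait.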
ianmcorvidae
ruaok: in my 'suggestion' I'm saying you do a toggle to a "queue" state, in yours to a "refuse" state
ruaok
queue state is more troublesome to me.
ijabz
The connections are handled by the servlet container itself (jetty/tomcat), not the code
ruaok
now you're behind on requests that you could have another server deal with.
ocharles
there are in fact already third party binaries that do exactly this
ianmcorvidae
ruaok: that wasn't the problem ijabz mentioned though :P
ocharles
ruaok: sure, but upstream should have timeouts
ruaok
ianmcorvidae: that's why I mentioned it. :)
ianmcorvidae
okay, well, you did it by saying that ijabz's problem stopped existing :P
ocharles
it's not the search server's responsibility to be terminating connections if it can ultimately serve them
djce joined the channel
ianmcorvidae
I think that we're getting into pointless weeds here and I should just bite the bullet and figure out making the loadbalancer do this how we want
ruaok
this makes very little sense to me.
ijabz
but in the code itself I could have it respond to a new command, so that all subsequent commands simply cause the code to return an HttpError, if that's what you want
ocharles
how long does init=mmap take?
ballpark figure
ianmcorvidae
hm, I wonder what an actual error would do (vs. failing connections)
ruaok
a while.
ianmcorvidae
(at the frontend, I mean)
ruaok
ocharles: the load will spike to 15 for a few minutes while it loads all new data.
assuming that this is a new index.
ocharles
if we can't queue connections, we should take it out of rotation and let the other server handle it
ruaok
with a new index the caches are cold.
ocharles
and if that's the case, then we need a way to coordinate nginx, as ruaok outlined earlier
ruaok
queueing is very troublesome, when considered with why we do restarts.
ianmcorvidae
really, it's that long? I guess it just refuses connections after a short period of time
we don't do restarts?
ruaok
nope.
kill -9 is all we ever do.
ianmcorvidae
no, I mean, you're asserting we do that
we don't, as far as I can see
we only do wget to ?init=mmap
(and then to ?rate=true, but)
I don't see a kill or a restart anywhere -- maybe it's hidden somewhere I hadn't found, and *that's* the real problem
a kill -9 would be pretty consistent with the whole "the JSON data stops randomly in the middle" phenomenon
... bah, that isn't actually a git repository, and the search servers have a different receive script than cartman's version
maybe a script to generate all the appropriate source files if I want to get real fancy :)
ocharles
the more that can be automated, the better
warp: don't forget to set your tickets to in beta and set the fix version if necessary
nikkiphone2 joined the channel
djce joined the channel
ianmcorvidae idly wonders if I can also make it do something for MBS like disable one server and return its name in some format fabric can use, then when called again move to the next, etc.
* warp: MBS-6101, guard c.session.tport appropriately (see previous commit).
* warp: MBS-6101, make Redis connection lazy.
* warp: MBS-6101, verify that Redis->new() selects the specified database.
ocharles
ianmcorvidae: yea, moving to one command deploy is a definite goal
ianmcorvidae
I should still add all our servers to /etc/hosts or something too :/ I guess that's the bit I still need to do production deployments
(or maybe to .ssh/config. not quite sure how that works)
ocharles
production deployments need .ssh/config
at least to do fab production -Hastro
ianmcorvidae
ah, okay
cool
it's too bad SSH seems to lack an include mechanism for host files
or we could distribute one to people who have a VPN, to include in their own
ocharles
mm
ianmcorvidae
though I suppose fabric can probably take IP addresses as the host string (hopefully?) and then with my ostensible plan for a one-command deployment it can just return that instead of the name
reosarevok joined the channel
ocharles
yes, it can take anything ssh can take
within reason
warp
DBD::Pg::st execute failed: ERROR: null value in column "ha1" violates not-null constraint at lib/Sql.pm line 107, <FILE> line 1.
that's related to the bcrypt changes I assume?
ianmcorvidae
yeah
warp
hrm. Unauthorized
ijabz joined the channel
warp hopes that fixes beta.
outsidecontext joined the channel
MBJenkins
warp: Insert editor into editor table with updated bcrypt columns (in WS::2::Collection test).