in #musicbrainz-devel

15:15 PM
mb-chat-logger

http://tickets.musicbrainz.org/browse/MBS-357
15:15 PM
ijabz

So what is the question you wanted to ask me ?
15:15 PM
ianmcorvidae

ijabz: so what I'm wondering is if we can change the way the search server behaves when it gets that
15:16 PM
specifically, it should finish up any pending requests at that time, and queue up any it gets from that time until it's been re-initialized with the new indexes
15:16 PM
because at present when it gets that it apparently cuts off whatever it's doing right there -- so invalid JSON is getting sent upstream to the website search, which then causes an ISE
15:17 PM
ruaok

eek. that is a royal pain to deal with.
15:17 PM
ianmcorvidae

(I don't know if this also affects the webservice -- quite possibly it does)
15:17 PM
ruaok

I wish that we could remote control nginx.
15:17 PM
nginx, please take dora out.
15:17 PM
restart dora,
15:17 PM
ijabz

hmm, but searchservlet is stateless, so when it receives that request it is unaware of any previous requests that have not been completed
15:17 PM
ianmcorvidae

I have a quasi-hack that just does a retry from MBS, but ollie (correctly) wants us to do it right, so :)
15:17 PM
ruaok

nginx, dora back in.
15:17 PM
that would be the best way to do this.
15:18 PM
ianmcorvidae

ruaok: I agree, but you're right that it's hard
15:18 PM
ruaok: the way we could do that is to split out the upstream definition bits and write a bit of a shuffle-symlinks-then-HUP script
15:18 PM
ijabz

yep, rotating the two servers one by one makes more sense
15:18 PM
warp

warp has changed the topic to: Ponyta week / Agenda: reviews, ws barcode submissions (reotab)
15:19 PM
ruaok

or...
15:19 PM
ocharles

warp: we released
15:19 PM
ruaok

we could send the search server a signal to stop accepting new connections.
15:19 PM
ocharles

inline with what was in jira
15:19 PM
ruaok

which will cause nginx to send any requests to the other server.
15:19 PM
ianmcorvidae

ruaok: tell it to stop, wait 10 seconds, then re-init, you mean?
15:19 PM
ruaok

then after a second to let connections die down, kill -9 the server for a restart.
15:20 PM
ianmcorvidae

(or whatever seconds)
15:20 PM
ruaok

yes.
15:20 PM
except due to memory issues, we just kill -9 it
15:20 PM
otherwise the memory does not get reclaimed.
15:20 PM
ocharles

ianmcorvidae: it's expecting to be merged to mbs-357
15:20 PM
mb-chat-logger

http://tickets.musicbrainz.org/browse/MBS-357
15:20 PM
ianmcorvidae

that still hits the statelessness problem potentially
15:20 PM
ocharles

so i'll do that, and then delete the branch :)
15:20 PM
ianmcorvidae

ocharles: k, cool :)
15:20 PM
warp

ocharles: great, can you approve https://bitbucket.org/metabrainz/musicbrainz-se... so I can merge that as well? :)
15:20 PM
ruaok

ianmcorvidae: how does it still hit that problem?
15:21 PM
ocharles

surely search server has something that is handling connections coming?
15:21 PM
mb-chat-logger

http://tickets.musicbrainz.org/browse/MBS-6395
15:21 PM
ianmcorvidae

ruaok: well, either way you're talking about some sort of global setting toggle
15:21 PM
ocharles

so can't the init=mmap handler take out a connection 'lock', wait for it to be granted, do its work, and then release the lock?
15:21 PM
acquiring a lock would require all open requests to finish
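ocharles's connection-lock idea is essentially a readers-writer lock: every request holds shared access, and the init=mmap handler asks for exclusive access, which is only granted once in-flight requests have finished; requests arriving in the meantime wait (ianmcorvidae's "queue" state). A minimal Python sketch under those assumptions follows. The real search server is a Java servlet running under Jetty/Tomcat, and `ReindexGate`, `serving`, `reinit` and `load_index` are hypothetical names, not part of its code.

```python
import threading
from contextlib import contextmanager


class ReindexGate:
    """Hypothetical gate: requests share the index, a re-init needs it exclusively."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active_requests = 0     # requests currently being served
        self._reinit_pending = False  # True while a re-init is waiting or running

    def begin_request(self):
        with self._cond:
            # New requests queue here while a re-init is pending, so it cannot be starved.
            while self._reinit_pending:
                self._cond.wait()
            self._active_requests += 1

    def end_request(self):
        with self._cond:
            self._active_requests -= 1
            if self._active_requests == 0:
                self._cond.notify_all()

    @contextmanager
    def serving(self):
        """Wrap one request's work so end_request runs even if it raises."""
        self.begin_request()
        try:
            yield
        finally:
            self.end_request()

    def reinit(self, load_index):
        with self._cond:
            while self._reinit_pending:  # only one re-init at a time
                self._cond.wait()
            self._reinit_pending = True
            # Wait until every in-flight request has finished writing its response.
            while self._active_requests > 0:
                self._cond.wait()
        try:
            load_index()  # no request holds the gate while the new index loads
        finally:
            with self._cond:
                self._reinit_pending = False
                self._cond.notify_all()  # release queued requests
```

As ruaok points out, a re-init on a new index can keep the load high for minutes, so requests queued behind such a gate would need generous upstream timeouts, or would be better sent to the other server.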
15:21 PM
ianmcorvidae

ruaok: in my 'suggestion' I'm saying you do a toggle to a "queue" state, in yours to a "refuse" state
15:21 PM
ruaok

queue state is more troublesome to me.
15:22 PM
ijabz

The connections are handled by the servlet container itself jetty/tomcat not the code
15:22 PM
ruaok

now you're behind on requests that you could have another server deal with.
15:22 PM
ocharles

there are in fact already third party binaries that do exactly this
15:22 PM
ianmcorvidae

ruaok: that wasn't the problem ijabz mentioned though :P
15:22 PM
ocharles

ruaok: sure, but upstream should have timeouts
15:22 PM
ruaok

ianmcorvidae: that's why I mentioned it. :)
15:23 PM
ianmcorvidae

okay, well, you did it by saying that ijabz's problem stopped existing :P
15:23 PM
ocharles

it's not the search server's responsibility to be terminating connections if it can ultimately serve them
15:23 PM
djce joined the channel
15:23 PM
ianmcorvidae

I think that we're getting into pointless weeds here and I should just bite the bullet and figure out making the loadbalancer do this how we want
15:23 PM
ruaok

this makes very little sense to me.
15:23 PM
ijabz

but in the code itself I could have it respond to a new command, so that all subsequent commands simply cause the code to return an HTTP error, if that's what you want
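ijabz's alternative, a command that switches the servlet into a "refuse" state, could look roughly like the sketch below: once refusing, every request gets a complete 503 response instead of a truncated JSON body. It is Python rather than the servlet's Java, and `RefusingSearchHandler`, `start_refusing` and `stop_refusing` are hypothetical names. On the nginx side, `proxy_next_upstream` can be configured with `http_503` so that such a response is retried on the other upstream server; whether the production config does this is not stated here.

```python
import json
import threading
from http import HTTPStatus


class RefusingSearchHandler:
    """Hypothetical dispatcher with a 'refuse' toggle to flip before a re-init."""

    def __init__(self, search):
        self._search = search              # callable(query) -> JSON response body
        self._refusing = threading.Event()  # thread-safe on/off flag

    def start_refusing(self):
        self._refusing.set()

    def stop_refusing(self):
        self._refusing.clear()

    def handle(self, query):
        """Return (status, body) for one search request."""
        if self._refusing.is_set():
            # A complete, well-formed error rather than JSON cut off mid-stream.
            body = json.dumps({"error": "search index is being reloaded"})
            return HTTPStatus.SERVICE_UNAVAILABLE, body
        return HTTPStatus.OK, self._search(query)
```

This only covers requests that arrive after the toggle; requests already in progress when init=mmap runs still need to be allowed to finish, which is the statelessness problem ijabz raised.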
15:24 PM
ocharles

how long does init=mmap take?
15:24 PM
ballpark figure
15:24 PM
ianmcorvidae

hm, I wonder what an actual error would do (vs. failing connections)
15:24 PM
ruaok

a while.
15:24 PM
ianmcorvidae

(at the frontend, I mean)
15:24 PM
ruaok

ocharles: the load will spike to 15 for a few minutes while it loads all new data.
15:24 PM
assuming that this is a new index.
15:24 PM
ocharles

if we can't queue connections, we should take it out of rotation and let the other server handle it
15:24 PM
ruaok

with a new index the caches are cold.
15:24 PM
ocharles

and if that's the case, then we need a way to coordinate nginx, as ruaok outlined earlier
15:24 PM
ruaok

queueing is very troublesome, when considered with why we do restarts.
15:24 PM
ianmcorvidae

really, it's that long? I guess it just refuses connections after a short period of time
15:25 PM
we don't do restarts?
15:25 PM
ruaok

nope.
15:25 PM
kill -9 is all we ever do.
15:25 PM
ianmcorvidae

no, I mean, you're asserting we do that
15:25 PM
we don't, as far as I can see
15:25 PM
we only do wget to ?init=mmap
15:25 PM
(and then to ?rate=true, but)
15:26 PM
I don't see a kill or a restart anywhere -- maybe it's hidden somewhere I hadn't found, and *that's* the real problem
15:27 PM
a kill -9 would be pretty consistent with the whole "the JSON data stops randomly in the middle" phenomenon
15:28 PM
... bah, that isn't actually a git repository, and the search servers have a different receive script than cartman's version
15:28 PM
ruaok

https://gist.github.com/mayhem/5749669
15:28 PM
yes.
15:28 PM
its quite messy.
15:28 PM
another thing to maintain, too few people to maintain them.
15:28 PM
ocharles

ruaok: btw, '-t' is '-d' and '-u'
15:29 PM
that sleep doesn't do anything either
15:29 PM
ruaok

-k would work fine
15:29 PM
so, who wants to own the search servers?
15:29 PM
ruaok feels burnt out
15:29 PM
ocharles

everyone should, it should be in fabfile.py, or some other bit of automated deployment
15:30 PM
ianmcorvidae

MBH-150
15:30 PM
mb-chat-logger

http://tickets.musicbrainz.org/browse/MBH-150
15:31 PM
ianmcorvidae

still doesn't really solve the actual problem, which I think the answer to is "bite the bullet and figure out how to automate the loadbalancers"
15:31 PM
ruaok unassigns himself
15:31 PM
so I'll do that
15:31 PM
and until such a time as we have a solution to MBH-150 I'll probably just add that to the makefiles
15:31 PM
mb-chat-logger

http://tickets.musicbrainz.org/browse/MBH-150
15:31 PM
ocharles

ianmcorvidae: I have a good idea on how to do that
15:32 PM
so ping me before you work on that
15:32 PM
ianmcorvidae

ocharles: automating the loadbalancer?
15:32 PM
ocharles

yes
15:32 PM
ianmcorvidae

ocharles: I was figuring split out the upstreams section, have a few symlinks to shuffle between
15:32 PM
ocharles

yep, that's pretty much it
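The split-out-upstreams plan might look like this: keep one pre-written upstream file per rotation state, point a single symlink that the main nginx config includes at the right one, and send the nginx master a HUP so it reloads. The directory layout, the `upstream-<variant>.conf` naming, the `search-upstream.conf` link name and `switch_upstream` are all assumptions for illustration; the variant names reuse dora from the discussion.

```python
import os
import signal
from pathlib import Path


def switch_upstream(conf_dir, variant, link_name="search-upstream.conf", nginx_pid_file=None):
    """Hypothetical helper: point link_name at upstream-<variant>.conf, then optionally HUP nginx."""
    conf_dir = Path(conf_dir)
    target = conf_dir / f"upstream-{variant}.conf"
    if not target.is_file():
        raise FileNotFoundError(target)

    link = conf_dir / link_name
    tmp = conf_dir / (link_name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()                 # clear a leftover from an interrupted run
    tmp.symlink_to(target.name)      # relative link inside conf_dir
    os.replace(tmp, link)            # atomic rename over the old link on POSIX

    if nginx_pid_file is not None:
        pid = int(Path(nginx_pid_file).read_text().strip())
        os.kill(pid, signal.SIGHUP)  # ask the nginx master to reload its config
    return target
```

Writing the new link under a temporary name and renaming it over the old one means nginx never sees a missing file, and on HUP the master process re-reads its configuration and replaces its workers gracefully (`nginx -s reload` does the same). Running `nginx -t` before the signal would catch a broken upstream file.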
15:32 PM
MBJenkins

Project musicbrainz-server_beta build #475: STILL FAILING in 5 min 44 sec: http://ci.musicbrainz.org/job/musicbrainz-serve...
15:32 PM
* warp: MBS-6395, add shell script to loop over update-medium-index.pl until all work is done.
15:32 PM
* warp: MBS-6395, discs where not all tracks have a length should not have an entry in medium_index.
15:32 PM
* warp: MBS-5958, also add an updateTrackNumbers call to resetTrackNumbers.
15:32 PM
* warp: MBS-5903, include tests.
15:32 PM
* warp: Revert "Revert "MBS-6416, keep track of track row ids when editing a medium.""
15:32 PM
ocharles

sounds like we're on the same page
15:32 PM
MBJenkins

* warp: MBS-6374, convert schema 16 style country/date pairs seeded to the release editor to schema 18 events.
15:32 PM
* warp: MBS-6261, do not render "date unknown" in date/country pairs with unknown dates.
15:32 PM
* warp: MBS-6261, change remaining "Release Events" columns to seperate date/country columns.
15:33 PM
ianmcorvidae

maybe a script to generate all the appropriate source files if I want to get real fancy :)
15:35 PM
ocharles

the more that can be automated, the better
15:36 PM
warp: don't forget to set your tickets to in beta and set the fix version if necessary
15:37 PM
nikkiphone2 joined the channel
15:37 PM
djce joined the channel
15:39 PM
ianmcorvidae idly wonders if I can also make it do something for MBS like disable one server and return its name in some format fabric can use, then when called again move to the next, etc.
15:39 PM
ianmcorvidae

so we can have one call to fab production
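The one-command deploy idea reduces to a rolling loop: take one server out of rotation, deploy to it, put it back, then move to the next, so the others keep serving. A minimal Python sketch with hypothetical callbacks (`take_out`, `deploy`, `put_back`); how these would hook into fabfile.py, for example by switching nginx upstream files as discussed earlier in the log, is an assumption, and "search2" in the usage is a made-up second server name.

```python
from typing import Callable, Iterable, List


def rolling_deploy(
    servers: Iterable[str],
    take_out: Callable[[str], None],
    deploy: Callable[[str], None],
    put_back: Callable[[str], None],
) -> List[str]:
    """Deploy to one server at a time so the rest keep serving traffic."""
    finished = []
    for server in servers:
        take_out(server)   # e.g. switch nginx to an upstream list without this server
        deploy(server)     # if this raises, the server stays out and the loop stops
        put_back(server)   # restore the full upstream list before touching the next one
        finished.append(server)
    return finished
```

If a deploy raises, the loop stops with that server still out of rotation, so at most one server is ever missing from the pool.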
15:39 PM
MBJenkins

Project musicbrainz-server_beta build #476: STILL FAILING in 6 min 23 sec: http://ci.musicbrainz.org/job/musicbrainz-serve...
15:39 PM
* warp: MBS-6101, guard c.session.tport appropriately (see previous commit).
15:39 PM
* warp: MBS-6101, make Redis connection lazy.
15:39 PM
* warp: MBS-6101, verify that Redis->new() selects the specified database.
15:40 PM
ocharles

ianmcorvidae: yea, moving to one command deploy is a definite goal
15:40 PM
ianmcorvidae

I should still add all our servers to /etc/hosts or something too :/ I guess that's the bit I still need to do production deployments
15:41 PM
(or maybe to .ssh/config. not quite sure how that works)
15:41 PM
ocharles

production deployments need .ssh/config
15:41 PM
at least to do fab production -Hastro
15:41 PM
ianmcorvidae

ah, okay
15:41 PM
cool
15:49 PM
it's too bad SSH seems to lack an include mechanism for host files
15:49 PM
or we could distribute one to people who have a VPN, to include in their own
15:50 PM
ocharles

mm
15:50 PM
ianmcorvidae

though I suppose fabric can probably take IP addresses as the host string (hopefully?) and then with my ostensible plan for a one-command deployment it can just return that instead of the name
15:51 PM
reosarevok joined the channel
15:56 PM
ocharles

yes, it can take anything ssh can take
15:56 PM
within reason
16:05 PM
warp

DBD::Pg::st execute failed: ERROR: null value in column "ha1" violates not-null constraint at lib/Sql.pm line 107, <FILE> line 1.
16:05 PM
that's related to the bcrypt changes I assume?
16:07 PM
ianmcorvidae

yeah
16:10 PM
warp

hrm. Unauthorized
16:12 PM
ijabz joined the channel
16:15 PM
warp hopes that fixes beta.
16:18 PM
outsidecontext joined the channel
16:25 PM
MBJenkins

warp: Insert editor into editor table with updated bcrypt columns (in WS::2::Collection test).
16:26 PM
warp

still failing ..
16:32 PM
outsidecontext_ joined the channel
16:44 PM
MBJenkins

Project musicbrainz-server_beta build #478: STILL FAILING in 9 min 53 sec: http://ci.musicbrainz.org/job/musicbrainz-serve...
16:44 PM
warp: MBS-6101, fix bad merge.
16:45 PM
djce joined the channel
16:52 PM
Sophist_uk joined the channel
17:04 PM
nikki_ joined the channel