#musicbrainz-devel

15:15 PM
mb-chat-logger

http://tickets.musicbrainz.org/browse/MBS-357

2013-06-10 16138, 2013

15:15 PM
ijabz

So what is the question you wanted to ask me ?

2013-06-10 16140, 2013

15:15 PM
ianmcorvidae

ijabz: so what I'm wondering is if we can change the way the search server behaves when it gets that

2013-06-10 16116, 2013

15:16 PM
ianmcorvidae

specifically, it should finish up any pending requests at that time, and queue up any it gets from that time until it's been re-initialized with the new indexes

2013-06-10 16145, 2013

15:16 PM
ianmcorvidae

because at present when it gets that it apparently cuts off whatever it's doing right there -- so invalid JSON is getting sent upstream to the website search, which then causes an ISE

2013-06-10 16126, 2013

15:17 PM
ruaok

eek. that is a royal pain to deal with.

2013-06-10 16129, 2013

15:17 PM
ianmcorvidae

(I don't know if this also affects the webservice -- quite possibly it does)

2013-06-10 16134, 2013

15:17 PM
ruaok

I wish that we could remote control nginx.

2013-06-10 16147, 2013

15:17 PM
ruaok

nginx, please take dora out.

2013-06-10 16151, 2013

15:17 PM
ruaok

restart dora,

2013-06-10 16153, 2013

15:17 PM
ijabz

hmm, but searchservlet is stateless, so when it receives that request it is unaware of any previous requests that have not been completed

2013-06-10 16154, 2013

15:17 PM
ianmcorvidae

I have a quasi-hack that just does a retry from MBS, but ollie (correctly) wants us to do it right, so :)
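
[Editor's sketch] The retry workaround mentioned here could look roughly like the following. This is only a minimal illustration, not the actual MBS code: `fetch_json_with_retry` and the `fetch` callable are hypothetical names, and the attempt count and delay are arbitrary assumptions.

```python
import json
import time


def fetch_json_with_retry(fetch, attempts=3, delay=0.5):
    """Call fetch() until its body parses as JSON, up to `attempts` times.

    `fetch` is a hypothetical zero-argument callable that returns the raw
    response body as a string; it stands in for the real search request.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return json.loads(fetch())
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)  # give the search server a moment to recover
    raise last_error
```

A retry like this hides the truncated response from the user but does not stop the search server from cutting responses short, which is why it is described as a hack.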

2013-06-10 16155, 2013

15:17 PM
ruaok

nginx, dora back in.

2013-06-10 16159, 2013

15:17 PM
ruaok

that would be the best way to do this.

2013-06-10 16110, 2013

15:18 PM
ianmcorvidae

ruaok: I agree, but you're right that it's hard

2013-06-10 16138, 2013

15:18 PM
ianmcorvidae

ruaok: the way we could do that is to split out the upstream definition bits and write a bit of a shuffle-symlinks-then-HUP script
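
[Editor's sketch] One way the shuffle-symlinks-then-HUP idea might be written, assuming the upstream definitions live in separate files and nginx includes a symlink that points at one of them. The function names and the pid file path are assumptions for illustration; only the use of SIGHUP to make nginx reload its configuration is standard behaviour.

```python
import os
import signal


def switch_upstreams(link_path, target_path):
    """Atomically repoint the symlink at link_path to target_path.

    A temporary symlink is created next to the real one and renamed over
    it, so there is no moment at which the included file is missing.
    """
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(target_path, tmp_path)
    os.replace(tmp_path, link_path)  # rename replaces the link itself


def reload_nginx(pid_file="/var/run/nginx.pid"):
    """Send SIGHUP to the nginx master process (pid file path is assumed)."""
    with open(pid_file) as handle:
        pid = int(handle.read().strip())
    os.kill(pid, signal.SIGHUP)
```

A deployment script would call `switch_upstreams` with the file that omits the server being reloaded, call `reload_nginx`, reload that server, and then switch back.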

2013-06-10 16140, 2013

15:18 PM
ijabz

yep, rotating the two servers one by one makes more sense

2013-06-10 16153, 2013

15:18 PM
warp

warp has changed the topic to: Ponyta week / Agenda: reviews, ws barcode submissions (reotab)

2013-06-10 16107, 2013

15:19 PM
ruaok

or...

2013-06-10 16121, 2013

15:19 PM
ocharles

warp: we released

2013-06-10 16123, 2013

15:19 PM
ruaok

we could send the search server a signal to stop accepting new connections.

2013-06-10 16128, 2013

15:19 PM
ocharles

in line with what was in jira

2013-06-10 16141, 2013

15:19 PM
ruaok

which will cause nginx to send any requests to the other server.

2013-06-10 16155, 2013

15:19 PM
ianmcorvidae

ruaok: tell it to stop, wait 10 seconds, then re-init, you mean?

2013-06-10 16159, 2013

15:19 PM
ruaok

then after a second to let connections die down, kill -9 the server for a restart.

2013-06-10 16101, 2013

15:20 PM
ianmcorvidae

(or whatever seconds)

2013-06-10 16107, 2013

15:20 PM
ruaok

yes.

2013-06-10 16117, 2013

15:20 PM
ruaok

except due to memory issues, we just kill -9 it

2013-06-10 16125, 2013

15:20 PM
ruaok

otherwise the memory does not get reclaimed.

2013-06-10 16127, 2013

15:20 PM
ocharles

ianmcorvidae: it's expecting to be merged to mbs-357

2013-06-10 16127, 2013

15:20 PM
mb-chat-logger

http://tickets.musicbrainz.org/browse/MBS-357

2013-06-10 16128, 2013

15:20 PM
ianmcorvidae

that still hits the statelessness problem potentially

2013-06-10 16130, 2013

15:20 PM
ocharles

so i'll do that, and then delete the branch :)

2013-06-10 16135, 2013

15:20 PM
ianmcorvidae

ocharles: k, cool :)

2013-06-10 16148, 2013

15:20 PM
warp

ocharles: great, can you approve https://bitbucket.org/metabrainz/musicbrainz-serv… so I can merge that as well? :)

2013-06-10 16152, 2013

15:20 PM
ruaok

ianmcorvidae: how does it still hit that problem?

2013-06-10 16102, 2013

15:21 PM
ocharles

surely search server has something that is handling connections coming?

2013-06-10 16108, 2013

15:21 PM
mb-chat-logger

http://tickets.musicbrainz.org/browse/MBS-6395

2013-06-10 16111, 2013

15:21 PM
ianmcorvidae

ruaok: well, either way you're talking about some sort of global setting toggle

2013-06-10 16120, 2013

15:21 PM
ocharles

so can't the init=mmap handler take out a connection 'lock', wait for it to be granted, do its work, and then release the lock?

2013-06-10 16133, 2013

15:21 PM
ocharles

acquiring a lock would require all open requests to finish
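
[Editor's sketch] The connection-lock idea can be illustrated with a small shared/exclusive lock: requests hold it together, and the reload only gets it once every open request has finished. The real search server is a Java servlet, so this Python class (`DrainLock`, a hypothetical name) only shows the shape of the technique, not how it would be wired into the servlet container.

```python
import threading


class DrainLock:
    """Many requests may hold the lock at once; a reload holds it alone."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0         # requests currently being served
        self._reloading = False  # True while a reload is waiting or running

    def begin_request(self):
        with self._cond:
            # New requests queue here while a reload is pending.
            while self._reloading:
                self._cond.wait()
            self._active += 1

    def end_request(self):
        with self._cond:
            self._active -= 1
            self._cond.notify_all()

    def begin_reload(self):
        with self._cond:
            while self._reloading:
                self._cond.wait()
            self._reloading = True
            # Wait for every in-flight request to finish.
            while self._active:
                self._cond.wait()

    def end_reload(self):
        with self._cond:
            self._reloading = False
            self._cond.notify_all()
```

Requests that arrive during the reload wait in `begin_request`, which is the "queue" behaviour; how long they can wait is bounded only by whatever timeouts the upstream caller applies.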

2013-06-10 16133, 2013

15:21 PM
ianmcorvidae

ruaok: in my 'suggestion' I'm saying you do a toggle to a "queue" state, in yours to a "refuse" state

2013-06-10 16156, 2013

15:21 PM
ruaok

queue state is more troublesome to me.

2013-06-10 16101, 2013

15:22 PM
ijabz

The connections are handled by the servlet container itself jetty/tomcat not the code

2013-06-10 16106, 2013

15:22 PM
ruaok

now you're behind on requests that you could have another server deal with.

2013-06-10 16111, 2013

15:22 PM
ocharles

there are in fact already third party binaries that do exactly this

2013-06-10 16120, 2013

15:22 PM
ianmcorvidae

ruaok: that wasn't the problem ijabz mentioned though :P

2013-06-10 16135, 2013

15:22 PM
ocharles

ruaok: sure, but upstream should have timeouts

2013-06-10 16138, 2013

15:22 PM
ruaok

ianmcorvidae: that's why I mentioned it. :)

2013-06-10 16100, 2013

15:23 PM
ianmcorvidae

okay, well, you did it by saying that ijabz's problem stopped existing :P

2013-06-10 16101, 2013

15:23 PM
ocharles

it's not the search server's responsibility to terminate connections if it can ultimately serve them

2013-06-10 16109, 2013

15:23 PM
djce joined the channel

2013-06-10 16135, 2013

15:23 PM
ianmcorvidae

I think that we're getting into pointless weeds here and I should just bite the bullet and figure out making the loadbalancer do this how we want

2013-06-10 16136, 2013

15:23 PM
ruaok

this makes very little sense to me.

2013-06-10 16138, 2013

15:23 PM
ijabz

but in the code itself I could have it respond to a new command, so that all subsequent commands simply cause the code to return an HttpError, if that's what you want
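
[Editor's sketch] The "refuse" variant described here amounts to a flag that a control command flips, after which every search returns an error instead of running. The sketch below is illustrative only: `SearchGate` is a hypothetical name, and the choice of status 503 is an assumption, since the log does not say which error would be returned.

```python
import threading


class SearchGate:
    """Tracks whether the server is currently accepting search requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._accepting = True

    def set_accepting(self, accepting):
        """Flip the flag; a control command would call this."""
        with self._lock:
            self._accepting = accepting

    def handle(self, run_search):
        """Return (status, body): an error while refusing, else the result."""
        with self._lock:
            accepting = self._accepting
        if not accepting:
            return 503, "search index is being reloaded"  # assumed status
        return 200, run_search()
```

Unlike queueing, this answers immediately, so the caller can decide right away whether to try the other server.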

2013-06-10 16103, 2013

15:24 PM
ocharles

how long does init=mmap take?

2013-06-10 16107, 2013

15:24 PM
ocharles

ballpark figure

2013-06-10 16109, 2013

15:24 PM
ianmcorvidae

hm, I wonder what an actual error would do (vs. failing connections)

2013-06-10 16111, 2013

15:24 PM
ruaok

a while.

2013-06-10 16123, 2013

15:24 PM
ianmcorvidae

(at the frontend, I mean)

2013-06-10 16124, 2013

15:24 PM
ruaok

ocharles: the load will spike to 15 for a few minutes while it loads all new data.

2013-06-10 16129, 2013

15:24 PM
ruaok

assuming that this is a new index.

2013-06-10 16135, 2013

15:24 PM
ocharles

if we can't queue connections, we should take it out of rotation and let the other server handle it

2013-06-10 16136, 2013

15:24 PM
ruaok

with a new index the caches are cold.

2013-06-10 16152, 2013

15:24 PM
ocharles

and if that's the case, then we need a way to coordinate nginx, as ruaok outlined earlier

2013-06-10 16155, 2013

15:24 PM
ruaok

queueing is very troublesome, when considered with why we do restarts.

2013-06-10 16156, 2013

15:24 PM
ianmcorvidae

really, it's that long? I guess it just refuses connections after a short period of time

2013-06-10 16104, 2013

15:25 PM
ianmcorvidae

we don't do restarts?

2013-06-10 16108, 2013

15:25 PM
ruaok

nope.

2013-06-10 16113, 2013

15:25 PM
ruaok

kill -9 is all we ever do.

2013-06-10 16117, 2013

15:25 PM
ianmcorvidae

no, I mean, you're asserting we do that

2013-06-10 16121, 2013

15:25 PM
ianmcorvidae

we don't, as far as I can see

2013-06-10 16128, 2013

15:25 PM
ianmcorvidae

we only do wget to ?init=mmap

2013-06-10 16141, 2013

15:25 PM
ianmcorvidae

(and then to ?rate=true, but)

2013-06-10 16155, 2013

15:26 PM
ianmcorvidae

I don't see a kill or a restart anywhere -- maybe it's hidden somewhere I hadn't found, and *that's* the real problem

2013-06-10 16123, 2013

15:27 PM
ianmcorvidae

a kill -9 would be pretty consistent with the whole "the JSON data stops randomly in the middle" phenomenon

2013-06-10 16114, 2013

15:28 PM
ianmcorvidae

... bah, that isn't actually a git repository, and the search servers have a different receive script than cartman's version

2013-06-10 16123, 2013

15:28 PM
ruaok

https://gist.github.com/mayhem/5749669

2013-06-10 16128, 2013

15:28 PM
ruaok

yes.

2013-06-10 16137, 2013

15:28 PM
ruaok

it's quite messy.

2013-06-10 16153, 2013

15:28 PM
ruaok

another thing to maintain, too few people to maintain them.

2013-06-10 16156, 2013

15:28 PM
ocharles

ruaok: btw, '-t' is '-d' and '-u'

2013-06-10 16107, 2013

15:29 PM
ocharles

that sleep doesn't do anything either

2013-06-10 16111, 2013

15:29 PM
ruaok

-k would work fine

2013-06-10 16128, 2013

15:29 PM
ruaok

so, who wants to own the search servers?

2013-06-10 16143, 2013

15:29 PM
ruaok feels burnt out

2013-06-10 16152, 2013

15:29 PM
ocharles

everyone should, it should be in fabfile.py, or some other bit of automated deployment

2013-06-10 16141, 2013

15:30 PM
ianmcorvidae

MBH-150

2013-06-10 16142, 2013

15:30 PM
mb-chat-logger

http://tickets.musicbrainz.org/browse/MBH-150

2013-06-10 16117, 2013

15:31 PM
ianmcorvidae

still doesn't really solve the actual problem, which I think the answer to is "bite the bullet and figure out how to automate the loadbalancers"

2013-06-10 16122, 2013

15:31 PM
ruaok unassigns himself

2013-06-10 16124, 2013

15:31 PM
ianmcorvidae

so I'll do that

2013-06-10 16152, 2013

15:31 PM
ianmcorvidae

and until such a time as we have a solution to MBH-150 I'll probably just add that to the makefiles

2013-06-10 16152, 2013

15:31 PM
mb-chat-logger

http://tickets.musicbrainz.org/browse/MBH-150

2013-06-10 16159, 2013

15:31 PM
ocharles

ianmcorvidae: I have a good idea on how to do that

2013-06-10 16103, 2013

15:32 PM
ocharles

so ping me before you work on that

2013-06-10 16110, 2013

15:32 PM
ianmcorvidae

ocharles: automating the loadbalancer?

2013-06-10 16112, 2013

15:32 PM
ocharles

yes

2013-06-10 16123, 2013

15:32 PM
ianmcorvidae

ocharles: I was figuring split out the upstreams section, have a few symlinks to shuffle between

2013-06-10 16132, 2013

15:32 PM
ocharles

yep, that's pretty much it

2013-06-10 16135, 2013

15:32 PM
MBJenkins

Project musicbrainz-server_beta build #475: STILL FAILING in 5 min 44 sec: http://ci.musicbrainz.org/job/musicbrainz-server_…

2013-06-10 16136, 2013

15:32 PM
MBJenkins

* warp: MBS-6395, add shell script to loop over update-medium-index.pl until all work is done.

2013-06-10 16137, 2013

15:32 PM
MBJenkins

* warp: MBS-6395, discs where not all tracks have a length should not have an entry in medium_index.

2013-06-10 16138, 2013

15:32 PM
MBJenkins

* warp: MBS-5958, also add an updateTrackNumbers call to resetTrackNumbers.

2013-06-10 16139, 2013

15:32 PM
MBJenkins

* warp: MBS-5903, include tests.

2013-06-10 16140, 2013

15:32 PM
MBJenkins

* warp: Revert "Revert "MBS-6416, keep track of track row ids when editing a medium.""

2013-06-10 16141, 2013

15:32 PM
ocharles

sounds like we're on the same page

2013-06-10 16141, 2013

15:32 PM
MBJenkins

* warp: MBS-6374, convert schema 16 style country/date pairs seeded to the release editor to schema 18 events.

2013-06-10 16142, 2013

15:32 PM
MBJenkins

* warp: MBS-6261, do not render "date unknown" in date/country pairs with unknown dates.

2013-06-10 16143, 2013

15:32 PM
MBJenkins

* warp: MBS-6261, change remaining "Release Events" columns to seperate date/country columns.

2013-06-10 16109, 2013

15:33 PM
ianmcorvidae

maybe a script to generate all the appropriate source files if I want to get real fancy :)

2013-06-10 16110, 2013

15:35 PM
ocharles

the more that can be automated, the better

2013-06-10 16154, 2013

15:36 PM
ocharles

warp: don't forget to set your tickets to in beta and set the fix version if necessary

2013-06-10 16126, 2013

15:37 PM
nikkiphone2 joined the channel

2013-06-10 16150, 2013

15:37 PM
djce joined the channel

2013-06-10 16100, 2013

15:39 PM
ianmcorvidae idly wonders if I can also make it do something for MBS like disable one server and return its name in some format fabric can use, then when called again move to the next, etc.

2013-06-10 16112, 2013

15:39 PM
ianmcorvidae

so we can have one call to fab production

2013-06-10 16114, 2013

15:39 PM
MBJenkins

Project musicbrainz-server_beta build #476: STILL FAILING in 6 min 23 sec: http://ci.musicbrainz.org/job/musicbrainz-server_…

2013-06-10 16115, 2013

15:39 PM
MBJenkins

* warp: MBS-6101, guard c.session.tport appropriately (see previous commit).

2013-06-10 16116, 2013

15:39 PM
MBJenkins

* warp: MBS-6101, make Redis connection lazy.

2013-06-10 16117, 2013

15:39 PM
MBJenkins

* warp: MBS-6101, verify that Redis->new() selects the specified database.

2013-06-10 16118, 2013

15:40 PM
ocharles

ianmcorvidae: yea, moving to one command deploy is a definite goal

2013-06-10 16156, 2013

15:40 PM
ianmcorvidae

I should still add all our servers to /etc/hosts or something too :/ I guess that's the bit I still need to do production deployments

2013-06-10 16107, 2013

15:41 PM
ianmcorvidae

(or maybe to .ssh/config. not quite sure how that works)

2013-06-10 16125, 2013

15:41 PM
ocharles

production deployments need .ssh/config

2013-06-10 16132, 2013

15:41 PM
ocharles

at least to do fab production -Hastro

2013-06-10 16142, 2013

15:41 PM
ianmcorvidae

ah, okay

2013-06-10 16143, 2013

15:41 PM
ianmcorvidae

cool

2013-06-10 16101, 2013

15:49 PM
ianmcorvidae

it's too bad SSH seems to lack an include mechanism for host files

2013-06-10 16120, 2013

15:49 PM
ianmcorvidae

or we could distribute one to people who have a VPN, to include in their own

2013-06-10 16114, 2013

15:50 PM
ocharles

mm

2013-06-10 16152, 2013

15:50 PM
ianmcorvidae

though I suppose fabric can probably take IP addresses as the host string (hopefully?) and then with my ostensible plan for a one-command deployment it can just return that instead of the name

2013-06-10 16150, 2013

15:51 PM
reosarevok joined the channel

2013-06-10 16147, 2013

15:56 PM
ocharles

yes, it can take anything ssh can take

2013-06-10 16154, 2013

15:56 PM
ocharles

within reason

2013-06-10 16142, 2013

16:05 PM
warp

DBD::Pg::st execute failed: ERROR: null value in column "ha1" violates not-null constraint at lib/Sql.pm line 107, <FILE> line 1.

2013-06-10 16150, 2013

16:05 PM
warp

that's related to the bcrypt changes I assume?

2013-06-10 16157, 2013

16:07 PM
ianmcorvidae

yeah

2013-06-10 16142, 2013

16:10 PM
warp

hrm. Unauthorized

2013-06-10 16138, 2013

16:12 PM
ijabz joined the channel

2013-06-10 16114, 2013

16:15 PM
warp hopes that fixes beta.

2013-06-10 16114, 2013

16:18 PM
outsidecontext joined the channel

2013-06-10 16133, 2013

16:25 PM
MBJenkins

warp: Insert editor into editor table with updated bcrypt columns (in WS::2::Collection test).

2013-06-10 16108, 2013

16:26 PM
warp

still failing ..

2013-06-10 16124, 2013

16:32 PM
outsidecontext_ joined the channel

2013-06-10 16156, 2013

16:44 PM
MBJenkins

Project musicbrainz-server_beta build #478: STILL FAILING in 9 min 53 sec: http://ci.musicbrainz.org/job/musicbrainz-server_…

2013-06-10 16157, 2013

16:44 PM
MBJenkins

warp: MBS-6101, fix bad merge.

2013-06-10 16120, 2013

16:45 PM
djce joined the channel

2013-06-10 16120, 2013

16:52 PM
Sophist_uk joined the channel

2013-06-10 16136, 2013

17:04 PM
nikki_ joined the channel