#musicbrainz-devel

      • ijabz
        So what is the question you wanted to ask me?
      • ianmcorvidae
        ijabz: so what I'm wondering is if we can change the way the search server behaves when it gets that
      • specifically, it should finish up any pending requests at that time, and queue up any it gets from that time until it's been re-initialized with the new indexes
      • because at present when it gets that it apparently cuts off whatever it's doing right there -- so invalid JSON is getting sent upstream to the website search, which then causes an ISE
      • ruaok
        eek. that is a royal pain to deal with.
      • ianmcorvidae
        (I don't know if this also affects the webservice -- quite possibly it does)
      • ruaok
        I wish that we could remote control nginx.
      • nginx, please take dora out.
      • restart dora,
      • ijabz
        hmm, but searchservlet is stateless, so when it receives that request it is unaware of any previous requests that have not been completed
      • ianmcorvidae
        I have a quasi-hack that just does a retry from MBS, but ollie (correctly) wants us to do it right, so :)
      • ruaok
        nginx, dora back in.
      • that would be the best way to do this.
      • ianmcorvidae
        ruaok: I agree, but you're right that it's hard
      • ruaok: the way we could do that is to split out the upstream definition bits and write a bit of a shuffle-symlinks-then-HUP script
      • ijabz
        yep, rotating the two servers one by one makes more sense
      • warp
        warp has changed the topic to: Ponyta week / Agenda: reviews, ws barcode submissions (reotab)
      • ruaok
        or...
      • ocharles
        warp: we released
      • ruaok
        we could send the search server a signal to stop accepting new connections.
      • ocharles
        inline with what was in jira
      • ruaok
        which will cause nginx to send any requests to the other server.
      • ianmcorvidae
        ruaok: tell it to stop, wait 10 seconds, then re-init, you mean?
      • ruaok
        then after a second to let connections die down, kill -9 the server for a restart.
      • ianmcorvidae
        (or whatever seconds)
      • ruaok
        yes.
      • except due to memory issues, we just kill -9 it
      • otherwise the memory does not get reclaimed.
      • ocharles
        ianmcorvidae: it's expecting to be merged to mbs-357
      • ianmcorvidae
        that still hits the statelessness problem potentially
      • ocharles
        so i'll do that, and then delete the branch :)
      • ianmcorvidae
        ocharles: k, cool :)
      • warp
        ocharles: great, can you approve https://bitbucket.org/metabrainz/musicbrainz-se... so I can merge that as well? :)
      • ruaok
        ianmcorvidae: how does it still hit that problem?
      • ocharles
        surely the search server has something that is handling connections coming in?
      • ianmcorvidae
        ruaok: well, either way you're talking about some sort of global setting toggle
      • ocharles
        so can't the init=mmap handler take out a connection 'lock', wait for it to be granted, do its work, and then release the lock?
      • acquiring a lock would require all open requests to finish
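The lock idea ocharles describes here can be sketched roughly as follows. This is a hypothetical Python illustration only, not the search server's actual Java servlet code; the class name `ReinitGate` and its methods are invented for the example, and it assumes every request handler and the `init=mmap` handler can share one in-process object.

```python
import threading
from contextlib import contextmanager


class ReinitGate:
    """Hypothetical gate: re-init waits for open requests, new requests wait for re-init."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0              # requests currently being served
        self._reinitializing = False  # True while a re-init is pending or running

    @contextmanager
    def request(self):
        with self._cond:
            # New requests queue here while a re-init is pending or running.
            while self._reinitializing:
                self._cond.wait()
            self._active += 1
        try:
            yield
        finally:
            with self._cond:
                self._active -= 1
                self._cond.notify_all()

    @contextmanager
    def reinit(self):
        with self._cond:
            # Only one re-init at a time.
            while self._reinitializing:
                self._cond.wait()
            self._reinitializing = True
            # "Acquiring the lock" means waiting for all open requests to finish.
            while self._active:
                self._cond.wait()
        try:
            yield
        finally:
            with self._cond:
                self._reinitializing = False
                self._cond.notify_all()
```

A request handler would wrap its work in `with gate.request():` and the index reload in `with gate.reinit():`. Requests that arrive during the reload simply block, which is the "queue" state rather than the "refuse" state.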
      • ianmcorvidae
        ruaok: in my 'suggestion' I'm saying you do a toggle to a "queue" state, in yours to a "refuse" state
      • ruaok
        queue state is more troublesome to me.
      • ijabz
        The connections are handled by the servlet container itself (jetty/tomcat), not the code
      • ruaok
        now you're behind on requests that you could have another server deal with.
      • ocharles
        there are in fact already third party binaries that do exactly this
      • ianmcorvidae
        ruaok: that wasn't the problem ijabz mentioned though :P
      • ocharles
        ruaok: sure, but upstream should have timeouts
      • ruaok
        ianmcorvidae: that's why I mentioned it. :)
      • ianmcorvidae
        okay, well, you did it by saying that ijabz's problem stopped existing :P
      • ocharles
        it's not the search server's responsibility to be terminating connections if it can ultimately serve them
      • djce joined the channel
      • ianmcorvidae
        I think that we're getting into pointless weeds here and I should just bite the bullet and figure out making the loadbalancer do this how we want
      • ruaok
        this makes very little sense to me.
      • ijabz
        but in the code itself I could have it respond to a new command, so that all subsequent commands simply cause the code to return an HttpError, if that's what you want
      • ocharles
        how long does init=mmap take?
      • ballpark figure
      • ianmcorvidae
        hm, I wonder what an actual error would do (vs. failing connections)
      • ruaok
        a while.
      • ianmcorvidae
        (at the frontend, I mean)
      • ruaok
        ocharles: the load will spike to 15 for a few minutes while it loads all new data.
      • assuming that this is a new index.
      • ocharles
        if we can't queue connections, we should take it out of rotation and let the other server handle it
      • ruaok
        with a new index the caches are cold.
      • ocharles
        and if that's the case, then we need a way to coordinate nginx, as ruaok outlined earlier
      • ruaok
        queueing is very troublesome, when considered with why we do restarts.
      • ianmcorvidae
        really, it's that long? I guess it just refuses connections after a short period of time
      • we don't do restarts?
      • ruaok
        nope.
      • kill -9 is all we ever do.
      • ianmcorvidae
        no, I mean, you're asserting we do that
      • we don't, as far as I can see
      • we only do wget to ?init=mmap
      • (and then to ?rate=true, but)
      • I don't see a kill or a restart anywhere -- maybe it's hidden somewhere I hadn't found, and *that's* the real problem
      • a kill -9 would be pretty consistent with the whole "the JSON data stops randomly in the middle" phenomenon
      • ... bah, that isn't actually a git repository, and the search servers have a different receive script than cartman's version
      • ruaok
      • yes.
      • it's quite messy.
      • another thing to maintain, too few people to maintain them.
      • ocharles
        ruaok: btw, '-t' is '-d' and '-u'
      • that sleep doesn't do anything either
      • ruaok
        -k would work fine
      • so, who wants to own the search servers?
      • ruaok feels burnt out
      • ocharles
        everyone should, it should be in fabfile.py, or some other bit of automated deployment
      • ianmcorvidae
        MBH-150
      • ianmcorvidae
        still doesn't really solve the actual problem, which I think the answer to is "bite the bullet and figure out how to automate the loadbalancers"
      • ruaok unassigns himself
      • so I'll do that
      • and until such a time as we have a solution to MBH-150 I'll probably just add that to the makefiles
      • ocharles
        ianmcorvidae: I have a good idea on how to do that
      • so ping me before you work on that
      • ianmcorvidae
        ocharles: automating the loadbalancer?
      • ocharles
        yes
      • ianmcorvidae
        ocharles: I was figuring split out the upstreams section, have a few symlinks to shuffle between
      • ocharles
        yep, that's pretty much it
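The shuffle-symlinks-then-HUP approach agreed on above might look something like the sketch below. It assumes the nginx config includes one "active" file that is a symlink to one of several pre-written upstream definition files. The function names, the file layout, and the pid file path are all hypothetical and not taken from the actual MusicBrainz setup.

```python
import os
import signal


def switch_upstream(active_link, target):
    """Atomically repoint the symlink active_link at target (an upstream definition file)."""
    link_dir = os.path.dirname(os.path.abspath(active_link))
    # Create the new symlink under a temporary name in the same directory,
    # then rename it over the old one; rename is atomic on POSIX filesystems.
    tmp_link = os.path.join(link_dir, ".upstream-%d.tmp" % os.getpid())
    os.symlink(target, tmp_link)
    os.replace(tmp_link, active_link)


def reload_nginx(pid_file="/var/run/nginx.pid"):
    """Send SIGHUP so nginx re-reads its configuration (pid file path is an assumption)."""
    with open(pid_file) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGHUP)
```

Taking one search server out of rotation would then be a call to `switch_upstream` with the definition file that omits it, followed by `reload_nginx()`, and the reverse once the server has re-initialized.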
      • MBJenkins
        Project musicbrainz-server_beta build #475: STILL FAILING in 5 min 44 sec: http://ci.musicbrainz.org/job/musicbrainz-serve...
      • * warp: MBS-6395, add shell script to loop over update-medium-index.pl until all work is done.
      • * warp: MBS-6395, discs where not all tracks have a length should not have an entry in medium_index.
      • * warp: MBS-5958, also add an updateTrackNumbers call to resetTrackNumbers.
      • * warp: MBS-5903, include tests.
      • * warp: Revert "Revert "MBS-6416, keep track of track row ids when editing a medium.""
      • ocharles
        sounds like we're on the same page
      • MBJenkins
        * warp: MBS-6374, convert schema 16 style country/date pairs seeded to the release editor to schema 18 events.
      • * warp: MBS-6261, do not render "date unknown" in date/country pairs with unknown dates.
      • * warp: MBS-6261, change remaining "Release Events" columns to seperate date/country columns.
      • ianmcorvidae
        maybe a script to generate all the appropriate source files if I want to get real fancy :)
      • ocharles
        the more that can be automated, the better
      • warp: don't forget to set your tickets to in beta and set the fix version if necessary
      • nikkiphone2 joined the channel
      • djce joined the channel
      • ianmcorvidae idly wonders if I can also make it do something for MBS like disable one server and return its name in some format fabric can use it, then when called again move to the next, etc.
      • ianmcorvidae
        so we can have one call to fab production
      • MBJenkins
        Project musicbrainz-server_beta build #476: STILL FAILING in 6 min 23 sec: http://ci.musicbrainz.org/job/musicbrainz-serve...
      • * warp: MBS-6101, guard c.session.tport appropriately (see previous commit).
      • * warp: MBS-6101, make Redis connection lazy.
      • * warp: MBS-6101, verify that Redis->new() selects the specified database.
      • ocharles
        ianmcorvidae: yea, moving to one command deploy is a definite goal
      • ianmcorvidae
        I should still add all our servers to /etc/hosts or something too :/ I guess that's the bit I still need to do production deployments
      • (or maybe to .ssh/config. not quite sure how that works)
      • ocharles
        production deployments need .ssh/config
      • at least to do fab production -Hastro
      • ianmcorvidae
        ah, okay
      • cool
      • it's too bad SSH seems to lack an include mechanism for host files
      • or we could distribute one to people who have a VPN, to include in their own
      • ocharles
        mm
      • ianmcorvidae
        though I suppose fabric can probably take IP addresses as the host string (hopefully?) and then with my ostensible plan for a one-command deployment it can just return that instead of the name
      • reosarevok joined the channel
      • ocharles
        yes, it can take anything ssh can take
      • within reason
      • warp
        DBD::Pg::st execute failed: ERROR: null value in column "ha1" violates not-null constraint at lib/Sql.pm line 107, <FILE> line 1.
      • that's related to the bcrypt changes I assume?
      • ianmcorvidae
        yeah
      • warp
        hrm. Unauthorized
      • ijabz joined the channel
      • warp hopes that fixes beta.
      • outsidecontext joined the channel
      • MBJenkins
        warp: Insert editor into editor table with updated bcrypt columns (in WS::2::Collection test).
      • warp
        still failing ..
      • outsidecontext_ joined the channel
      • MBJenkins
        Project musicbrainz-server_beta build #478: STILL FAILING in 9 min 53 sec: http://ci.musicbrainz.org/job/musicbrainz-serve...
      • warp: MBS-6101, fix bad merge.
      • djce joined the channel
      • Sophist_uk joined the channel
      • nikki_ joined the channel