#musicbrainz-devel

      • djce joined the channel
      • 2011-10-01 27449, 2011

      • ijabz joined the channel
      • 2011-10-01 27456, 2011

      • lotuswrench joined the channel
      • 2011-10-01 27419, 2011

      • Leftmost joined the channel
      • 2011-10-01 27449, 2011

      • ijabz joined the channel
      • 2011-10-01 27418, 2011

      • muesli joined the channel
      • 2011-10-01 27419, 2011

      • muesli joined the channel
      • 2011-10-01 27446, 2011

      • muesli joined the channel
      • 2011-10-01 27448, 2011

      • ruaok joined the channel
      • 2011-10-01 27405, 2011

      • ruaok joined the channel
      • 2011-10-01 27401, 2011

      • kepstin joined the channel
      • 2011-10-01 27428, 2011

      • ijabz joined the channel
      • 2011-10-01 27457, 2011

      • nikki
        ocharles: ping
      • 2011-10-01 27422, 2011

      • ianmcorvidae joined the channel
      • 2011-10-01 27446, 2011

      • ijabz joined the channel
      • 2011-10-01 27408, 2011

      • djce joined the channel
      • 2011-10-01 27425, 2011

      • lfranchi joined the channel
      • 2011-10-01 27418, 2011

      • ijabz joined the channel
      • 2011-10-01 27403, 2011

      • kepstin joined the channel
      • 2011-10-01 27440, 2011

      • ijabz joined the channel
      • 2011-10-01 27400, 2011

      • ocharles
        nikki, ruaok : pong
      • 2011-10-01 27418, 2011

      • ruaok
        ocharles: my thinking now is that there isn't anything inherently wrong.
      • 2011-10-01 27424, 2011

      • ocharles
        oh?
      • 2011-10-01 27427, 2011

      • ruaok
        w
      • 2011-10-01 27439, 2011

      • ruaok
        which limits the number of open files to some very low number.
      • 2011-10-01 27447, 2011

      • ruaok
        I'm looking up how to fix this.
      • 2011-10-01 27457, 2011

      • ruaok
        I think allowing the full 20k files will fix this.
      • 2011-10-01 27415, 2011

      • ruaok
        it seems that all the file handles get properly reused over time.
      • 2011-10-01 27437, 2011

      • ruaok
        so, we probably just need to have more files open.
      • 2011-10-01 27455, 2011

      • ocharles
        you don't think we're going to hit the problem of lwp not closing connections?
      • 2011-10-01 27442, 2011

      • ruaok
        lap is not the culprit.
      • 2011-10-01 27445, 2011

      • ruaok
        feh.
      • 2011-10-01 27446, 2011

      • ruaok
        LWP.
      • 2011-10-01 27452, 2011

      • ruaok
        read your work email. :)
      • 2011-10-01 27454, 2011

      • ocharles has just returned from the pub :)
      • 2011-10-01 27402, 2011

      • ruaok
        no worries.
      • 2011-10-01 27404, 2011

      • ocharles
        will check, though maybe not as lucid as yesterday :)
      • 2011-10-01 27415, 2011

      • ruaok
        I'm just saying that I explained it in email.
      • 2011-10-01 27427, 2011

      • ruaok
        hobbes never had more than 80 some file handles open
      • 2011-10-01 27433, 2011

      • ruaok
        so it's not a client-side problem
      • 2011-10-01 27439, 2011

      • ocharles
        "I couldn't find any references to LWP::UserAgent keeping connections open" how does this correspond to what we saw yesterday? didn't we see lwp keeping connections open?
      • 2011-10-01 27412, 2011

      • ocharles
        and the server doesn't use mechanize, it uses LWP::UserAgent
      • 2011-10-01 27450, 2011

      • ocharles
        so I can't see how that is relevant to the actual code running. but yes, I saw too that hobbes' handle count never grows
      • 2011-10-01 27409, 2011

      • ocharles
        sadly, if it's a search server issue, I'm useless there :(
      • 2011-10-01 27400, 2011

      • ruaok
        if hobbes never had more than 80 connections open, then it wasn't a client fault.
      • 2011-10-01 27427, 2011

      • ruaok
        and yes, the server doesn't use Mechanize, but I wanted to check to see if Mechanize acted differently than LWP
      • 2011-10-01 27436, 2011

      • ruaok
        it does, but only a little.
      • 2011-10-01 27453, 2011

      • ruaok
        I really think it's just a matter of giving tomcat more open files.
      • 2011-10-01 27456, 2011

      • ruaok
        ok, noms time.
      • 2011-10-01 27406, 2011

      • ocharles
      • 2011-10-01 27422, 2011

      • ocharles
        so that matches what I saw
      • 2011-10-01 27427, 2011

      • ocharles
        ruaok: however, www::mechanize does use lwp for its requests
      • 2011-10-01 27458, 2011

      • ocharles
        in fact, mechanize->get is just lwp::ua->get
      • 2011-10-01 27442, 2011

      • reosarevok joined the channel
      • 2011-10-01 27409, 2011

      • ruaok
        oh. I didn't know that.
      • 2011-10-01 27410, 2011

      • ruaok
        heh.
      • 2011-10-01 27451, 2011

      • ruaok
        well, if hobbes doesn't have the connections open, it *can't* be a client side issue.
      • 2011-10-01 27413, 2011

      • ruaok
        well, I have the ulimit command in place.
      • 2011-10-01 27428, 2011

      • ruaok
        I'll take dora out of rotation and then pummel dora to duplicate the problem.
      • 2011-10-01 27442, 2011

      • ruaok
        then turn on the ulimit and pummel it again.
      • 2011-10-01 27448, 2011

      • ruaok
        it shouldn't fail nearly as fast.
      • 2011-10-01 27403, 2011

      • ruaok
        right now it's working with a max of 1024 open fds.
      • 2011-10-01 27426, 2011

      • ruaok goes to make dora sweat
      • 2011-10-01 27428, 2011

      • djce hopes these hacks are documented, repeatable and maintainable
      • 2011-10-01 27418, 2011

      • ruaok
        they are copied from the master.
      • 2011-10-01 27419, 2011

      • ruaok
        you!
      • 2011-10-01 27436, 2011

      • ruaok
        but once they are proven to work, I'll document, for sure.
      • 2011-10-01 27449, 2011

      • ruaok
        but it's just a ulimit -n 20000 in the tomcat startup script.
      • 2011-10-01 27451, 2011
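
The idea of setting the limit in the startup script relies on resource limits being inherited: whatever limit is in force when the server process is launched is the one it runs with. A minimal Python sketch of that inheritance follows. It is an illustration only, not the actual tomcat startup script; `run_with_nofile_limit` and `REPORT` are hypothetical names, and the sketch changes only the soft limit, because raising the hard limit requires privileges.

```python
import resource
import subprocess
import sys


def run_with_nofile_limit(soft_limit, argv):
    """Run argv in a child process whose soft open-file limit is soft_limit."""
    def set_limit():
        # Runs in the child just before exec; the hard limit is left untouched.
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))

    return subprocess.run(argv, preexec_fn=set_limit, capture_output=True,
                          text=True, check=True)


# Hypothetical child command: print the soft limit the child actually sees.
REPORT = [sys.executable, "-c",
          "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"]

if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("this process: soft", soft, "hard", hard)
    child = run_with_nofile_limit(min(soft, 256), REPORT)
    print("child soft limit:", child.stdout.strip())
```

A soft limit requested above the hard limit is rejected, so a script asking for 20000 under a hard limit of 10000 would not get what it asked for.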

      • djce
        including /usr/local/java and mv'ing files owned by installed packages?
      • 2011-10-01 27403, 2011

      • djce
        that's the bit that worries me
      • 2011-10-01 27414, 2011

      • djce
        (the local / mv bit I mean)
      • 2011-10-01 27420, 2011

      • ruaok
        I will undo that since it didn't help.
      • 2011-10-01 27422, 2011

      • djce
        ok
      • 2011-10-01 27436, 2011

      • ruaok
        I'm just trying to fix this. :)
      • 2011-10-01 27446, 2011

      • djce
        I applied a limit of 10000 already and it didn't seem to help. Did I just need to aim higher?
      • 2011-10-01 27455, 2011

      • djce
        (in fact the limit is still 10000)
      • 2011-10-01 27431, 2011

      • ruaok
        where did you do that?
      • 2011-10-01 27438, 2011

      • djce
        dora's os-level limit is 10000 right now, as is roobarb's
      • 2011-10-01 27443, 2011

      • ruaok
        I know it's done in cartman.
      • 2011-10-01 27450, 2011

      • ruaok
        right, but the per proc limit is 1024
      • 2011-10-01 27407, 2011

      • ruaok
        open files (-n) 1024
      • 2011-10-01 27409, 2011

      • djce
        hold on lemme paste it in
      • 2011-10-01 27417, 2011

      • ruaok
        from ulimit -a
      • 2011-10-01 27453, 2011

      • djce
        0 S tomcat6 23015 1 99 80 0 - 303269 futex_ 17:31 ? 12:09:06 /usr/local/java/bin/java -Djava.util.logging.config.file=/var/lib/tom
      • 2011-10-01 27400, 2011

      • djce
        Max open files 10000 10000 files
      • 2011-10-01 27411, 2011

      • djce
        ulimit -n 10000
      • 2011-10-01 27415, 2011

      • djce
        same on dora.
      • 2011-10-01 27441, 2011

      • djce
        You're running with a limit no higher than 10000 (assuming tomcat only uses one process)
      • 2011-10-01 27455, 2011

      • djce
        and possibly lower, if the jvm applies a lower limit for some reason.
      • 2011-10-01 27454, 2011
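
The "Max open files" row pasted above has the layout Linux uses in `/proc/<pid>/limits`, which reports the limits of an already running process rather than those of the shell where `ulimit -a` was typed. A minimal Python sketch for reading that row is below; it assumes a Linux `/proc` filesystem, and the function names are hypothetical.

```python
from pathlib import Path

ROW_NAME = "Max open files"


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of a limits file."""
    for line in limits_text.splitlines():
        if line.startswith(ROW_NAME):
            soft, hard = line[len(ROW_NAME):].split()[:2]
            return soft, hard  # kept as strings, exactly as the kernel prints them
    raise ValueError("no 'Max open files' row found")


def max_open_files(pid):
    """Read the limits of a running process (Linux only)."""
    return parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())
```

For the tomcat process in the `ps` line above this would be called as `max_open_files(23015)`.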

      • ruaok
        ah.
      • 2011-10-01 27457, 2011

      • ruaok
        I didn't know that.
      • 2011-10-01 27458, 2011

      • djce
        so afaict, any idea that you're running with a 20000 limit is illusory.
      • 2011-10-01 27412, 2011

      • ruaok
        should be max 10k, i see now.
      • 2011-10-01 27451, 2011

      • ruaok
        ok, I undid both my changes on dora. ulimit removed, jvm dir restored.
      • 2011-10-01 27401, 2011

      • ruaok
        I'll do the same on roobarb
      • 2011-10-01 27439, 2011

      • ruaok
        right, back to square 1 then.
      • 2011-10-01 27451, 2011

      • ruaok
        time to check if the jvm has a lower limit
      • 2011-10-01 27442, 2011

      • djce
        We have a java guru on our team at work now. I could ask him on Monday if you like.
      • 2011-10-01 27452, 2011

      • ruaok
        please do.
      • 2011-10-01 27408, 2011

      • ruaok
        maybe you can shed some light on some of the things I don't understand well.
      • 2011-10-01 27415, 2011

      • nikki
        ocharles: I was wondering when unused urls are going to be removed
      • 2011-10-01 27430, 2011

      • ruaok
        lsof shows a much lower number of files than netstat does.
      • 2011-10-01 27432, 2011

      • ocharles
        nikki: oh, that's still in review, but hopefully next release
      • 2011-10-01 27434, 2011

      • djce
        Could you email me (at my work address) with some specific questions to ask him? I'll pass them on.
      • 2011-10-01 27442, 2011

      • ruaok
        a lot of netstat entries are waiting to close the socket.
      • 2011-10-01 27454, 2011

      • djce
        Or, I could ask him to drop in here for 10 minutes and answer questions.
      • 2011-10-01 27412, 2011

      • ruaok
        what is the correct way to find out how many open sockets a process currently has?
      • 2011-10-01 27413, 2011
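
One possible answer to the question above, on Linux, is to list the process's entries under `/proc/<pid>/fd`: each entry is a symlink, and those that point at a socket have a target beginning with `socket:`. The sketch below is a hedged illustration of that approach, not something taken from the discussion. `count_fds` is a hypothetical name, and reading another user's process this way normally requires root.

```python
import os


def count_fds(pid):
    """Return (total_fds, socket_fds) for a process, read from /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    total = sockets = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        total += 1
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets


if __name__ == "__main__":
    print(count_fds(os.getpid()))
```

The total is the number that counts against the per-process open files limit.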

      • ocharles
        ruaok: that status means they are waiting for the client to close the socket though, not the client waiting for the server to close it
      • 2011-10-01 27418, 2011

      • ruaok
        djce: I would love that!
      • 2011-10-01 27450, 2011

      • djce
        Who would be most likely to want to talk to him?
      • 2011-10-01 27403, 2011

      • ruaok
        us. :)
      • 2011-10-01 27405, 2011

      • djce
        I've got some sort of generic Qs, but if there are more specific needs,
      • 2011-10-01 27406, 2011

      • ruaok
        me certainly.
      • 2011-10-01 27415, 2011

      • djce
        we could do with formulating more targeted questions
      • 2011-10-01 27424, 2011

      • djce
        so as not to use up too much favour-time :-)
      • 2011-10-01 27435, 2011

      • ruaok can appreciate that
      • 2011-10-01 27435, 2011

      • djce
        I'll see what I can do.
      • 2011-10-01 27438, 2011

      • ruaok
        ok.
      • 2011-10-01 27443, 2011

      • ruaok
        I think we're all ready to talk about this.
      • 2011-10-01 27454, 2011

      • ruaok
        between the 3 of us we can make this problem understood, methinks.
      • 2011-10-01 27439, 2011

      • djce has set a reminder to ask him on Monday morning
      • 2011-10-01 27455, 2011

      • ruaok
        thanks!
      • 2011-10-01 27443, 2011

      • ruaok wonders if this could be a keep alive problem?
      • 2011-10-01 27450, 2011

      • djce
        you know dora/roobarb are still running the /usr/local jvm, right?
      • 2011-10-01 27455, 2011

      • ruaok
        yes.
      • 2011-10-01 27417, 2011

      • ruaok
        I only moved the jvm from /usr/lib so that I was *sure* that it was using the new jre
      • 2011-10-01 27433, 2011

      • djce
        not sure about keepalive. Might have to check next time we catch the problem in progress.
      • 2011-10-01 27433, 2011

      • ruaok
        I will likely switch back to openjdk.
      • 2011-10-01 27401, 2011

      • djce
        I suspect it's not keepalive, since the sockets are closing, not ESTABLISHED
      • 2011-10-01 27404, 2011

      • ruaok
        I've set the keep alive on dora from the default 100 to 10.
      • 2011-10-01 27423, 2011

      • djce
        what setting is that?
      • 2011-10-01 27451, 2011

      • ruaok
        maxKeepAliveRequests="10"
      • 2011-10-01 27402, 2011

      • ruaok
        in server.xml in the HTTP connector section.
      • 2011-10-01 27406, 2011

      • djce
        ok
      • 2011-10-01 27428, 2011

      • djce
        I don't think nginx (which is the client) uses keepalive in client mode.
      • 2011-10-01 27435, 2011

      • ruaok
        I suspect that you're right.
      • 2011-10-01 27439, 2011

      • djce
        might be better if it did in fact.
      • 2011-10-01 27448, 2011

      • djce
        'cos there'd be less opening/closing going on.
      • 2011-10-01 27453, 2011

      • ruaok
        the growth seems to not have changed.
      • 2011-10-01 27421, 2011

      • ruaok
        yeah, we've got a simple script that connects to dora directly and makes a shitload of requests.
      • 2011-10-01 27432, 2011

      • ruaok
        without going through nginx and the problem persists.
      • 2011-10-01 27451, 2011

      • djce
        shitload as in open/request/close, or open/req/req/req/req/req/.../close?
      • 2011-10-01 27435, 2011

      • ruaok
        the former
      • 2011-10-01 27441, 2011

      • ruaok
        simple LWP::UserAgent calls
      • 2011-10-01 27401, 2011

      • ruaok
        or even wgets from forked processes.
      • 2011-10-01 27442, 2011

      • ruaok
        3892 sockets in TIME_WAIT and 8 in ESTABLISHED.
      • 2011-10-01 27424, 2011
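
A tally like the one above can be produced by grouping the kernel's TCP table by state. The Python sketch below assumes Linux and the text format of `/proc/net/tcp`, where the fourth column holds the state as a hex code; `count_tcp_states` is a hypothetical helper, not the command that was actually run. One point that may explain the earlier gap between lsof and netstat: a socket in TIME_WAIT has already been closed by the application, so it no longer occupies a file descriptor in any process.

```python
from collections import Counter

# State codes used in the "st" column of /proc/net/tcp (hex).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the contents of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3:
            # Unknown codes are counted under their raw hex value.
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


if __name__ == "__main__":
    with open("/proc/net/tcp") as table:
        for state, n in count_tcp_states(table.read()).most_common():
            print(n, state)
```

IPv6 sockets are listed separately in `/proc/net/tcp6`, which uses the same state column.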

      • djce
        from what host?
      • 2011-10-01 27458, 2011

      • ruaok
        all from carl
      • 2011-10-01 27401, 2011

      • ruaok
        nginx
      • 2011-10-01 27423, 2011

      • ruaok