6:40 AM
djce joined the channel
2011-10-01 27449, 2011
6:51 AM
ijabz joined the channel
7:28 AM
lotuswrench joined the channel
7:30 AM
Leftmost joined the channel
7:32 AM
ijabz joined the channel
13:15 PM
muesli joined the channel
13:19 PM
muesli joined the channel
13:20 PM
muesli joined the channel
14:05 PM
ruaok joined the channel
14:06 PM
ruaok joined the channel
14:11 PM
kepstin joined the channel
14:28 PM
ijabz joined the channel
15:57 PM
nikki
ocharles: ping
16:19 PM
ianmcorvidae joined the channel
18:58 PM
ijabz joined the channel
19:37 PM
djce joined the channel
19:37 PM
lfranchi joined the channel
19:39 PM
ijabz joined the channel
19:42 PM
kepstin joined the channel
19:52 PM
ijabz joined the channel
19:59 PM
ocharles
nikki, ruaok : pong
19:59 PM
ruaok
ocharles: my thinking now is that there isn't anything inherently wrong.
19:59 PM
ocharles
oh?
19:59 PM
ruaok
w
19:59 PM
ruaok
which limits the number of open files to some very low number.
19:59 PM
ruaok
I'm looking up how to fix this.
19:59 PM
ruaok
I think allowing the full 20k files will fix this.
20:00 PM
ruaok
it seems that all the file handles get properly reused over time.
20:00 PM
ruaok
so, we probably just need to have more files open.
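The per-process open-file limit ruaok is talking about can be inspected and raised from inside a process. A minimal sketch, assuming Linux and Python's standard `resource` module; the helper name `raise_nofile_limit` is hypothetical. An unprivileged process may only raise its soft limit up to its hard limit, so the sketch clamps the request instead of assuming 20k will be granted.

```python
import resource


def raise_nofile_limit(target):
    """Raise the soft RLIMIT_NOFILE toward `target`, never above the hard limit.

    Returns the (soft, hard) pair actually in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process may only raise its soft limit up to the hard limit.
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)
    # Only ever raise; leave an unlimited or already-higher soft limit alone.
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Resource limits are inherited by child processes, which is why for Tomcat the limit has to be in place before the JVM is started (for example in its startup script) rather than changed afterwards.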
20:00 PM
ocharles
you don't think we're going to hit the problem of lwp not closing connections?
20:02 PM
ruaok
lap is not the culprit.
20:02 PM
ruaok
feh.
20:02 PM
ruaok
LWP.
20:02 PM
ruaok
read your work email. :)
20:04 PM
ocharles has just returned from the pub :)
20:05 PM
ruaok
no worries.
20:05 PM
ocharles
will check, though maybe not as lucid as yesterday :)
20:05 PM
ruaok
I'm just saying that I explained it in email.
20:05 PM
ruaok
hobbes never had more than 80 some file handles open
20:05 PM
ruaok
so it's not a client-side problem
20:06 PM
ocharles
"I couldn't find any references to LWP::user agent keeping connections open" how does this correspond to what we saw yesterday? didn't we see lwp keeping connections open?
20:07 PM
ocharles
and the server doesn't use mechanize, it uses LWP::UserAgent
20:07 PM
ocharles
so i can't see how that is relevant to the actual code running. but yes, I saw too that hobbes' handle count never grows
20:08 PM
ocharles
sadly, if it's a search server issue, I'm useless there :(
20:09 PM
ruaok
if hobbes never had more than 80 connections open, then it wasn't a client fault.
20:09 PM
ruaok
and yes, the server doesn't use Mechanize, but I wanted to check whether Mechanize acted differently than LWP
20:09 PM
ruaok
it does, but only a little.
20:09 PM
ruaok
I really think the fix is just giving tomcat more open files.
20:09 PM
ruaok
ok, noms time.
20:10 PM
ocharles
20:10 PM
ocharles
so that matches what I saw
20:13 PM
ocharles
ruaok: however, www::mechanize does use lwp for its requests
20:13 PM
ocharles
in fact, mechanize->get is just lwp::ua->get
20:18 PM
reosarevok joined the channel
20:23 PM
ruaok
oh. I didn't know that.
20:23 PM
ruaok
heh.
20:23 PM
ruaok
well, if hobbes doesn't have the connections open, it *can't* be a client side issue.
20:24 PM
ruaok
well, I have the ulimit command in place.
20:24 PM
ruaok
I'll take dora out of rotation and then pummel dora to duplicate the problem.
20:24 PM
ruaok
then turn on the ulimit and pummel it again.
20:24 PM
ruaok
it shouldn't fail nearly as fast.
20:25 PM
ruaok
right now it's working with a max of 1024 open fds.
20:29 PM
ruaok goes to make dora sweat
20:35 PM
djce hopes these hacks are documented, repeatable and maintainable
20:38 PM
ruaok
they are copied from the master.
20:38 PM
ruaok
you!
20:38 PM
ruaok
but once they are proven to work, I'll document, for sure.
20:38 PM
ruaok
but it's just a ulimit -n 20000 in the tomcat startup script.
20:38 PM
djce
including /usr/local/java and mv'ing files owned by installed packages?
20:39 PM
djce
that's the bit that worries me
20:39 PM
djce
(the local / mv bit I mean)
20:39 PM
ruaok
I will undo that since it didn't help.
20:39 PM
djce
ok
20:39 PM
ruaok
I'm just trying to fix this. :)
20:39 PM
djce
I applied a limit of 10000 already and it didn't seem to help. Did I just need to aim higher?
20:39 PM
djce
(in fact the limit is still 10000)
20:40 PM
ruaok
where did you do that?
20:40 PM
djce
dora's os-level limit is 10000 right now, as is roobarb's
20:40 PM
ruaok
I know it's done in cartman.
20:40 PM
ruaok
right, but the per proc limit is 1024
20:41 PM
ruaok
open files (-n) 1024
20:41 PM
djce
hold on, lemme paste it in
20:41 PM
ruaok
from ulimit -a
20:41 PM
djce
0 S tomcat6 23015 1 99 80 0 - 303269 futex_ 17:31 ? 12:09:06 /usr/local/java/bin/java -Djava.util.logging.config.file=/var/lib/tom
20:42 PM
djce
Max open files 10000 10000 files
20:42 PM
djce
ulimit -n 10000
20:42 PM
djce
same on dora.
20:42 PM
djce
You're running with a limit no higher than 10000 (assuming tomcat only uses one process)
20:42 PM
djce
and possibly lower, if the jvm applies a lower limit for some reason.
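The "Max open files" line djce pasted is the format of Linux's `/proc/<pid>/limits`, which shows the limits the kernel is actually enforcing on the running JVM, whatever the startup script asked for. A small sketch for pulling the soft and hard values out of that file, assuming the Linux /proc layout; both function names are hypothetical.

```python
import re


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits; None means unlimited."""
    for line in limits_text.splitlines():
        match = re.match(r"Max open files\s+(\S+)\s+(\S+)", line)
        if match:
            return tuple(None if v == "unlimited" else int(v) for v in match.groups())
    raise ValueError("no 'Max open files' line found")


def max_open_files(pid):
    """Read the open-file limits the kernel is enforcing for a running process."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())
```

For the tomcat6 process in the ps output above, `max_open_files(23015)` would return the same pair djce pasted.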
20:43 PM
ruaok
ah.
20:43 PM
ruaok
I didn't know that.
20:43 PM
djce
so afaict, any idea that you're running with a 20000 limit is illusory.
20:44 PM
ruaok
should be max 10k, i see now.
20:46 PM
ruaok
ok, I undid both my changes on dora. ulimit removed, jvm dir restored.
20:47 PM
ruaok
I'll do the same on roobarb
20:47 PM
ruaok
right, back to square 1 then.
20:47 PM
ruaok
time to check if the jvm has a lower limit
20:50 PM
djce
We have a java guru on our team at work now. I could ask him on Monday if you like.
20:50 PM
ruaok
please do.
20:51 PM
ruaok
maybe you can shed some light on some of the things I don't understand well.
20:51 PM
nikki
ocharles: I was wondering when unused urls are going to be removed
20:51 PM
ruaok
lsof shows a much lower number of files than netstat does.
20:51 PM
ocharles
nikki: oh, that's still in review, but hopefully in the next release
20:51 PM
djce
Could you email me (at my work address) with some specific questions to ask him? I'll pass them on.
20:51 PM
ruaok
a lot of netstat entries are waiting to close the socket.
20:51 PM
djce
Or, I could ask him to drop in here for 10 minutes and answer questions.
20:52 PM
ruaok
what is the correct way to find out how many open sockets a file currently has.
20:52 PM
ocharles
ruaok: that status means they are waiting for the client to close the socket though, not the client waiting for the server to close it
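On the lsof vs. netstat gap: lsof walks each process's open file descriptors, while netstat lists every socket the kernel still tracks, including TIME_WAIT sockets that no longer belong to any process, so the netstat number can be far higher. A minimal sketch for counting the sockets one process actually holds, assuming Linux, where each entry in `/proc/<pid>/fd` is a symlink and socket descriptors point at `socket:[inode]`; the function names are hypothetical.

```python
import os


def count_socket_targets(targets):
    """Count fd link targets that refer to sockets, e.g. 'socket:[12345]'."""
    return sum(1 for t in targets if t.startswith("socket:"))


def socket_fd_count(pid):
    """Number of socket descriptors currently held open by process `pid`."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return count_socket_targets(targets)
```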
20:52 PM
ruaok
djce: I would love that!
20:52 PM
djce
Who would be most likely to want to talk to him?
20:53 PM
ruaok
us. :)
20:53 PM
djce
I've got some sort of generic Qs, but if there are more specific needs,
20:53 PM
ruaok
me certainly.
20:53 PM
djce
we could do with formulating more targeted questions
20:53 PM
djce
so as not to use up too much favour-time :-)
20:53 PM
ruaok can appreciate that
20:53 PM
djce
I'll see what I can do.
20:53 PM
ruaok
ok.
20:53 PM
ruaok
I think we're all ready to talk about this.
20:53 PM
ruaok
between the 3 of us we can make this problem understood, methinks.
20:55 PM
djce has set a reminder to ask him on Monday morning
20:56 PM
ruaok
thanks!
20:57 PM
ruaok wonders if this could be a keep alive problem?
21:01 PM
djce
you know dora/roobarb are still running the /usr/local jvm, right?
21:02 PM
ruaok
yes.
21:03 PM
ruaok
I only moved the jvm from /usr/lib so that I was *sure* that it was using the new jre
21:03 PM
djce
not sure about keepalive. Might have to check next time we catch the problem in progress.
21:03 PM
ruaok
I will likely switch back to openjdk.
21:04 PM
djce
I suspect it's not keepalive, since the sockets are closing, not ESTABLISHED
21:04 PM
ruaok
I've set the keep alive on dora from the default 100 to 10.
21:04 PM
djce
what setting is that?
21:04 PM
ruaok
maxKeepAliveRequests="10"
21:05 PM
ruaok
in server.xml in the HTTP connector section.
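maxKeepAliveRequests is an attribute of Tomcat's HTTP `<Connector>` element in server.xml, and Tomcat documents its default as 100, which matches the value ruaok lowered. A small sketch for checking what each HTTP connector is really configured with, using only `xml.etree.ElementTree`; the helper name and the sample server.xml are hypothetical, and connectors whose protocol mentions AJP are skipped.

```python
import xml.etree.ElementTree as ET

# Tomcat's documented default when maxKeepAliveRequests is not set.
DEFAULT_MAX_KEEPALIVE_REQUESTS = 100


def keepalive_settings(server_xml_text):
    """Map each HTTP connector's port to its effective maxKeepAliveRequests."""
    root = ET.fromstring(server_xml_text)
    settings = {}
    for connector in root.iter("Connector"):
        # Tomcat treats a missing protocol attribute as HTTP/1.1.
        protocol = connector.get("protocol", "HTTP/1.1")
        if "AJP" in protocol.upper():
            continue  # only the HTTP connector matters here
        value = connector.get("maxKeepAliveRequests", str(DEFAULT_MAX_KEEPALIVE_REQUESTS))
        settings[connector.get("port")] = int(value)
    return settings


# Hypothetical server.xml fragment for illustration only.
SAMPLE = """<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="8080" protocol="HTTP/1.1" maxKeepAliveRequests="10"/>
    <Connector port="8081" protocol="HTTP/1.1"/>
    <Connector port="8009" protocol="AJP/1.3"/>
  </Service>
</Server>"""
```

Per the Tomcat connector docs, a value of 1 disables keep-alive entirely and -1 allows an unlimited number of requests per connection.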
21:05 PM
djce
ok
21:05 PM
djce
I don't think nginx (which is the client) uses keepalive in client mode.
21:05 PM
ruaok
I suspect that you're right.
21:05 PM
djce
might be better if it did in fact.
21:05 PM
djce
'cos there'd be less opening/closing going on.
21:05 PM
ruaok
the growth doesn't seem to have changed.
21:06 PM
ruaok
yeah, we've got a simple script that connects to dora directly and makes a shitload of requests.
21:06 PM
ruaok
without going through nginx and the problem persists.
21:06 PM
djce
shitload as in open/request/close, or open/req/req/req/req/req/.../close?
21:07 PM
ruaok
the former
21:07 PM
ruaok
simple LWP::UserAgent calls
21:08 PM
ruaok
or even wgets from forked processes.
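djce's question separates two load patterns: a fresh TCP connection per request (open/request/close, which is what the LWP and wget scripts do) versus many requests over one persistent connection. A self-contained sketch that shows the difference by counting client-side ports against a throwaway local HTTP server. This is not the actual test script, which used LWP::UserAgent, and all names here are hypothetical. Every connection closed after a single request leaves a TIME_WAIT socket on whichever side closed first, which is one way thousands of TIME_WAIT entries can accumulate under this kind of load.

```python
import http.client
import http.server
import threading


class OkHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # permit persistent (keep-alive) connections

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def client_ports(host, port, requests, reuse):
    """Issue GET requests and return the set of local ports the client used.

    reuse=False opens and closes a new connection per request;
    reuse=True sends every request over one keep-alive connection.
    """
    ports = set()
    conn = None
    for _ in range(requests):
        if conn is None or not reuse:
            conn = http.client.HTTPConnection(host, port, timeout=5)
            conn.connect()
        ports.add(conn.sock.getsockname()[1])
        conn.request("GET", "/")
        conn.getresponse().read()
        if not reuse:
            conn.close()
    if conn is not None:
        conn.close()
    return ports


def run_demo(requests=20):
    """Return (ports used without reuse, ports used with keep-alive)."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        fresh = client_ports(host, port, requests, reuse=False)
        reused = client_ports(host, port, requests, reuse=True)
    finally:
        server.shutdown()
        server.server_close()
    return len(fresh), len(reused)
```

On a Linux box, running `ss -tan state time-wait` right after the fresh-connection loop should show the leftover sockets.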
21:08 PM
ruaok
3892 sockets in TIME_WAIT and 8 in ESTABLISHED.
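Per-state counts like these can be read straight from the kernel's socket table. A sketch that tallies TCP states from the text of Linux's `/proc/net/tcp` (IPv4 only; IPv6 sockets are in `/proc/net/tcp6`), optionally restricted to one local port such as Tomcat's HTTP connector. The function names and the example port are assumptions; the hex state codes follow the kernel's TCP state numbering.

```python
from collections import Counter

# Hex state codes used in /proc/net/tcp (Linux TCP state numbering).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text, local_port=None):
    """Tally sockets by TCP state, optionally only those bound to `local_port`."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3].upper()
        # local_address is "HEXIP:HEXPORT"; "0100007F:1F90" is 127.0.0.1:8080 on x86.
        if local_port is not None and int(local_address.split(":")[1], 16) != local_port:
            continue
        counts[TCP_STATES.get(state, state)] += 1
    return counts


def tcp_state_summary(local_port=None):
    """Read the live IPv4 socket table and summarize it."""
    with open("/proc/net/tcp") as f:
        return count_tcp_states(f.read(), local_port)
```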
21:09 PM
djce
from what host?
21:09 PM
ruaok
all from carl
21:10 PM
ruaok
nginx
21:10 PM
ruaok