-
ruaok
without waiting for a decision from me?
-
ruaok grumbles
-
ocharles
sorry, it must have got caught up in the flurry of ship its warp gave me
-
it can still be reverted
-
ruaok
just do 10 push ups as a penalty.
-
ruaok snickers
-
ocharles
i run 2 miles a day, easy!
-
wait, I mean, yea, 10 pushups is a really bad penalty
-
ruaok
lol.
-
two beer penalty then. drink two beers less next time you're at the pub.
-
and buy me two beers next time you see me. :)
-
ocharles
:)
-
so i dunno what we do about this replay script though
-
ruaok
what's the problem?
-
ocharles
i guess i need to do some more thorough analysis to see where it's failing and if it's only, say, /ws/1/release that fails most often
-
the problem is that over 80% of requests take longer on ngs.mb, than they do on the main servers
-
ruaok
based on how many calls?
-
ocharles
100
-
ruaok
for the most part, there isn't enough traffic to warm up the sites.
-
essentially ALL calls are cold.
-
ocharles
this is the same set of requests
-
it's the same even if I re-run it when everything should be in cache
-
ruaok
that is, if memcached was set up for the site.
-
which it's not.
-
ocharles
ngs doesn't have memcached?
-
ruaok
not yet, no.
-
ocharles
ok, then that makes this script a bit more pointless, yes
-
ruaok
I have yet to find a machine that has a random 3-4GB of ram free for me to futz with.
-
well, I didn't say that we're ready for that yet.
-
ocharles
right
-
ruaok
you gonna be up for a while still?
-
I can fake it for now.
-
ocharles
mmm, yea, a few more hours at least
-
ruaok
ok.
-
let me get dora to be a memcached server.
-
then we can run your test again.
-
ocharles
even 256mb is probably enough to see some change
-
ruaok
4GB is a good start. :-)
-
hmmm.
-
# XXX remove this
-
# Cache::Memcached options
-
should that be removed now?
-
ocharles
hrm
-
ruaok
ok, ngs should now have a 4GB memcached available for its use.
-
we should see some speed ups coming now.
-
ocharles
i think stuff still probably uses that
-
ruaok: all services restarted too?
-
ruaok
fastcgi has been.
-
ocharles
ok
-
ruaok
I'll check that memcached is getting items
-
once you start the script.
-
ocharles
started
-
# Looks like you failed 77 tests of 97.
-
after the first run
-
ruaok
55/109
-
hits/misses
-
ocharles
shall I run again?
-
ruaok
go for it
-
hit rate is drastically improving, but that's not surprising.
-
129/116
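The two hits/misses readings can be turned into hit rates with a little arithmetic. This is a minimal Python sketch, assuming the pairs are cumulative hit and miss counters as quoted in the conversation; the helper name `hit_rate` is hypothetical.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of cache lookups that were hits (0.0 when there were no lookups)."""
    total = hits + misses
    return hits / total if total else 0.0


# The two readings quoted in the chat, as (hits, misses).
readings = [(55, 109), (129, 116)]

for hits, misses in readings:
    print(f"{hits}/{misses} -> {hit_rate(hits, misses):.1%}")
```

On these numbers the hit rate goes from roughly a third to a little over half between the two readings.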
-
and?
-
ocharles
still running
-
1s pause between each test
-
but it doesn't look too good
-
the times look pretty close though
-
(in some cases)
-
ruaok
paste me some results.
-
ocharles
# Looks like you failed 73 tests of 97.
-
ruaok
what is considered a failure?
-
ocharles
if ngs takes more time than the main server
-
the numbers there are seconds
-
first number is ngs time, second number is main server time
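The pass/fail rule described here can be sketched as follows. This is a Python illustration, not the actual replay script: the function name and the sample timings are made up, and a tie is treated as a pass, matching the "ngstime <= masontime" check stated later in the conversation.

```python
def count_failures(timings):
    """Count requests where ngs was slower than the main server.

    `timings` is an iterable of (ngs_seconds, main_seconds) pairs,
    in the same order as the pasted results: ngs first, main second.
    """
    return sum(1 for ngs, main in timings if ngs > main)


# Hypothetical timings, for illustration only.
sample = [(0.42, 0.31), (0.18, 0.25), (1.10, 0.90), (0.30, 0.30)]

failed = count_failures(sample)
print(f"failed {failed} of {len(sample)}")
```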
-
ruaok
the track searches seem very slow.
-
but each of those was a 200 OK result?
-
ocharles
there is no check for that
-
i'll add that in
-
ruaok
I would love to separate these tests into tests of web vs search.
-
and something must be wrong, cold or misconfigured.
-
ngs on idle hardware is much slower than mason on hot hardware.
-
ocharles
running now with a check for 200
-
ruaok
k
-
ocharles
a test for "/ws/1/.*/?.*query=" should be enough to find searches, right?
-
of course a breakdown of times per end point would be better, but that will come tomorrow
-
ruaok nods
-
that reads to me that nearly all tests failed the duration check
-
but with limited stat output it's a bit hard to tell
-
i'll add that regex in to filter search queries out
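The search filter proposed above could look something like this. It is a Python sketch using the regex exactly as quoted; the helper names and example paths are hypothetical, and the real script may do this differently.

```python
import re

# Regex quoted in the chat. The "/?" is redundant next to ".*",
# but it is kept here as written.
SEARCH_RE = re.compile(r"/ws/1/.*/?.*query=")


def is_search(path: str) -> bool:
    """True if the request path looks like a /ws/1 search query."""
    return SEARCH_RE.search(path) is not None


def split_requests(paths):
    """Partition request paths into (searches, other requests)."""
    searches = [p for p in paths if is_search(p)]
    others = [p for p in paths if not is_search(p)]
    return searches, others


# Hypothetical request paths, for illustration only.
paths = [
    "/ws/1/artist/?type=xml&query=beatles",
    "/ws/1/release/some-mbid?type=xml&inc=tracks",
]
searches, others = split_requests(paths)
print(len(searches), "search,", len(others), "other")
```

Timing the two groups separately would give the web vs search split asked for earlier.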
-
Batsy joined the channel
-
ruaok
we're also only using 3 machines right now.
-
but they are mostly idle. should not really be an issue.
-
ocharles
# Looks like you failed 59 tests of 72. <-- for ws/1 requests that don't take an MBID
-
ruaok
not very encouraging, is it?
-
ocharles
nay :(
-
do I have access to fastcgi logs?
-
also it might be faster if we're not running ngs in debug mode
-
but I don't think that'll make a huge difference
-
ruaok
you should be able to access them.
-
let me turn off debug. one sec.
-
ok to restart?
-
ocharles
one sec
-
looking at: tail -f /usr/local/mb_server-fastcgi/log/main/current | tai64nlocal
-
on astro
-
the bulk of the time is in the lookup phase
-
# Looks like you failed 21 tests of 25. < for MBID requests
-
wait... are you sure debug mode is on? I shouldn't even see logging if that's on
-
CATALYST_DEBUG { 1 } in DBDefs, iirc
-
{ 0 } even :)
-
ruaok
it's now 0.
-
but I haven't restarted.
-
ocharles
ah
-
restart
-
nah, no real difference
-
# Looks like you failed 24 tests of 25.
-
ruaok
and if you run it again?
-
ocharles
# Looks like you failed 20 tests of 25.
-
ruaok
I wonder if we need more indexes in the DB?
-
ocharles
that's where postgres logs come in
-
djce
What does the dependency stack look like for running the ngs replication exporter?
-
ocharles
but i think it's more that we need more caching
-
ruaok
djce: gimme one minute.
-
ocharles: should we drop the slow query threshold down a little?
-
ocharles
well there's no threshold atm, it's just "ngstime <= masontime". I don't expect that to always pass, but it should be failing very rarely
-
ruaok
ocharles: no, the slow query threshold in PG.
-
ocharles
oh
-
for the logs
-
yea, set that to 0
-
and clear the logs if you can, so the logs only have this (relevant) data
-
ruaok
k
-
ah.
-
there is one query that keeps coming up frequently.
-
ocharles
this sounds promising
-
ruaok
work on that one first.
-
that is the only thing showing up.
-
let me clear that.
-
ocharles
that looks like Release->get_by_recording
-
if you clear the log and I re-run this I can put it through pgfouine to get a better view (and copy/pasta-able queries for explain)
-
ruaok
log cleared. old log saved.
-
ocharles
ok
-
test ran
-
ruaok
that query is back.
-
want the log?
-
ocharles
yep, please
-
that's the whole log?
-
ruaok
that's all it gave me.
-
ocharles
hrm