-
ruaok
without waiting for a decision from me?
2011-04-12 10241, 2011
-
ruaok grumbles
2011-04-12 10214, 2011
-
ocharles
sorry, it must have got caught up in the flurry of ship its warp gave me
2011-04-12 10218, 2011
-
ocharles
it can still be reverted
2011-04-12 10245, 2011
-
ruaok
just do 10 push ups as a penalty.
2011-04-12 10256, 2011
-
ruaok snickers
2011-04-12 10205, 2011
-
ocharles
i run 2 miles a day, easy!
2011-04-12 10216, 2011
-
ocharles
wait, I mean, yea, 10 pushups is a really bad penalty
2011-04-12 10225, 2011
-
ruaok
lol.
2011-04-12 10244, 2011
-
ruaok
two beer penalty then. drink two beers less next time you're at the pub.
2011-04-12 10251, 2011
-
ruaok
and buy me two beers next time you see me. :)
2011-04-12 10254, 2011
-
ocharles
:)
2011-04-12 10201, 2011
-
ocharles
so i dunno what we do about this replay script though
2011-04-12 10219, 2011
-
ruaok
what's the problem?
2011-04-12 10222, 2011
-
ocharles
i guess i need to do some more thorough analysis to see where it's failing and if it's only, say, /ws/1/release that fails most often
2011-04-12 10235, 2011
-
ocharles
the problem is that over 80% of requests take longer on ngs.mb, than they do on the main servers
2011-04-12 10258, 2011
-
ruaok
based on how many calls?
2011-04-12 10210, 2011
-
ocharles
100
2011-04-12 10216, 2011
-
ruaok
for the most part, there isn't enough traffic to warm up the sites.
2011-04-12 10224, 2011
-
ruaok
essentially ALL calls are cold.
2011-04-12 10229, 2011
-
ocharles
this is the same set of requests
2011-04-12 10236, 2011
-
ocharles
it's the same even if I re-run it when everything should be in cache
2011-04-12 10255, 2011
-
ruaok
that is, if memcached was set up for the site.
2011-04-12 10258, 2011
-
ruaok
which it's not.
2011-04-12 10207, 2011
-
ocharles
ngs doesn't have memcached?
2011-04-12 10212, 2011
-
ruaok
not yet, not.
2011-04-12 10213, 2011
-
ruaok
no
2011-04-12 10229, 2011
-
ocharles
ok, then that makes this script a bit more pointless, yes
2011-04-12 10229, 2011
-
ruaok
I have yet to find a machine that has a random 3-4GB of ram free for me to futz with.
2011-04-12 10243, 2011
-
ruaok
well, I didn't say that we're ready for that yet.
2011-04-12 10249, 2011
-
ocharles
right
2011-04-12 10255, 2011
-
ruaok
you gonna be up for a while still?
2011-04-12 10201, 2011
-
ruaok
I can fake it for now.
2011-04-12 10203, 2011
-
ocharles
mmm, yea, a few more hours at least
2011-04-12 10205, 2011
-
ruaok
ok.
2011-04-12 10213, 2011
-
ruaok
let me get dora to be a memcached server.
2011-04-12 10217, 2011
-
ruaok
then we can run your test again.
2011-04-12 10224, 2011
-
ocharles
even 256mb is probably enough to see some change
2011-04-12 10215, 2011
-
ruaok
4GB is a good start. :-)
2011-04-12 10258, 2011
-
ruaok
hmmm.
2011-04-12 10215, 2011
-
ruaok
# XXX remove this
2011-04-12 10216, 2011
-
ruaok
# Cache::Memcached options
2011-04-12 10221, 2011
-
ruaok
should that be removed now?
2011-04-12 10256, 2011
-
ocharles
hrm
2011-04-12 10259, 2011
-
ruaok
ok, ngs should now have a 4GB memcached available for its us.
2011-04-12 10200, 2011
-
ruaok
use
2011-04-12 10209, 2011
-
ruaok
we should see some speed ups coming now.
2011-04-12 10210, 2011
-
ocharles
i think stuff still probably uses that
2011-04-12 10217, 2011
-
ocharles
ruaok: all services restarted too?
2011-04-12 10245, 2011
-
ruaok
fastcgi has been.
2011-04-12 10252, 2011
-
ocharles
ok
2011-04-12 10255, 2011
-
ruaok
I'll check memcached to see that it's getting items
2011-04-12 10259, 2011
-
ruaok
once you start the script.
2011-04-12 10202, 2011
-
ocharles
started
2011-04-12 10234, 2011
-
ocharles
# Looks like you failed 77 tests of 97.
2011-04-12 10235, 2011
-
ocharles
after the first run
2011-04-12 10236, 2011
-
ruaok
55/109
2011-04-12 10240, 2011
-
ruaok
hits/misses
2011-04-12 10240, 2011
-
ocharles
shall I run again?
2011-04-12 10245, 2011
-
ruaok
go for it
2011-04-12 10231, 2011
-
ruaok
hit rate is drastically improving, but that's not surprising.
2011-04-12 10201, 2011
-
ruaok
129/116
2011-04-12 10230, 2011
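The two hits/misses readings above can be turned into a hit rate with a little arithmetic. This is a minimal sketch, assuming the pairs are cumulative hit and miss counters such as memcached's `get_hits` and `get_misses` stats; the helper name `hit_rate` is hypothetical.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of cache lookups that were hits (0.0 if there were no lookups)."""
    total = hits + misses
    return hits / total if total else 0.0


# Readings quoted in the log, as hits/misses.
first = hit_rate(55, 109)    # roughly 0.335
second = hit_rate(129, 116)  # roughly 0.527

print(f"first reading:  {first:.3f}")
print(f"second reading: {second:.3f}")
```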
-
ruaok
and?
2011-04-12 10233, 2011
-
ocharles
still running
2011-04-12 10242, 2011
-
ocharles
1s pause between each test
2011-04-12 10254, 2011
-
ocharles
but it doesn't look too good
2011-04-12 10208, 2011
-
ocharles
the times look pretty close though
2011-04-12 10215, 2011
-
ocharles
(in some cases)
2011-04-12 10219, 2011
-
ruaok
paste me some results.
2011-04-12 10222, 2011
-
ocharles
# Looks like you failed 73 tests of 97.
2011-04-12 10243, 2011
-
ruaok
what is considered a failure?
2011-04-12 10252, 2011
-
ocharles
2011-04-12 10259, 2011
-
ocharles
if ngs takes more time than the main server
2011-04-12 10205, 2011
-
ocharles
the numbers there are seconds
2011-04-12 10215, 2011
-
ocharles
first number is ngs time, second number is main server time
2011-04-12 10247, 2011
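The failure rule described here (a request fails when ngs takes longer than the main server) could be sketched as follows. This is only an illustration, not the actual replay script: the `Timing` type, the field names, and the sample paths and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    path: str            # request path that was replayed
    ngs_seconds: float   # first number: time taken by ngs
    main_seconds: float  # second number: time taken by the main server


def passed(t: Timing) -> bool:
    # A request passes only if ngs is at least as fast as the main server.
    return t.ngs_seconds <= t.main_seconds


def summarize(timings: list[Timing]) -> str:
    failed = sum(1 for t in timings if not passed(t))
    return f"failed {failed} tests of {len(timings)}"


# Made-up sample data, purely for illustration.
sample = [
    Timing("/ws/1/release/example-a", 0.80, 0.30),
    Timing("/ws/1/artist/example-b", 0.20, 0.25),
    Timing("/ws/1/track/example-c", 1.50, 0.40),
]
print(summarize(sample))
```

A strict comparison like this has no tolerance, so requests whose times are very close can still count as failures.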
-
ruaok
the track searches seem very slow.
2011-04-12 10217, 2011
-
ruaok
but each of those was a 200 OK result?
2011-04-12 10251, 2011
-
ocharles
there is no check for that
2011-04-12 10253, 2011
-
ocharles
i'll add that in
2011-04-12 10257, 2011
-
ruaok
I would love to separate these tests into tests of web vs search.
2011-04-12 10223, 2011
-
ruaok
and something must be wrong, cold or misconfigured.
2011-04-12 10240, 2011
-
ruaok
ngs on idle hardware is much slower than mason on hot hardware.
2011-04-12 10232, 2011
-
ocharles
running now with a check for 200
2011-04-12 10246, 2011
-
ruaok
k
2011-04-12 10234, 2011
-
ocharles
a test for "/ws/1/.*/?.*query=" should be enough to find searches, right?
2011-04-12 10259, 2011
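A minimal sketch of splitting replayed requests into searches and non-searches with the pattern quoted above. The function names and the sample paths are hypothetical; only the regular expression comes from the log.

```python
import re

# Pattern quoted in the log, intended to pick out /ws/1 search requests.
SEARCH_RE = re.compile(r"/ws/1/.*/?.*query=")


def is_search(path: str) -> bool:
    """True if the request path looks like a /ws/1 search (has a query= parameter)."""
    return SEARCH_RE.search(path) is not None


def split_requests(paths):
    """Return (searches, others), preserving the original order within each list."""
    searches = [p for p in paths if is_search(p)]
    others = [p for p in paths if not is_search(p)]
    return searches, others


# Made-up sample paths, purely for illustration.
sample_paths = [
    "/ws/1/track/?type=xml&query=example",
    "/ws/1/release/example-id?type=xml&inc=tracks",
]
searches, others = split_requests(sample_paths)
print(len(searches), "search,", len(others), "other")
```

Because the `/?` in the middle is optional and sits between two `.*`, the pattern behaves the same as `/ws/1/.*query=`.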
-
ocharles
of course a breakdown of times per end point would be better, but that will come tomorrow
2011-04-12 10219, 2011
-
ruaok nods
2011-04-12 10224, 2011
-
ocharles
that reads to me that nearly all tests failed the duration check
2011-04-12 10238, 2011
-
ocharles
but with limited stat output it's a bit hard to tell
2011-04-12 10246, 2011
-
ocharles
i'll add that regex in to filter search queries out
2011-04-12 10254, 2011
-
Batsy joined the channel
2011-04-12 10237, 2011
-
ruaok
we're also only using 3 machines right now.
2011-04-12 10248, 2011
-
ruaok
but they are mostly idle. should not really be an issue.
2011-04-12 10244, 2011
-
ocharles
# Looks like you failed 59 tests of 72. <-- for ws/1 requests that don't take an MBID
2011-04-12 10228, 2011
-
ruaok
not very encouraging, is it?
2011-04-12 10258, 2011
-
ocharles
nay :(
2011-04-12 10230, 2011
-
ocharles
do I have access to fastcgi logs?
2011-04-12 10247, 2011
-
ocharles
also it might be faster if we're not running ngs in debug mode
2011-04-12 10253, 2011
-
ocharles
but I don't think that'll make a huge difference
2011-04-12 10218, 2011
-
ruaok
you should be able to access them.
2011-04-12 10223, 2011
-
ruaok
let me turn off debug. one sec.
2011-04-12 10259, 2011
-
ruaok
ok to restart?
2011-04-12 10247, 2011
-
ocharles
one sec
2011-04-12 10209, 2011
-
ocharles
looking at: tail -f /usr/local/mb_server-fastcgi/log/main/current | tai64nlocal
2011-04-12 10211, 2011
-
ocharles
on astro
2011-04-12 10219, 2011
-
ocharles
the bulk of the time is in the lookup phase
2011-04-12 10240, 2011
-
ocharles
# Looks like you failed 21 tests of 25. < for MBID requests
2011-04-12 10200, 2011
-
ocharles
wait... are you sure debug mode is on? I shouldn't even see logging if that's on
2011-04-12 10216, 2011
-
ocharles
CATALYST_DEBUG { 1 } in DBDefs, iirc
2011-04-12 10222, 2011
-
ocharles
{ 0 } even :)
2011-04-12 10224, 2011
-
ruaok
it's now 0.
2011-04-12 10228, 2011
-
ruaok
but I haven't restarted.
2011-04-12 10232, 2011
-
ocharles
ah
2011-04-12 10233, 2011
-
ocharles
restart
2011-04-12 10215, 2011
-
ocharles
nah, no real difference
2011-04-12 10236, 2011
-
ocharles
# Looks like you failed 24 tests of 25.
2011-04-12 10249, 2011
-
ruaok
and if you run it again?
2011-04-12 10229, 2011
-
ocharles
# Looks like you failed 20 tests of 25.
2011-04-12 10249, 2011
-
ruaok
I wonder if we need more indexes in the DB?
2011-04-12 10221, 2011
-
ocharles
that's where postgres logs come in
2011-04-12 10226, 2011
-
djce
What does the dependency stack look like for running the ngs replication exporter?
2011-04-12 10229, 2011
-
ocharles
but i think it's more that we need more caching
2011-04-12 10248, 2011
-
ruaok
djce: gimme one minute.
2011-04-12 10212, 2011
-
ruaok
ocharles: should we drop the slow query threshold down a little?
2011-04-12 10258, 2011
-
ocharles
well there's no threshold atm, it's just "ngstime <= masontime" I don't expect that to always pass, but it should be failing very rarely
2011-04-12 10215, 2011
-
ruaok
ocharles: no, the slow query threshold in PG.
2011-04-12 10219, 2011
-
ocharles
oh
2011-04-12 10220, 2011
-
ocharles
for the logs
2011-04-12 10224, 2011
-
ocharles
yea, set that to 0
2011-04-12 10233, 2011
-
ocharles
and clear the logs if you can, so the logs only have this (relevant) data
2011-04-12 10249, 2011
-
ruaok
k
2011-04-12 10227, 2011
-
ruaok
ah.
2011-04-12 10234, 2011
-
ruaok
there is one query that keeps coming up frequently.
2011-04-12 10241, 2011
-
ocharles
this sounds promising
2011-04-12 10205, 2011
-
ruaok
2011-04-12 10212, 2011
-
ruaok
work on that one first.
2011-04-12 10218, 2011
-
ruaok
that is the only thing showing up.
2011-04-12 10221, 2011
-
ruaok
let me clear that.
2011-04-12 10232, 2011
-
ocharles
that looks like Release->get_by_recording
2011-04-12 10211, 2011
-
ocharles
if you clear the log and I re-run this I can put it through pgfouine to get a better view (and copy/pasta-able queries for explain)
2011-04-12 10247, 2011
-
ruaok
log cleared. old log saved.
2011-04-12 10256, 2011
-
ocharles
ok
2011-04-12 10244, 2011
-
ocharles
test ran
2011-04-12 10209, 2011
-
ruaok
that query is back.
2011-04-12 10215, 2011
-
ruaok
want the log?
2011-04-12 10238, 2011
-
ruaok
2011-04-12 10239, 2011
-
ocharles
yep, please
2011-04-12 10245, 2011
-
ocharles
that's the whole log?
2011-04-12 10254, 2011
-
ruaok
that's all it gave me.
2011-04-12 10200, 2011
-
ocharles
hrm