#musicbrainz-devel

      • ruaok
        without waiting for a decision from me?
      • 2011-04-12 10241, 2011

      • ruaok grumbles
      • 2011-04-12 10214, 2011

      • ocharles
        sorry, it must have got caught up in the flurry of ship its warp gave me
      • 2011-04-12 10218, 2011

      • ocharles
        it can still be reverted
      • 2011-04-12 10245, 2011

      • ruaok
        just do 10 push ups as a penalty.
      • 2011-04-12 10256, 2011

      • ruaok snickers
      • 2011-04-12 10205, 2011

      • ocharles
        i run 2 miles a day, easy!
      • 2011-04-12 10216, 2011

      • ocharles
        wait, I mean, yea, 10 pushups is a really bad penalty
      • 2011-04-12 10225, 2011

      • ruaok
        lol.
      • 2011-04-12 10244, 2011

      • ruaok
        two beer penalty then. drink two beers less next time you're at the pub.
      • 2011-04-12 10251, 2011

      • ruaok
        and buy me two beers next time you see me. :)
      • 2011-04-12 10254, 2011

      • ocharles
        :)
      • 2011-04-12 10201, 2011

      • ocharles
        so i dunno what we do about this replay script though
      • 2011-04-12 10219, 2011

      • ruaok
        whats the problem?
      • 2011-04-12 10222, 2011

      • ocharles
        i guess i need to do some more thorough analysis to see where it's failing and if it's only, say, /ws/1/release that fails most often
      • 2011-04-12 10235, 2011

      • ocharles
        the problem is that over 80% of requests take longer on ngs.mb, than they do on the main servers
      • 2011-04-12 10258, 2011

      • ruaok
        based on how many calls?
      • 2011-04-12 10210, 2011

      • ocharles
        100
      • 2011-04-12 10216, 2011

      • ruaok
        for the most part, there isn't enough traffic to warm up the sites.
      • 2011-04-12 10224, 2011

      • ruaok
        essentially ALL calls are cold.
      • 2011-04-12 10229, 2011

      • ocharles
        this is the same set of requests
      • 2011-04-12 10236, 2011

      • ocharles
        it's the same even if I re run it when everything should be in cache
      • 2011-04-12 10255, 2011

      • ruaok
        that is if memcached was setup for the site.
      • 2011-04-12 10258, 2011

      • ruaok
        which its not.
      • 2011-04-12 10207, 2011

      • ocharles
        ngs doesn't have memcached?
      • 2011-04-12 10212, 2011

      • ruaok
        not yet, not.
      • 2011-04-12 10213, 2011

      • ruaok
        no
      • 2011-04-12 10229, 2011

      • ocharles
        ok, then that makes this script a bit more pointless, yes
      • 2011-04-12 10229, 2011

      • ruaok
        I have yet to find a machine that has a random 3-4GB of ram free for me to futz with.
      • 2011-04-12 10243, 2011

      • ruaok
        well, I didn't say that we're ready for that yet.
      • 2011-04-12 10249, 2011

      • ocharles
        right
      • 2011-04-12 10255, 2011

      • ruaok
        you gonna be up for a while still?
      • 2011-04-12 10201, 2011

      • ruaok
        I can fake it for now.
      • 2011-04-12 10203, 2011

      • ocharles
        mmm, yea, a few more hours at least
      • 2011-04-12 10205, 2011

      • ruaok
        ok.
      • 2011-04-12 10213, 2011

      • ruaok
        let me get dora to be a memcached server.
      • 2011-04-12 10217, 2011

      • ruaok
        then we can run your test again.
      • 2011-04-12 10224, 2011

      • ocharles
        even 256mb is probably enough to see some change
      • 2011-04-12 10215, 2011

      • ruaok
        4GB is a good start. :-)
      • 2011-04-12 10258, 2011

      • ruaok
        hmmm.
      • 2011-04-12 10215, 2011

      • ruaok
        # XXX remove this
      • 2011-04-12 10216, 2011

      • ruaok
        # Cache::Memcached options
      • 2011-04-12 10221, 2011

      • ruaok
        should that be removed now?
      • 2011-04-12 10256, 2011

      • ocharles
        hrm
      • 2011-04-12 10259, 2011

      • ruaok
        ok, ngs should now have a 4GB memcached available for its us.
      • 2011-04-12 10200, 2011

      • ruaok
        use
      • 2011-04-12 10209, 2011

      • ruaok
        we should see some speed ups coming now.
      • 2011-04-12 10210, 2011

      • ocharles
        i think stuff still probably uses that
      • 2011-04-12 10217, 2011

      • ocharles
        ruaok: all services restarted too?
      • 2011-04-12 10245, 2011

      • ruaok
        fastcgi has been.
      • 2011-04-12 10252, 2011

      • ocharles
        ok
      • 2011-04-12 10255, 2011

      • ruaok
        I'll check memcached that its getting items
      • 2011-04-12 10259, 2011

      • ruaok
        once you start the script.
      • 2011-04-12 10202, 2011

      • ocharles
        started
      • 2011-04-12 10234, 2011

      • ocharles
        # Looks like you failed 77 tests of 97.
      • 2011-04-12 10235, 2011

      • ocharles
        after the first run
      • 2011-04-12 10236, 2011

      • ruaok
        55/109
      • 2011-04-12 10240, 2011

      • ruaok
        hits/misses
      • 2011-04-12 10240, 2011

      • ocharles
        shall I run again?
      • 2011-04-12 10245, 2011

      • ruaok
        go for it
      • 2011-04-12 10231, 2011

      • ruaok
        hit rate is drastically improving, but thats not surprising.
      • 2011-04-12 10201, 2011

      • ruaok
        129/116
      • 2011-04-12 10230, 2011
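
The two hits/misses readings above can be turned into hit rates. The sketch below assumes the figures are cumulative counters (memcached's `get_hits` and `get_misses` stats are cumulative since server start); whether ruaok was quoting those exact counters is an assumption, and the function names are illustrative rather than part of any MusicBrainz script.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of cache lookups that were served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0


def interval_hit_rate(before: tuple, after: tuple) -> float:
    """Hit rate for the lookups that happened between two cumulative readings.

    Each reading is a (hits, misses) pair; this assumes the counters only grow.
    """
    hits = after[0] - before[0]
    misses = after[1] - before[1]
    return hit_rate(hits, misses)


# Readings quoted in the channel, interpreted as (hits, misses).
first_reading = (55, 109)
second_reading = (129, 116)

print(f"cumulative at first reading:  {hit_rate(*first_reading):.1%}")
print(f"cumulative at second reading: {hit_rate(*second_reading):.1%}")
print(f"between the two readings:     "
      f"{interval_hit_rate(first_reading, second_reading):.1%}")
```

If the counters are indeed cumulative, the interval figure is the more telling one: it describes only the lookups made during the second run, without the cold misses of the first.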

      • ruaok
        and?
      • 2011-04-12 10233, 2011

      • ocharles
        still running
      • 2011-04-12 10242, 2011

      • ocharles
        1s pause between each test
      • 2011-04-12 10254, 2011

      • ocharles
        but it doesn't look too good
      • 2011-04-12 10208, 2011

      • ocharles
        the times look pretty close though
      • 2011-04-12 10215, 2011

      • ocharles
        (in some cases)
      • 2011-04-12 10219, 2011

      • ruaok
        paste me some results.
      • 2011-04-12 10222, 2011

      • ocharles
        # Looks like you failed 73 tests of 97.
      • 2011-04-12 10243, 2011

      • ruaok
        what is considered a failure?
      • 2011-04-12 10252, 2011

      • ocharles
      • 2011-04-12 10259, 2011

      • ocharles
        if ngs takes more time than the main server
      • 2011-04-12 10205, 2011

      • ocharles
        the numbers there are seconds
      • 2011-04-12 10215, 2011

      • ocharles
        first number is ngs time, second number is main server time
      • 2011-04-12 10247, 2011

      • ruaok
        the track searches seem very slow.
      • 2011-04-12 10217, 2011

      • ruaok
        but each of those was a 200 OK result?
      • 2011-04-12 10251, 2011

      • ocharles
        there is no check for that
      • 2011-04-12 10253, 2011

      • ocharles
        i'll add that in
      • 2011-04-12 10257, 2011
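
The failure rule described above (ngs taking longer than the main server), combined with the planned check for a 200 response, can be sketched as follows. This is an illustration only, not the actual replay script: `Sample`, `passes`, `count_failures`, and the sample paths and timings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One replayed request (hypothetical record layout)."""
    path: str
    ngs_status: int       # HTTP status returned by ngs
    ngs_seconds: float    # time taken by ngs
    main_seconds: float   # time taken by the main server


def passes(sample: Sample) -> bool:
    """A request passes only if ngs answered 200 and was not slower."""
    if sample.ngs_status != 200:
        # A fast error response must not count as a win.
        return False
    return sample.ngs_seconds <= sample.main_seconds


def count_failures(samples: list) -> int:
    """Number of samples that fail the status or duration check."""
    return sum(1 for s in samples if not passes(s))


# Made-up numbers, purely to exercise the rule.
samples = [
    Sample("/ws/1/release/?type=xml&query=a", 200, 1.50, 0.40),
    Sample("/ws/1/artist/?type=xml&query=b", 200, 0.20, 0.30),
    Sample("/ws/1/track/?type=xml&query=c", 500, 0.05, 0.90),
]
print(f"failed {count_failures(samples)} of {len(samples)}")
```

Checking the status first matters because an error page is often returned faster than a real result, which would otherwise look like a pass.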

      • ruaok
        I would love to separate these tests into tests of web vs search.
      • 2011-04-12 10223, 2011

      • ruaok
        and something must be wrong, cold or misconfigured.
      • 2011-04-12 10240, 2011

      • ruaok
        ngs on idle hardware is much slower than mason on hot hardware.
      • 2011-04-12 10232, 2011

      • ocharles
        running now with a check for 200
      • 2011-04-12 10246, 2011

      • ruaok
        k
      • 2011-04-12 10234, 2011

      • ocharles
        a test for "/ws/1/.*/?.*query=" should be enough to find searches, right?
      • 2011-04-12 10259, 2011

      • ocharles
        of course a breakdown of times per end point would be better, but that will come tomorrow
      • 2011-04-12 10219, 2011
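
A per-endpoint breakdown of the kind mentioned above could look roughly like this. It is a sketch under assumptions: the input is taken to be `(url, ngs_seconds, main_seconds)` tuples, the endpoint is taken to be the first three path segments (for example `/ws/1/release`), and the names `endpoint_of` and `breakdown` are invented for illustration.

```python
from collections import defaultdict
from statistics import mean
from urllib.parse import urlsplit


def endpoint_of(url: str) -> str:
    """Reduce a request URL to its web service endpoint, e.g. /ws/1/release."""
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) >= 3 and parts[0] == "ws":
        return "/" + "/".join(parts[:3])
    return "other"


def breakdown(timings):
    """Group (url, ngs_seconds, main_seconds) tuples by endpoint."""
    grouped = defaultdict(list)
    for url, ngs, main in timings:
        grouped[endpoint_of(url)].append((ngs, main))
    return {
        endpoint: {
            "count": len(pairs),
            "ngs_mean": mean(n for n, _ in pairs),
            "main_mean": mean(m for _, m in pairs),
            # Requests where ngs took longer than the main server.
            "slower": sum(1 for n, m in pairs if n > m),
        }
        for endpoint, pairs in grouped.items()
    }


# Made-up timings, only to show the shape of the output.
timings = [
    ("/ws/1/release/?type=xml&query=a", 1.5, 0.5),
    ("/ws/1/release/?type=xml&query=b", 0.5, 0.25),
    ("/ws/1/track/?type=xml&query=c", 0.25, 0.5),
]
for endpoint, stats in sorted(breakdown(timings).items()):
    print(endpoint, stats)
```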

      • ruaok nods
      • 2011-04-12 10224, 2011

      • ocharles
        that reads to me that nearly all tests failed the duration check
      • 2011-04-12 10238, 2011

      • ocharles
        but with limited stat output it's a bit hard to tell
      • 2011-04-12 10246, 2011

      • ocharles
        i'll add that regex in to filter search queries out
      • 2011-04-12 10254, 2011
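
The regex proposed above for spotting search requests can be tried out as follows. The pattern is copied from the channel; the helper names and the example URLs (including the all-zero MBID) are placeholders made up for this sketch.

```python
import re

# Pattern quoted in the channel: any /ws/1/ URL carrying a "query=" parameter.
SEARCH_RE = re.compile(r"/ws/1/.*/?.*query=")


def is_search(url: str) -> bool:
    """True if the URL looks like a /ws/1 search rather than an MBID lookup."""
    return SEARCH_RE.search(url) is not None


def split_requests(urls):
    """Partition URLs into (searches, lookups) so they can be timed separately."""
    searches, lookups = [], []
    for url in urls:
        (searches if is_search(url) else lookups).append(url)
    return searches, lookups


urls = [
    "/ws/1/track/?type=xml&query=love",
    "/ws/1/release/00000000-0000-0000-0000-000000000000?type=xml&inc=tracks",
]
searches, lookups = split_requests(urls)
print("searches:", searches)
print("lookups: ", lookups)
```

Since `.*` already matches an optional slash, the `/?` in the pattern is redundant but harmless. The pattern would also match any parameter whose name merely ends in `query=`, which may or may not matter for the replayed request set.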

      • Batsy joined the channel
      • 2011-04-12 10237, 2011

      • ruaok
        we're also only using 3 machines right now.
      • 2011-04-12 10248, 2011

      • ruaok
        but they are mostly idle. should not really be an issue.
      • 2011-04-12 10244, 2011

      • ocharles
        # Looks like you failed 59 tests of 72. <-- for ws/1 requests that don't take an MBID
      • 2011-04-12 10228, 2011

      • ruaok
        not very encouraging, is it?
      • 2011-04-12 10258, 2011

      • ocharles
        nay :(
      • 2011-04-12 10230, 2011

      • ocharles
        do I have access to fastcgi logs?
      • 2011-04-12 10247, 2011

      • ocharles
        also it might be faster if we're not running ngs in debug mode
      • 2011-04-12 10253, 2011

      • ocharles
        but I don't think that'll make a huge difference
      • 2011-04-12 10218, 2011

      • ruaok
        you should be able to access them.
      • 2011-04-12 10223, 2011

      • ruaok
        let me turn off debug. one sec.
      • 2011-04-12 10259, 2011

      • ruaok
        ok to restart?
      • 2011-04-12 10247, 2011

      • ocharles
        one sec
      • 2011-04-12 10209, 2011

      • ocharles
        looking at: tail -f /usr/local/mb_server-fastcgi/log/main/current | tai64nlocal
      • 2011-04-12 10211, 2011

      • ocharles
        on astro
      • 2011-04-12 10219, 2011

      • ocharles
        the bulk of the time is in the lookup phase
      • 2011-04-12 10240, 2011

      • ocharles
        # Looks like you failed 21 tests of 25. < for MBID requests
      • 2011-04-12 10200, 2011

      • ocharles
        wait... are you sure debug mode is on? I shouldn't even see logging if that's on
      • 2011-04-12 10216, 2011

      • ocharles
        CATALYST_DEBUG { 1 } in DBDefs, iirc
      • 2011-04-12 10222, 2011

      • ocharles
        { 0 } even :)
      • 2011-04-12 10224, 2011

      • ruaok
        its now 0.
      • 2011-04-12 10228, 2011

      • ruaok
        but I haven't restarted.
      • 2011-04-12 10232, 2011

      • ocharles
        ah
      • 2011-04-12 10233, 2011

      • ocharles
        restart
      • 2011-04-12 10215, 2011

      • ocharles
        nah, no real difference
      • 2011-04-12 10236, 2011

      • ocharles
        # Looks like you failed 24 tests of 25.
      • 2011-04-12 10249, 2011

      • ruaok
        and if you run it again?
      • 2011-04-12 10229, 2011

      • ocharles
        # Looks like you failed 20 tests of 25.
      • 2011-04-12 10249, 2011

      • ruaok
        I wonder if we need more indexes in the DB?
      • 2011-04-12 10221, 2011

      • ocharles
        that's where postgres logs come in
      • 2011-04-12 10226, 2011

      • djce
        What does the dependency stack look like for running the ngs replication exporter?
      • 2011-04-12 10229, 2011

      • ocharles
        but i think it's more that we need more caching
      • 2011-04-12 10248, 2011

      • ruaok
        djce: gimme one minute.
      • 2011-04-12 10212, 2011

      • ruaok
        ocharles: should we drop the slow query threshold down a little?
      • 2011-04-12 10258, 2011

      • ocharles
        well there's no threshold atm, it's just "ngstime <= masontime". I don't expect that to always pass, but it should be failing very rarely
      • 2011-04-12 10215, 2011

      • ruaok
        ocharles: no, the slow query threshold in PG.
      • 2011-04-12 10219, 2011

      • ocharles
        oh
      • 2011-04-12 10220, 2011

      • ocharles
        for the logs
      • 2011-04-12 10224, 2011

      • ocharles
        yea, set that to 0
      • 2011-04-12 10233, 2011

      • ocharles
        and clear the logs if you can, so the logs only have this (relevant) data
      • 2011-04-12 10249, 2011

      • ruaok
        k
      • 2011-04-12 10227, 2011

      • ruaok
        ah.
      • 2011-04-12 10234, 2011

      • ruaok
        there is one query that keeps coming up frequently.
      • 2011-04-12 10241, 2011

      • ocharles
        this sounds promising
      • 2011-04-12 10205, 2011

      • ruaok
      • 2011-04-12 10212, 2011

      • ruaok
        work on that one first.
      • 2011-04-12 10218, 2011

      • ruaok
        that is the only thing showing up.
      • 2011-04-12 10221, 2011

      • ruaok
        let me clear that.
      • 2011-04-12 10232, 2011

      • ocharles
        that looks like Release->get_by_recording
      • 2011-04-12 10211, 2011

      • ocharles
        if you clear the log and I re run this I can put it through pgfouine to get a better view (and copy/pasta-able queries for explain)
      • 2011-04-12 10247, 2011

      • ruaok
        log cleared. old log saved.
      • 2011-04-12 10256, 2011

      • ocharles
        ok
      • 2011-04-12 10244, 2011

      • ocharles
        test ran
      • 2011-04-12 10209, 2011

      • ruaok
        that query is back.
      • 2011-04-12 10215, 2011

      • ruaok
        want the log?
      • 2011-04-12 10238, 2011

      • ruaok
      • 2011-04-12 10239, 2011

      • ocharles
        yep, please
      • 2011-04-12 10245, 2011

      • ocharles
        that's the whole log?
      • 2011-04-12 10254, 2011

      • ruaok
        thats all it gave me.
      • 2011-04-12 10200, 2011

      • ocharles
        hrm