#musicbrainz-devel

/

      • Hadora
        has been
      • 2014-01-06 00648, 2014

      • Hadora
        in the past
      • 2014-01-06 00654, 2014

      • ianmcorvidae
        since it started
      • 2014-01-06 00656, 2014

      • ianmcorvidae
        it's unplanned downtime
      • 2014-01-06 00605, 2014

      • Hadora
        I know, but I thought someone could knew when the db was switch to read-only, w/e
      • 2014-01-06 00625, 2014

      • uk_
        at least 40 minutes
      • 2014-01-06 00651, 2014

      • uk__ joined the channel
      • 2014-01-06 00604, 2014

      • Hadora
        ah ok thanks i thought it was for the whole week-end
      • 2014-01-06 00629, 2014

      • Hadora
        thanks
      • 2014-01-06 00613, 2014

      • uk__
        So what was the problem?
      • 2014-01-06 00605, 2014

      • travis-ci joined the channel
      • 2014-01-06 00606, 2014

      • travis-ci has left the channel
      • 2014-01-06 00622, 2014

      • Jozo
        How about adding banner 'cos 504?
      • 2014-01-06 00609, 2014

      • reosarevok joined the channel
      • 2014-01-06 00633, 2014

      • reosarevok
        :(
      • 2014-01-06 00642, 2014

      • reosarevok
        I see the whole site is still fucked up
      • 2014-01-06 00640, 2014

      • kuno
        wb reo, I think ian was/is looking into it.
      • 2014-01-06 00613, 2014

      • ruaok joined the channel
      • 2014-01-06 00648, 2014

      • ruaok
        ocharles: ping
      • 2014-01-06 00645, 2014

      • uk__ has left the channel
      • 2014-01-06 00645, 2014

      • reosarevok joined the channel
      • 2014-01-06 00650, 2014

      • reosarevok joined the channel
      • 2014-01-06 00625, 2014

      • ruaok starts writing a script to replace himself
      • 2014-01-06 00638, 2014

      • ruaok
        can't be more than a few lines, really. :)
      • 2014-01-06 00658, 2014

      • Mineo
        since you're able to write that script, would that script be able to write a script that replaces itself again?:)
      • 2014-01-06 00658, 2014

      • ruaok
        I think you're giving the original script writer way too much credit. :)
      • 2014-01-06 00653, 2014

      • adhawkins
        Site still having issues?
      • 2014-01-06 00659, 2014

      • adhawkins
        504 Gateway time-out...
      • 2014-01-06 00605, 2014

      • ruaok
        yep.
      • 2014-01-06 00618, 2014

      • adhawkins
        Okey dokey, I'll leave you to it then. No biggy :)
      • 2014-01-06 00623, 2014

      • adhawkins
        Happy New Year all by the way :D
      • 2014-01-06 00600, 2014

      • ruaok
        happy new year. :)
      • 2014-01-06 00633, 2014

      • adhawkins waves
      • 2014-01-06 00610, 2014

      • Freso joined the channel
      • 2014-01-06 00647, 2014

      • demonimin joined the channel
      • 2014-01-06 00608, 2014

      • Nyanko-sensei joined the channel
      • 2014-01-06 00654, 2014

      • ocharles
        ruaok: pong
      • 2014-01-06 00611, 2014

      • ruaok
        heya. got a few minutes to help assess an unhappy totoro?
      • 2014-01-06 00616, 2014

      • ruaok
        both Ian and I are stuck.
      • 2014-01-06 00629, 2014

      • ocharles
        Sure, let me just finish what I'm working on
      • 2014-01-06 00629, 2014

      • ruaok
        for an intro, see ian's last email.
      • 2014-01-06 00633, 2014

      • ruaok
        thx
      • 2014-01-06 00623, 2014

      • ocharles
      • 2014-01-06 00644, 2014

      • ruaok
        yep.
      • 2014-01-06 00634, 2014

      • ruaok
        from what I can tell, everything is really happy.
      • 2014-01-06 00640, 2014

      • ruaok
        except for the disk io.
      • 2014-01-06 00652, 2014

      • ruaok
        indexes are happy.
      • 2014-01-06 00615, 2014

      • ruaok
        but occasionally we get some stuck queries. and always the same query.
      • 2014-01-06 00633, 2014

      • ruaok
        I wonder if this is a symptom, rather than the cause.
      • 2014-01-06 00626, 2014

      • ocharles
        I dunno, IO is back down to much saner levels right now
      • 2014-01-06 00637, 2014

      • ruaok
        I've been killing stuck procs.
      • 2014-01-06 00639, 2014

      • ocharles
        but a few minutes ago it was at 100% for a single backend, so I would think that query is what does the damage
      • 2014-01-06 00655, 2014

      • ocharles
        ok, hold off on that for a moment so I can observe them
      • 2014-01-06 00657, 2014

      • ruaok
        I've stopped the script that kills stuck procs.
      • 2014-01-06 00600, 2014

      • ruaok nods
      • 2014-01-06 00602, 2014

      • ocharles
        ta
      • 2014-01-06 00629, 2014

      • ruaok
        so, while we wait, lets summarize what I've learned.
      • 2014-01-06 00635, 2014

      • ruaok
        lots of reads, almost no writes.
      • 2014-01-06 00613, 2014

      • ruaok
        this query is interesting: select * from pg_statio_all_tables order by heap_blks_read desc;
      • 2014-01-06 00631, 2014

      • ruaok
        loads and loads of heap blocks read on edit table.
      • 2014-01-06 00659, 2014

      • ruaok
        wow. load 4. thats the lowest I've seen in ages. :)
      • 2014-01-06 00606, 2014

      • ocharles
        :)
      • 2014-01-06 00614, 2014

      • ocharles
        What I find odd is that Ian says there are queries running for 6 hours
      • 2014-01-06 00624, 2014

      • ruaok
        I have the sneaky suspicion that we're out of shared_buffers.
      • 2014-01-06 00626, 2014

      • ocharles
        but we set the statement_timeout to 60 seconds on connection
      • 2014-01-06 00631, 2014

      • ruaok
        yes, those are the ones I've been killing.
      • 2014-01-06 00643, 2014

      • ruaok
        yes and his email also acks that those do not seem to be working.
      • 2014-01-06 00657, 2014

      • ruaok
        and that pg_bouncer is as he eloquently puts it "farting on things" :)
      • 2014-01-06 00605, 2014

      • ocharles
        yes, but this isn't anything to do with pgbouncer
      • 2014-01-06 00612, 2014

      • ruaok
        agreed.
      • 2014-01-06 00612, 2014

      • ocharles
        this is an option for postgresql itself
      • 2014-01-06 00622, 2014

      • ocharles
        at least, that would be my understanding...
      • 2014-01-06 00622, 2014

      • ruaok
        there are a lot of disparate symptoms
      • 2014-01-06 00646, 2014

      • ruaok
        my current theory, and a weak one at that, is that we've run out of shared_buffers.
      • 2014-01-06 00653, 2014

      • ruaok
        and that is driving up the load accessing the edit table.
      • 2014-01-06 00601, 2014

      • ruaok
        and that other issues are knock on effects.
      • 2014-01-06 00651, 2014

      • ruaok loads http://musicbrainz.org/artist/a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432/edits
      • 2014-01-06 00603, 2014

      • ruaok
        which 504s with good consistency
      • 2014-01-06 00621, 2014

      • ruaok
        and immediately you can see one stuck query
      • 2014-01-06 00628, 2014

      • ruaok
        select (extract(epoch from now()) - extract(epoch from xact_start))::integer as secs, procpid from pg_stat_activity where current_query ilike '%with x as%' order by xact_start;
      • 2014-01-06 00638, 2014

      • ruaok
        proc 987
      • 2014-01-06 00654, 2014

      • ruaok
        if you click that link, we'll get another stuck proc.
      • 2014-01-06 00607, 2014

      • ruaok
        and even with one our disk io is neark 10MB/s
      • 2014-01-06 00652, 2014

      • ruaok
        I got my 504.
      • 2014-01-06 00602, 2014

      • ruaok
        and the query is still running.
      • 2014-01-06 00624, 2014

      • ruaok
        done now
      • 2014-01-06 00615, 2014

      • ruaok
        now that link loads, since the results are in cache.
      • 2014-01-06 00609, 2014

      • ruaok does it again with pink floyd, proc 925
      • 2014-01-06 00628, 2014

      • ocharles
        Somewhat strange that there are <IDLE> in transaction things
      • 2014-01-06 00645, 2014

      • ocharles
        oh, I think that was just random chance actually, ignore that
      • 2014-01-06 00619, 2014

      • ruaok
        yeah, its hard to nail any one thing down.
      • 2014-01-06 00625, 2014

      • ruaok
        nothing really seems amiss.
      • 2014-01-06 00637, 2014

      • ruaok
        nothing specific, that is
      • 2014-01-06 00607, 2014

      • ocharles
        Things look ok for the moment. Keep the script off and I'll keep watching things
      • 2014-01-06 00653, 2014

      • reosarevok
        "ok" seems strange when I can't see artist's edit histories tbh :p
      • 2014-01-06 00620, 2014

      • ocharles
        Oh, hmm
      • 2014-01-06 00641, 2014

      • ruaok
        ocharles: I've just been trying to load some artist edit histories for your entertainment. :)
      • 2014-01-06 00609, 2014

      • ruaok
        its *very* predictable and consistent.
      • 2014-01-06 00651, 2014

      • ocharles
        4 minutes not hitting a timeout is very weird though, hrm
      • 2014-01-06 00610, 2014

      • ocharles
        I just manually ran one of those queries and the timeout definitely works
      • 2014-01-06 00614, 2014

      • ocharles
        Let me just try it through pgbouncer
      • 2014-01-06 00619, 2014

      • ruaok
        k
      • 2014-01-06 00600, 2014

      • ocharles
        yea, the timeout works fine there. hrm
      • 2014-01-06 00630, 2014

      • ocharles
        ruaok: we could run with your hypothesis that shared_buffers needs to go up
      • 2014-01-06 00651, 2014

      • ruaok
        its worth a try.
      • 2014-01-06 00615, 2014

      • ruaok
        and w're on 64 bit, built for 64 bit.
      • 2014-01-06 00624, 2014

      • ruaok
        we should be fine going above 8GB from what I read.
      • 2014-01-06 00640, 2014

      • ocharles
        most people don't suggest going much above 8Gb though
      • 2014-01-06 00644, 2014

      • ruaok
        so, I'd be tempted to go to 1/3 ram, from 1/4ram.
      • 2014-01-06 00651, 2014

      • ocharles
        when you start going higher it becomes detrimental
      • 2014-01-06 00608, 2014

      • ocharles
        because now you have a bigger buffer to manage
      • 2014-01-06 00617, 2014

      • ruaok
        I read an article that disputed that.
      • 2014-01-06 00619, 2014

      • ruaok
        let me find it.
      • 2014-01-06 00647, 2014

      • ruaok
      • 2014-01-06 00609, 2014

      • ocharles
        fair enough
      • 2014-01-06 00633, 2014

      • ruaok
        so we need to up the kernel.shmmax value.
      • 2014-01-06 00644, 2014

      • ruaok
        and then restart.
      • 2014-01-06 00654, 2014

      • ruaok
        but I never know what to set it to a priori.
      • 2014-01-06 00608, 2014

      • ruaok
        upping the limit, starting, seeing the error, set kernel value
      • 2014-01-06 00614, 2014

      • ruaok
        that is my usual process.
      • 2014-01-06 00632, 2014

      • ocharles
        you could just try doubling both numbers :)
      • 2014-01-06 00639, 2014

      • ocharles
        hrm
      • 2014-01-06 00645, 2014

      • ruaok
        thats a bit drastic, no?
      • 2014-01-06 00602, 2014

      • ruaok
        maybe go to 10GB, from 8GB
      • 2014-01-06 00626, 2014

      • ocharles
        has anyone tried vacuuming?
      • 2014-01-06 00634, 2014

      • ruaok
        I haven't
      • 2014-01-06 00640, 2014

      • ocharles
        Maybe we should try vacuuming edit first
      • 2014-01-06 00613, 2014

      • ocharles
        Index Scan using edit_pkey on edit (cost=0.00..2689861.26 rows=23225855 width=419) (actual time=0.005..405.894 rows=1308565 loops=1)
      • 2014-01-06 00619, 2014

      • ocharles
        because that feels really high
      • 2014-01-06 00626, 2014

      • Freso
        I need to vacuum the apartment. But I also need to read up on anatomy for exam/test tomorrow. :|
      • 2014-01-06 00640, 2014

      • Freso re-lurks
      • 2014-01-06 00641, 2014

      • ruaok
        ocharles: go for it.
      • 2014-01-06 00617, 2014

      • ruaok
        ocharles: should I kill old locks first?
      • 2014-01-06 00619, 2014

      • ruaok
        otherwise we
      • 2014-01-06 00625, 2014

      • ruaok
        're doing to DOS ourselves. :)
      • 2014-01-06 00638, 2014

      • ocharles
        not sure what you mean by dosing yourself
      • 2014-01-06 00638, 2014

      • Jozo
        How big are postgresql indexes compared to available memory?
      • 2014-01-06 00650, 2014

      • ocharles
        vacuum analyze will block writes, but reads should carry on, iirc
      • 2014-01-06 00609, 2014

      • ruaok
        too much io, a lot of it for queries that will fail.
      • 2014-01-06 00615, 2014

      • ruaok
        so it will take much longer than needed.
      • 2014-01-06 00620, 2014

      • ruaok goes to kill old queries
      • 2014-01-06 00624, 2014

      • ruaok
        have you started the vacuum?
      • 2014-01-06 00654, 2014

      • ruaok
        Jozo: how do you query that?
      • 2014-01-06 00604, 2014

      • ocharles
        I've issued the command, but it might be waiting
      • 2014-01-06 00632, 2014

      • ruaok
        vacuum analyze?
      • 2014-01-06 00634, 2014

      • ocharles
        yea
      • 2014-01-06 00637, 2014

      • ruaok
        k.
      • 2014-01-06 00642, 2014

      • ruaok
        the io went through the roof.
      • 2014-01-06 00651, 2014

      • Jozo
        ruaok: No idea (just guessing what may be wrong)
      • 2014-01-06 00651, 2014

      • ruaok
        the load will soon follow