#metabrainz

      • zas
        ok, where's pg bouncer ? how can we verify it works ?
      • 2018-04-30 12029, 2018

      • yvanzo
        in postgres-master container, but it is not recognized by `service`
      • 2018-04-30 12050, 2018

      • yvanzo
        you can see it runs from the process list
      • 2018-04-30 12008, 2018

      • ruaok wonders if bitmap is up yet
      • 2018-04-30 12031, 2018

      • samj1912
        ruaok: pong
      • 2018-04-30 12034, 2018

      • yvanzo
        and you can check ports with netstat -lae
      • 2018-04-30 12058, 2018
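
The port check mentioned above can also be scripted. Below is a minimal Python sketch, assuming pgbouncer listens on its default port 6432 (the log does not say which port is used here). `port_is_open` is a hypothetical helper, not part of any tool named in this channel.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 6432 is pgbouncer's default listen port; this deployment's port is an assumption.
    print("pgbouncer port open:", port_is_open("127.0.0.1", 6432))
```

An open port only shows that something is listening. It does not show that the pooler will accept connections for a given database.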

      • ruaok
        samj1912: is solr indexer running off the DB on bowie or queen?
      • 2018-04-30 12008, 2018

      • samj1912
        let me check
      • 2018-04-30 12013, 2018

      • zas
        yvanzo: but did you restart it yet ?
      • 2018-04-30 12014, 2018

      • samj1912
        yeah
      • 2018-04-30 12020, 2018

      • samj1912
        but it shouldnt be affecting bowie
      • 2018-04-30 12021, 2018

      • samj1912
      • 2018-04-30 12028, 2018

      • ruaok
        yeah, running off queen?
      • 2018-04-30 12009, 2018

      • zas
      • 2018-04-30 12036, 2018

      • bitmap
        I'm here (trying to make coffee simultaneously)
      • 2018-04-30 12042, 2018

      • ruaok
        HALP. ;)
      • 2018-04-30 12042, 2018

      • bitmap
        what can I help with?
      • 2018-04-30 12049, 2018

      • ruaok
        all sites are down.
      • 2018-04-30 12053, 2018

      • zas
        i restarted pgbouncer
      • 2018-04-30 12056, 2018

      • ruaok
        things didn't come back up after reboot.
      • 2018-04-30 12009, 2018

      • ruaok
        that was it.
      • 2018-04-30 12011, 2018

      • ruaok
        thanks zas.
      • 2018-04-30 12021, 2018

      • bitmap
        yup, mb looks ok now
      • 2018-04-30 12023, 2018

      • samj1912
        ruaok: no its running off of bowie, but no solr activity to suggest it was overloading bowie
      • 2018-04-30 12041, 2018

      • ruaok
        samj1912: we're just trying to get all and any load off bowie.
      • 2018-04-30 12050, 2018

      • zas
        i think pgbouncer started too soon or smt
      • 2018-04-30 12005, 2018

      • zas
        is there any log for it ?
      • 2018-04-30 12010, 2018

      • samj1912
        ah, ill shut it down then
      • 2018-04-30 12016, 2018

      • yvanzo
        zas: that’s it
      • 2018-04-30 12025, 2018

      • ruaok
        yvanzo: why did you not restart pg-bouncer?
      • 2018-04-30 12043, 2018

      • yvanzo
        ruaok: did not find how
      • 2018-04-30 12053, 2018

      • bitmap
        sometimes consul-template doesn't update the pgbouncer config and it's just empty :/
      • 2018-04-30 12003, 2018

      • ruaok
        ok. that does it.
      • 2018-04-30 12016, 2018

      • ruaok
        everyone STOP WHAT YOU ARE DOING.
      • 2018-04-30 12022, 2018

      • ruaok
        (unless you're making coffee)
      • 2018-04-30 12039, 2018

      • ruaok
        bitmap: please write a short guide on how to restart pg-bouncer.
      • 2018-04-30 12049, 2018

      • ruaok
        yvanzo: please write a short guide on how to restart postgres.
      • 2018-04-30 12023, 2018

      • ruaok
        please add in the knowledge we just currently gained.
      • 2018-04-30 12032, 2018

      • ruaok
        then make PRs against the current syswiki.
      • 2018-04-30 12039, 2018

      • ruaok
        ok, bitmap, yvanzo?
      • 2018-04-30 12046, 2018

      • yvanzo
        on it :)
      • 2018-04-30 12053, 2018

      • ruaok
        zas: you please read the PRs and make sure they are clear to you.
      • 2018-04-30 12059, 2018

      • zas
        yup
      • 2018-04-30 12001, 2018

      • bitmap
        ok 👌
      • 2018-04-30 12005, 2018

      • ruaok
        thanks!!
      • 2018-04-30 12021, 2018

      • zas
        bitmap: i think it is because of sv manager, and possible delays
      • 2018-04-30 12044, 2018

      • bitmap
        pgbouncer not starting?
      • 2018-04-30 12048, 2018

      • yvanzo
        zas: thanks, I was looking into service instead of sv.
      • 2018-04-30 12000, 2018

      • zas
        bitmap: it started, but without proper config
      • 2018-04-30 12004, 2018

      • yvanzo
        It started but caused a race issue.
      • 2018-04-30 12025, 2018

      • zas
        i think it started too soon, and since it isn't managed by consul-template itself that's likely to happen
      • 2018-04-30 12007, 2018

      • bitmap
        it should be managed by consul-template iirc
      • 2018-04-30 12041, 2018

      • bitmap
      • 2018-04-30 12023, 2018

      • zas
        huh ? but sv is managing it atm
      • 2018-04-30 12029, 2018

      • zas
        or i miss something
      • 2018-04-30 12042, 2018

      • bitmap
        sv is managing the consul-template process which manages pgbouncer :)
      • 2018-04-30 12011, 2018

      • bitmap
        there are two consul-template processes running, one for pgbouncer, one for pg
      • 2018-04-30 12054, 2018

      • zas
        ok i see "exec run-consul-template -config /etc/consul-template-pgbouncer.conf" in sv run
      • 2018-04-30 12045, 2018

      • zas
      • 2018-04-30 12004, 2018

      • zas
        from /var/log/syslog in pg master container
      • 2018-04-30 12010, 2018

      • zas
        it was at startup
      • 2018-04-30 12042, 2018

      • bitmap
        consul returned 500 errors?
      • 2018-04-30 12006, 2018

      • bitmap
        nvm, No known Consul servers
      • 2018-04-30 12034, 2018

      • zas
        it looks to me a temporary issue at startup
      • 2018-04-30 12037, 2018

      • bitmap
        the "No known Consul servers" is coming from the consul agent, though? that seems weird
      • 2018-04-30 12030, 2018

      • bitmap
        as long as it was temporary I guess
      • 2018-04-30 12039, 2018

      • CatQuest
        things sure were in full swing here :D
      • 2018-04-30 12004, 2018

      • CatQuest
        fyi mb seems to work fine to me (and snappier than before)
      • 2018-04-30 12014, 2018

      • zas
        https://github.com/metabrainz/docker-postgres/blo… -> when templates are rendered pgbouncer doesn't restart ?
      • 2018-04-30 12041, 2018

      • CatQuest
        !m you guys! (even if you're not done yet)
      • 2018-04-30 12042, 2018

      • BrainzBot
        You're doing good work, you guys! (even if you're not done yet)!
      • 2018-04-30 12023, 2018

      • kaliko has quit
      • 2018-04-30 12035, 2018

      • bitmap
        it receives sighup when templates changes (which causes it to reload the config)
      • 2018-04-30 12044, 2018

      • bitmap
        after templates are rendered, I mean
      • 2018-04-30 12057, 2018

      • zas
        but then why a manual restart worked ?
      • 2018-04-30 12023, 2018

      • zas
        nvm, i restarted consul template (not pgbouncer directly)
      • 2018-04-30 12026, 2018

      • bitmap
        I suppose restarting it with sv would also restart the consul-template process
      • 2018-04-30 12024, 2018

      • zas
        bitmap: i propose to increase the log verbosity for this consul template process so we can have more infos about what's going on
      • 2018-04-30 12053, 2018

      • zas
        also we need to find a way to reproduce it, it happened after a full reboot (but not after a container restart)
      • 2018-04-30 12002, 2018

      • i7c has quit
      • 2018-04-30 12043, 2018

      • bitmap
        sometimes consul-template will get triggered when you push *anything* to docker-server-configs, and write a blank config for pgbouncer
      • 2018-04-30 12009, 2018

      • bitmap
        possibly every time, I think we were thinking it was git2consul
      • 2018-04-30 12042, 2018

      • zas
        that's not the first time it bites us
      • 2018-04-30 12038, 2018

      • i7c joined the channel
      • 2018-04-30 12007, 2018

      • bitmap
        the "No such database: musicbrainz_db" error comes from pgbouncer, which implied that database wasn't in its config, i.e. it was empty. so I'm guessing it was the same thing this time
      • 2018-04-30 12059, 2018

      • zas
        is there any other template using the same key ?
      • 2018-04-30 12020, 2018

      • bitmap
        the keys under docker-server-configs/services/postgres-master.json? nope
      • 2018-04-30 12049, 2018

      • zas
        what's happening when pgbouncer is started without a valid config ?
      • 2018-04-30 12026, 2018

      • zas
        i mean it was apparently running and happy, but obviously was not working
      • 2018-04-30 12030, 2018

      • bitmap
        it starts fine but doesn't allow any connections to dbs not in its config
      • 2018-04-30 12052, 2018

      • bitmap
      • 2018-04-30 12056, 2018

      • bitmap
        that whole section is empty
      • 2018-04-30 12017, 2018
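
Since pgbouncer starts happily with an empty `[databases]` section, a check on the rendered config could catch this state. Below is a minimal Python sketch, assuming the rendered file is a standard pgbouncer ini file. `configured_databases` is a hypothetical helper, and the sample config text is made up for illustration.

```python
import configparser
from typing import List


def configured_databases(ini_text: str) -> List[str]:
    """Return the database names listed under [databases] in pgbouncer ini text."""
    parser = configparser.ConfigParser(
        delimiters=("=",),     # values such as "host=... port=..." contain further '='
        interpolation=None,    # do not treat '%' in values as interpolation
        allow_no_value=True,   # tolerate lines without '=', e.g. include directives
        strict=False,
    )
    parser.optionxform = str   # keep database names case-sensitive
    parser.read_string(ini_text)
    if not parser.has_section("databases"):
        return []
    return list(parser["databases"].keys())


if __name__ == "__main__":
    # Hypothetical rendered config with the blank section described above.
    rendered = "[databases]\n\n[pgbouncer]\nlisten_port = 6432\n"
    if not configured_databases(rendered):
        print("pgbouncer config has no databases; template was probably rendered blank")
```

Such a check would only flag the blank render. It would not explain why consul-template pulled blank values in the first place.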

      • zas
        hmmm, so consul-template *thinks* it is running fine, and has no reason to restart it, if no change related to used keys is detected
      • 2018-04-30 12034, 2018

      • bitmap
        the config is valid when it's empty; even without template1 I think it would be valid
      • 2018-04-30 12009, 2018

      • bitmap
        yep, but for some reason it pulls blank or no value from consul when it first renders
      • 2018-04-30 12031, 2018

      • zas
        yes, because it may start before consul-agent
      • 2018-04-30 12037, 2018

      • zas
        or smt like that
      • 2018-04-30 12052, 2018

      • zas
        when we reboot, nothing defines order of containers
      • 2018-04-30 12010, 2018

      • zas
        and consul-template depends on local consul agent
      • 2018-04-30 12037, 2018
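
If the cause really is start order, one generic mitigation is to wait for the dependency before starting the dependent process. Below is a minimal Python sketch of that retry pattern. `wait_until` is a hypothetical helper, and nothing in the log says consul-template or sv are configured this way.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], attempts: int = 10, delay: float = 1.0) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries.

    Returns True as soon as check() succeeds, False if every attempt fails.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Hypothetical readiness probe; a real one might test the local agent's port.
    ready = wait_until(lambda: True, attempts=5, delay=0.5)
    print("dependency ready:", ready)
```

This only covers the reboot case. The blank config was also seen while consul-agent was already running, so it would not address that.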

      • bitmap
        it previously happened when consul-agent was already running, just pushing to docker-server-configs triggered it :/
      • 2018-04-30 12014, 2018

      • bitmap
        mbs has had the same issue with blank config values
      • 2018-04-30 12030, 2018

      • zas
        so i'd say consul agent is the suspect here
      • 2018-04-30 12054, 2018

      • zas
        or it is this "key or error" thing
      • 2018-04-30 12005, 2018

      • bitmap
        have we ruled out that git2consul isn't removing data from consul before it updates or something?
      • 2018-04-30 12014, 2018

      • bitmap
      • 2018-04-30 12036, 2018

      • zas
      • 2018-04-30 12040, 2018

      • bitmap
        shouldn't be deleted if key already exists, but
      • 2018-04-30 12043, 2018

      • zas
        basically there are conditions where a valid key/val is removed then written again, and it may explain things, but when they are written again consul template should "fix" the config with new values, but it doesnt
      • 2018-04-30 12011, 2018

      • zas
        and we didn't restart git2consul and/or consul to fix the issue, but just consul-template
      • 2018-04-30 12021, 2018

      • bitmap
        no new release since that pr was merged
      • 2018-04-30 12049, 2018

      • bitmap
        true, I would expect it to update the template again after the write
      • 2018-04-30 12049, 2018

      • zas
        yes, but perhaps the behavior of git2consul and the quick replacement doesn't trigger an update
      • 2018-04-30 12001, 2018

      • zas
        we are trying to guess too much, we need more logs
      • 2018-04-30 12020, 2018

      • bitmap
        right
      • 2018-04-30 12024, 2018

      • surtin joined the channel
      • 2018-04-30 12051, 2018

      • zas
        also consul-template 0.18.5 is used in pg master container, perhaps worth a upgrade to 0.19.4
      • 2018-04-30 12056, 2018

      • ruaok
        is service "postgres-master" points to bowie, does service "postgres-slave" point to queen and would be suitable for building indexes?
      • 2018-04-30 12001, 2018

      • ruaok
        if...
      • 2018-04-30 12019, 2018

      • zas
        back to the initial issue: response times are still bad, and that's due to bowie's load which didn't change after reboot
      • 2018-04-30 12014, 2018

      • ruaok
        bowie is overloaded. that is all there is to it.
      • 2018-04-30 12027, 2018

      • bitmap
        ruaok: yes, it should be, though if queen traffic increases the pg there may generate a lot of "conflict with recovery" errors
      • 2018-04-30 12030, 2018

      • zas
        but there's no reason for it to be overloaded
      • 2018-04-30 12031, 2018

      • ruaok
        we can keep guessing. we can keep hemming. we can keep hawing.
      • 2018-04-30 12042, 2018

      • ruaok
        we need to reduce the load on bowie or nothing gets better.
      • 2018-04-30 12055, 2018

      • zas
        the traffic is the same as usual (and we had much more in past months, without having this load)
      • 2018-04-30 12058, 2018

      • ruaok
        zas: yes, there is. we're handling more search traffic than ever before.
      • 2018-04-30 12005, 2018

      • bitmap
        before we put more queries on queen, I think we should try enabling https://github.com/metabrainz/docker-postgres/com… again
      • 2018-04-30 12020, 2018

      • zas
        more searches ??
      • 2018-04-30 12027, 2018

      • bitmap
        it would fix conflict issues. we suspected it was increasing load, but I'm not so sure
      • 2018-04-30 12029, 2018

      • ruaok
        I think that was proven to not be a problem, right?
      • 2018-04-30 12046, 2018

      • ruaok
        I would say we should try that again and keep our eyes peeled for this going wrong.
      • 2018-04-30 12006, 2018

      • ruaok
        and with clear instructions how to undo that if it becomes a problem.
      • 2018-04-30 12015, 2018

      • zas
      • 2018-04-30 12023, 2018

      • zas
        bitmap: i was looking at different load balancing solutions for pg
      • 2018-04-30 12054, 2018

      • zas
        and it seems to me pgpool2 is the most common option, together with 9.x features
      • 2018-04-30 12058, 2018

      • bitmap
        ruaok: ok, I'll add instructions to the pr yvanzo opened in syswiki
      • 2018-04-30 12038, 2018

      • ruaok
        great, thanks.
      • 2018-04-30 12045, 2018

      • zas
      • 2018-04-30 12033, 2018

      • bitmap
        I've heard of citus, didn't know they had a free/oss product
      • 2018-04-30 12007, 2018

      • bitmap
        looks very interesting
      • 2018-04-30 12033, 2018

      • ruaok
        promises a lot, looks very shiny. one point against. :)
      • 2018-04-30 12042, 2018

      • bitmap
        yeah, heh
      • 2018-04-30 12059, 2018

      • bitmap
        at least some companies I've never heard of use it https://www.citusdata.com/customers/
      • 2018-04-30 12024, 2018

      • zas
      • 2018-04-30 12031, 2018

      • ruaok
        let me know when I can point the updated search stuff to queen, bitmap
      • 2018-04-30 12025, 2018

      • bitmap
        ok, a few mins while I update things
      • 2018-04-30 12001, 2018

      • Slurpee has quit
      • 2018-04-30 12055, 2018

      • zas
        bitmap, yvanzo: i also noticed a thing with slowdowns: searching for locations is failing a lot, revealing sql requests are done each time, isn't this cached locally ? i mean if i look for a country named Spain when adding an artist, i expect it to be cached for a while (countries/towns are unlikely to change often, and i wasn't using direct search)
      • 2018-04-30 12000, 2018

      • bitmap
        the area containments should be cached, yes. I'll check why those are being queried so often once I update queen
      • 2018-04-30 12050, 2018

      • bitmap
        (I mean, we do actually cache them, but something's fishy)