#metabrainz

      • yvanzo
        zas: there is no debt either
      • 2018-04-30 12015, 2018

      • yvanzo
        (it is not due to the fact it didn’t run)
      • 2018-04-30 12054, 2018

      • yvanzo
        for instance, replication cannot even connect to the DB.
      • 2018-04-30 12017, 2018

      • zas
        ok, so we have no clue about what is going on, right ?
      • 2018-04-30 12030, 2018

      • yvanzo
        and subscriptions ran to the end yesterday
      • 2018-04-30 12046, 2018

      • yvanzo
        exactly, I’m not even sure MB is the cause of it.
      • 2018-04-30 12033, 2018

      • zas
        my guess is that something in pg went bad
      • 2018-04-30 12052, 2018

      • zas
        i see nothing that could be an "external" cause
      • 2018-04-30 12012, 2018

      • yvanzo
        reports are taking X times longer than usual to complete, for a large X
      • 2018-04-30 12013, 2018

      • zas
        sure something triggered it, but hard to know what
      • 2018-04-30 12048, 2018

      • zas
        ok, let's do that: put everything in maintenance, and restart pg
      • 2018-04-30 12058, 2018

      • ruaok
        yes, please.
      • 2018-04-30 12012, 2018

      • ruaok
        after that I would like to take some mitigation steps, but first a restart.
      • 2018-04-30 12017, 2018

      • ruaok
        what is needed to do this?
      • 2018-04-30 12027, 2018

      • ruaok
        do we need to do prep besides tweeting?
      • 2018-04-30 12040, 2018

      • zas
        ruaok: can you tweet about maintenance ? i'll toggle a value in docker server config
      • 2018-04-30 12052, 2018

      • zas
        i added it recently, untested in prod yet though
      • 2018-04-30 12053, 2018

      • yvanzo
        I can stop mb cron and sentry already
      • 2018-04-30 12039, 2018

      • ruaok
        on it.
      • 2018-04-30 12054, 2018

      • zas
        btw, bowie needs a reboot, but i want to test a pg restart first
      • 2018-04-30 12011, 2018

      • zas
        let's not add random issues on this one
      • 2018-04-30 12018, 2018

      • yvanzo
        subscriptions are only 25% processed
      • 2018-04-30 12026, 2018

      • zas
        yes, it is uber slow
      • 2018-04-30 12032, 2018

      • zas
        can it resume ?
      • 2018-04-30 12059, 2018

      • ruaok
        twatted.
      • 2018-04-30 12004, 2018

      • ruaok
        we'll just run it tomorrow.
      • 2018-04-30 12009, 2018

      • ruaok
        subscriptions are not that important.
      • 2018-04-30 12016, 2018

      • yvanzo
        no, but tomorrow’s run will catch it.
      • 2018-04-30 12045, 2018

      • zas
        ok, i'll push the change to docker server config, to put everything down during bowie's maintenance
      • 2018-04-30 12009, 2018

      • zas
        ok done
      • 2018-04-30 12023, 2018

      • zas
        yvanzo: restart the pg container on bowie please
      • 2018-04-30 12024, 2018

      • ruaok
        we need nagios to read our twitter feed.
      • 2018-04-30 12034, 2018

      • ruaok
        calmate, nagios!
      • 2018-04-30 12052, 2018

      • zas
        yvanzo: ?
      • 2018-04-30 12019, 2018

      • yvanzo
        yep
      • 2018-04-30 12033, 2018

      • yvanzo
        done
      • 2018-04-30 12016, 2018

      • zas
        ok, i'll let it run a bit before removing maintenance mode
      • 2018-04-30 12012, 2018

      • zas
        removing alldown, it should happen within next 2 minutes
      • 2018-04-30 12034, 2018

      • zas
        we'll soon know if it changes anything
      • 2018-04-30 12007, 2018

      • zas
        services are back
      • 2018-04-30 12018, 2018

      • ruaok
        cpu at 70%.
      • 2018-04-30 12021, 2018

      • zas
        ruaok: nagios happy again ;)
      • 2018-04-30 12039, 2018

      • ruaok
        no dice, that didn't change anything.
      • 2018-04-30 12054, 2018

      • ruaok
        what is the status of the DB on queen?
      • 2018-04-30 12057, 2018

      • ruaok
        yvanzo: do you know?
      • 2018-04-30 12058, 2018

      • zas
        i'd not conclude anything yet, but doesn't look too promising
      • 2018-04-30 12045, 2018

      • yvanzo
        ruaok: reloading config files
      • 2018-04-30 12059, 2018

      • ruaok
        but it should be operational, in theory?
      • 2018-04-30 12009, 2018

      • yvanzo
        yes
      • 2018-04-30 12013, 2018

      • ruaok
        and we simply do not have load balancing setup?
      • 2018-04-30 12023, 2018

      • ruaok
        is any DB traffic going there right now?
      • 2018-04-30 12030, 2018

      • yvanzo
        CAA redirect is already using it again
      • 2018-04-30 12059, 2018

      • ruaok
        ok. what other DB users could use the read-only mirror?
      • 2018-04-30 12010, 2018

      • ruaok
        search index creation. subscription emails.
      • 2018-04-30 12015, 2018

      • yvanzo
        no one else
      • 2018-04-30 12032, 2018

      • yvanzo
        subscription emails have been aborted
      • 2018-04-30 12056, 2018

      • ruaok
        I'm thinking about manual balancing to buy us some time before we get a proper load balancer going.
      • 2018-04-30 12006, 2018

      • ruaok
        what else can be moved to queen ASAP?
      • 2018-04-30 12031, 2018

      • ruaok
        but queen is reliably replicating, right?
      • 2018-04-30 12054, 2018

      • yvanzo
        there are requests to bowie pg with musicbrainz_ro pg user
      • 2018-04-30 12021, 2018

      • ruaok
        do you know how many and who they are?
      • 2018-04-30 12013, 2018

      • yvanzo
        looking for it
      • 2018-04-30 12052, 2018

      • yvanzo
        roughly 15%
      • 2018-04-30 12020, 2018
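
A share like the "roughly 15%" above could be estimated by counting connections per PostgreSQL user. The log does not say how the figure was obtained, so this is only a minimal sketch: it assumes the user names were collected from the `usename` column of `pg_stat_activity`, and the sample counts are made up for illustration.

```python
from collections import Counter


def share_by_user(usernames):
    """Return the percentage of connections held by each PostgreSQL user."""
    counts = Counter(usernames)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {user: 100.0 * n / total for user, n in counts.items()}


# Hypothetical sample, e.g. the rows of: SELECT usename FROM pg_stat_activity;
# The counts are invented; only the user names appear in the log.
sample = ["musicbrainz"] * 17 + ["musicbrainz_ro"] * 3
shares = share_by_user(sample)
print(shares["musicbrainz_ro"])
```

A snapshot of open connections is only a rough proxy for load, since one connection may run far heavier queries than another.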

      • ruaok
        15% of relief for PG sounds very useful.
      • 2018-04-30 12029, 2018

      • ruaok
        let's see if we can move this traffic to queen.
      • 2018-04-30 12032, 2018

      • yvanzo
        I’m not sure what matches these requests :/
      • 2018-04-30 12005, 2018

      • yvanzo
        Or even if they are meaningful.
      • 2018-04-30 12020, 2018

      • ruaok
        might just be search.
      • 2018-04-30 12011, 2018

      • zas
        queen replication is ok, right ? we need to put everything in maintenance again and reboot bowie (sec updates)
      • 2018-04-30 12032, 2018

      • zas
        yvanzo, ruaok : ok ?
      • 2018-04-30 12033, 2018

      • yvanzo
        yep, queen is fine
      • 2018-04-30 12002, 2018

      • ruaok
        I wish we would've done the restart at the same time.
      • 2018-04-30 12017, 2018

      • ruaok
        can we redirect traffic to queen for a bit?
      • 2018-04-30 12020, 2018

      • zas
        i prefer not, reboot may cause its own issues...
      • 2018-04-30 12042, 2018

      • ruaok
        or should we just do a reboot and then start working on balancing the load?
      • 2018-04-30 12022, 2018

      • zas
        the reboot is required by sec upgrades, but it is unrelated to the current load issue
      • 2018-04-30 12028, 2018

      • ruaok
        understood.
      • 2018-04-30 12046, 2018

      • yvanzo
        agreed
      • 2018-04-30 12005, 2018

      • zas
        about targeting queen instead of bowie, is there an easy way yet ?
      • 2018-04-30 12022, 2018

      • ruaok
        search.
      • 2018-04-30 12027, 2018

      • ruaok
        I'm working on a PR for that.
      • 2018-04-30 12033, 2018

      • zas
        ah just for search, ok
      • 2018-04-30 12035, 2018

      • ruaok
        I'm open for suggestions on other RO traffic.
      • 2018-04-30 12040, 2018

      • ruaok
        and moving it over.
      • 2018-04-30 12049, 2018

      • zas
        i have a bunch of ideas, but they all require being able to configure mb containers to select a db server easily. i asked bitmap about it, and he told me he'll work on something
      • 2018-04-30 12032, 2018

      • yvanzo
        indeed, there is no easy way to do it right now
      • 2018-04-30 12000, 2018

      • zas
        this is where we need pgpool or smt
      • 2018-04-30 12012, 2018

      • ruaok
        yep.
      • 2018-04-30 12044, 2018

      • zas
        let's decide what to do during the meeting, this issue has been there for too long and causes too much hassle
      • 2018-04-30 12000, 2018

      • ruaok
        let's not. this topic is too big and detailed for the meeting.
      • 2018-04-30 12005, 2018

      • ruaok
        first, let's do the reboot.
      • 2018-04-30 12011, 2018

      • zas
        ok
      • 2018-04-30 12012, 2018

      • ruaok
        hard down time. starting in 2 minutes. ready?
      • 2018-04-30 12019, 2018

      • zas
        yup
      • 2018-04-30 12026, 2018

      • zas
        and ending when ...
      • 2018-04-30 12056, 2018

      • zas
        ruaok: tell me when tweeted
      • 2018-04-30 12015, 2018

      • ruaok
        done.
      • 2018-04-30 12027, 2018

      • ruaok
        proceed when ready.
      • 2018-04-30 12038, 2018

      • zas
        k, alldown set, it will happen soon
      • 2018-04-30 12057, 2018

      • zas
        rebooting
      • 2018-04-30 12017, 2018

      • ruaok
        shhh nagios. down boy.
      • 2018-04-30 12030, 2018

      • zas
        at least, nagios is working ;)
      • 2018-04-30 12043, 2018

      • ruaok
        lol
      • 2018-04-30 12013, 2018

      • zas
        reboot in progress, time to sacrifice a black chicken
      • 2018-04-30 12030, 2018

      • ruaok
        why a black chicken?
      • 2018-04-30 12039, 2018

      • ruaok would prefer a rainbow chicken
      • 2018-04-30 12056, 2018

      • zas
        well, it works better, or so they say ;)
      • 2018-04-30 12059, 2018

      • yvanzo
        poudre verte anyone?
      • 2018-04-30 12005, 2018

      • zas
        yuppppi, ping is back
      • 2018-04-30 12055, 2018

      • zas
        removing alldown
      • 2018-04-30 12047, 2018

      • ruaok
        samj1912: ping
      • 2018-04-30 12000, 2018

      • yvanzo
        zas: smt is still wrong with mb
      • 2018-04-30 12012, 2018

      • zas
        yes, 502s
      • 2018-04-30 12044, 2018

      • yvanzo
        “No such database: musicbrainz_db”
      • 2018-04-30 12051, 2018

      • ruaok
        uh oh
      • 2018-04-30 12003, 2018

      • zas
        hmmm
      • 2018-04-30 12013, 2018

      • yvanzo
        database should be 'musicbrainz'
      • 2018-04-30 12051, 2018

      • yvanzo
        ? Or is the containers' port the issue?
      • 2018-04-30 12033, 2018

      • zas
        it appeared after a reboot, nothing should have changed inside pg container
      • 2018-04-30 12040, 2018

      • yvanzo
        zas: yep, it is the containers' port
      • 2018-04-30 12005, 2018

      • yvanzo
        port=65401 in MB error msg, whereas pg runs on 65400 as usual
      • 2018-04-30 12043, 2018

      • zas
        why is this happening ?
      • 2018-04-30 12055, 2018

      • yvanzo
        I will try restarting containers
      • 2018-04-30 12011, 2018

      • zas
        docker picked another port ??
      • 2018-04-30 12050, 2018

      • zas
      • 2018-04-30 12015, 2018

      • zas
        from docker inspect postgres-master on bowie
      • 2018-04-30 12025, 2018

      • yvanzo
        It seems to be running fine on bowie, w/ 65401 port in netstat
      • 2018-04-30 12042, 2018

      • ruaok
        is pg bouncer up?
      • 2018-04-30 12037, 2018

      • yvanzo
        no
      • 2018-04-30 12023, 2018

      • yvanzo
        sorry, it is
      • 2018-04-30 12019, 2018

      • zas
        mb containers are trying to connect to 65401 instead of 65400 ?
      • 2018-04-30 12035, 2018

      • zas
        yvanzo: did you restart mb containers yet ?
      • 2018-04-30 12036, 2018

      • yvanzo
        zas: 65400 is local port and is available from bowie
      • 2018-04-30 12049, 2018

      • yvanzo
        I restarted beta containers, without change
      • 2018-04-30 12025, 2018

      • zas
        08006 DBI connect('dbname=musicbrainz_db;host=10.2.2.24;port=65401','musicbrainz',...) failed: ERROR: No such database: musicbrainz_db
      • 2018-04-30 12028, 2018
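
The error line above carries the database name, host, and port the client actually tried, which is what the port discussion hinges on. As a small illustration, those fields can be pulled out of the message. The function name and the regular expression are assumptions made for this sketch; they are not part of MusicBrainz Server or DBI.

```python
import re

# Matches the DSN and user inside a Perl DBI "connect(...) failed" message.
_DBI_CONNECT = re.compile(r"DBI connect\('(?P<dsn>[^']*)','(?P<user>[^']*)'")


def parse_dbi_connect_error(line):
    """Return the DSN fields and user from a DBI connect error, or None."""
    match = _DBI_CONNECT.search(line)
    if match is None:
        return None
    fields = {}
    # The DSN is a semicolon-separated list of key=value pairs.
    for part in match.group("dsn").split(";"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
    fields["user"] = match.group("user")
    return fields


# The error message quoted in the log.
error = ("08006 DBI connect('dbname=musicbrainz_db;host=10.2.2.24;port=65401',"
         "'musicbrainz',...) failed: ERROR: No such database: musicbrainz_db")
info = parse_dbi_connect_error(error)
print(info["host"], info["port"], info["dbname"])
```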

      • yvanzo
        Containers are trying to connect to 65401 from other servers and I guess it is legit, but they cannot find it
      • 2018-04-30 12044, 2018

      • zas
        port is ok ? db name isn't ?
      • 2018-04-30 12024, 2018

      • zas
        that's from cage mb website container, after a restart
      • 2018-04-30 12027, 2018

      • yvanzo
        both are ok
      • 2018-04-30 12051, 2018

      • ruaok
        connecting to 65400 I can see musicbrainz_db. connecting to 65401 I CANNOT see musicbrainz_db.
      • 2018-04-30 12000, 2018

      • ruaok
        01 is pg bouncer?
      • 2018-04-30 12011, 2018

      • yvanzo
        yes
      • 2018-04-30 12029, 2018

      • ruaok
        have you restarted pg-bouncer? if not, please do that.
      • 2018-04-30 12037, 2018

      • zas
        so pg bouncer is failing
      • 2018-04-30 12025, 2018
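
One possible source of a "No such database" reply from PgBouncer is a `[databases]` section that has no entry (and no `*` fallback) for the requested name. Whether that is what happened here is not established in the log. The sketch below only shows how such a check could look; the function name and the sample configuration are hypothetical, not the real bowie setup.

```python
import configparser


def pgbouncer_knows(ini_text, dbname):
    """Check whether a pgbouncer.ini text defines dbname (or a '*' fallback)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep database names case-sensitive
    parser.read_string(ini_text)
    if not parser.has_section("databases"):
        return False
    names = parser.options("databases")
    return dbname in names or "*" in names


# Hypothetical configuration for illustration only; the ports are the ones
# mentioned in the log, the host and the entry itself are invented.
SAMPLE_INI = """\
[databases]
musicbrainz = host=127.0.0.1 port=65400 dbname=musicbrainz

[pgbouncer]
listen_port = 65401
"""

print(pgbouncer_knows(SAMPLE_INI, "musicbrainz_db"))
print(pgbouncer_knows(SAMPLE_INI, "musicbrainz"))
```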

      • zas
        what is the process to restart pg-bouncer atm ?
      • 2018-04-30 12050, 2018

      • kartikeyaSh_ircc joined the channel
      • 2018-04-30 12008, 2018

      • ruaok
        yvanzo: ?
      • 2018-04-30 12027, 2018

      • yvanzo
        restarting postgres from the container
      • 2018-04-30 12044, 2018

      • ruaok
        why? seemed a pg bouncer issue.
      • 2018-04-30 12047, 2018

      • Slurpee joined the channel