#musicbrainz-devel

      • ruaok
        hey, whats new?
      • 2015-04-24 11434, 2015

      • zas
        nothing but the sun ;)
      • 2015-04-24 11412, 2015

      • zas
        btw, MB is slow as hell, and i've been getting 302s since yesterday; editing is very painful atm
      • 2015-04-24 11420, 2015

      • ruaok
        I know.
      • 2015-04-24 11429, 2015

      • zas
        i guess so ;) what is going on?
      • 2015-04-24 11433, 2015

      • ruaok
        our gateway is about to die.
      • 2015-04-24 11446, 2015

      • zas
        doesn't look good
      • 2015-04-24 11448, 2015

      • ruaok
        it needed rebooting once a month.
      • 2015-04-24 11454, 2015

      • ruaok
        then once a week.
      • 2015-04-24 11457, 2015

      • ruaok
        then once a day.
      • 2015-04-24 11406, 2015

      • ruaok
        now we're down to about 12 hours.
      • 2015-04-24 11412, 2015

      • zas
        what is the issue ?
      • 2015-04-24 11419, 2015

      • ruaok
        hardware failure.
      • 2015-04-24 11433, 2015

      • ruaok
        I have a replacement server nearly ready to go.
      • 2015-04-24 11440, 2015

      • ruaok
        that is what I am working on right this second.
      • 2015-04-24 11451, 2015

      • zas
        ok good
      • 2015-04-24 11418, 2015

      • ruaok
        I'll reboot the gateway and see what happens.
      • 2015-04-24 11438, 2015

      • zas
        did you reboot it last night ?
      • 2015-04-24 11452, 2015

      • ruaok
        yes
      • 2015-04-24 11429, 2015

      • zas
        it was awful until some point (don't remember what time), then ok later and this morning, then the 302s came back today
      • 2015-04-24 11440, 2015

      • ruaok
        yep, that is it.
      • 2015-04-24 11455, 2015

      • ruaok
        once the reboot is done, you might have more luck using gtest.musicbrainz.org
      • 2015-04-24 11400, 2015

      • ruaok
        that is the new gateway
      • 2015-04-24 11408, 2015

      • zas
        i will test then
      • 2015-04-24 11427, 2015

      • ruaok
        reboot started.
      • 2015-04-24 11431, 2015

      • ruaok
        should be back in 2 minutes.
      • 2015-04-24 11457, 2015

      • zas
        barcelona should be great these days (spring in spain...)
      • 2015-04-24 11448, 2015

      • ruaok
        spring just showed up a few days ago. it was amazing. :)
      • 2015-04-24 11451, 2015

      • MBJenkins joined the channel
      • 2015-04-24 11423, 2015

      • Lotheric joined the channel
      • 2015-04-24 11402, 2015

      • rvedotrc
        ruaok: yo. Anything I can help with?
      • 2015-04-24 11418, 2015

      • ruaok
        heya.
      • 2015-04-24 11429, 2015

      • ruaok
        yes, so our gateway MTBF is now down to hours.
      • 2015-04-24 11440, 2015

      • rvedotrc
        :-/
      • 2015-04-24 11448, 2015

      • ruaok
        total and utter failure is about to happen, so you might say I'm a bit pressed. :)
      • 2015-04-24 11405, 2015

      • ruaok
        but I have: gtest.musicbrainz.org up and running.
      • 2015-04-24 11425, 2015

      • ruaok
        that proves that at least nginx, ssl and the firewall are in decent shape.
      • 2015-04-24 11432, 2015

      • ijabz2 joined the channel
      • 2015-04-24 11450, 2015

      • ruaok
        the most pressing thing I need to know is how to deactivate and possibly re-activate failover ips on carl.
      • 2015-04-24 11400, 2015

      • ruaok
        I'm considering the cut-over scenario.
      • 2015-04-24 11413, 2015

      • ruaok
        turn off one ip on carl, turn it on on ernie.
      • 2015-04-24 11418, 2015

      • ruaok
        test.
      • 2015-04-24 11425, 2015

      • rvedotrc
        ok. Carl is not part of a pair any more, right?
      • 2015-04-24 11428, 2015

      • ruaok
        revert or move on until complete.
      • 2015-04-24 11434, 2015
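
The cut-over plan above (move one IP from carl to ernie, test, then revert or move on until complete) can be sketched as a loop. This is a minimal Python illustration only: `cut_over`, `move_to_new`, `move_to_old` and `test` are hypothetical names, and the real moves would be the `ip addr` / heartbeat steps discussed later in the channel.

```python
from typing import Callable, Iterable, List


def cut_over(
    ips: Iterable[str],
    move_to_new: Callable[[str], None],
    move_to_old: Callable[[str], None],
    test: Callable[[str], bool],
) -> List[str]:
    """Move IPs one at a time; stop and revert at the first failed test.

    Returns the IPs that are now on the new gateway. All callables are
    hypothetical hooks supplied by the operator.
    """
    moved: List[str] = []
    for ip in ips:
        move_to_new(ip)      # turn it off on carl, turn it on on ernie
        if not test(ip):
            move_to_old(ip)  # revert this IP and stop here
            break
        moved.append(ip)     # test passed: move on to the next IP
    return moved
```

Stopping at the first failure, rather than pressing on, keeps at most one IP in a questionable state at any time.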

      • ruaok
        it still is.
      • 2015-04-24 11443, 2015

      • ruaok
        actually we don't have any pairs right now.
      • 2015-04-24 11456, 2015

      • ruaok
        we've got two singles. carl -> old, ernie -> new
      • 2015-04-24 11444, 2015

      • rvedotrc
        So it has a configured peer, but the peer is dead, right?
      • 2015-04-24 11404, 2015

      • ruaok
        yes
      • 2015-04-24 11418, 2015

      • ruaok
        he got stuck in a closet somewhere.
      • 2015-04-24 11426, 2015

      • rvedotrc
        ok. bear with me while I remember something...
      • 2015-04-24 11431, 2015

      • ruaok
        k
      • 2015-04-24 11440, 2015

      • ruaok snickers lenny is in the closet. :)
      • 2015-04-24 11447, 2015

      • Lotheric joined the channel
      • 2015-04-24 11410, 2015

      • ruaok
        kepstin-laptop: I've added you to the bitbucket team.
      • 2015-04-24 11426, 2015

      • rvedotrc
        OK. IIRC, heartbeat is ok at maintaining the availability of resources, but there's no slick way of adding or removing a resource.
      • 2015-04-24 11431, 2015

      • rvedotrc
        such as an IP address.
      • 2015-04-24 11454, 2015

      • rvedotrc
        so IIRC, the way to do it is: edit the config to remove the ip (i.e. delete a line from haresources),
      • 2015-04-24 11419, 2015

      • rvedotrc
        then remove the IP by hand. One way of doing *that* is via the resource script,
      • 2015-04-24 11435, 2015

      • rvedotrc
        i.e. the same script that heartbeat itself uses to add/check/remove resources,
      • 2015-04-24 11441, 2015

      • rvedotrc
        namely /usr/lib/ocf/resource.d/heartbeat/IPaddr2
      • 2015-04-24 11451, 2015

      • rvedotrc
        but I can't quite remember the invocation syntax.
      • 2015-04-24 11400, 2015

      • rvedotrc
        Or, use /sbin/ip addr del, but be bloody careful.
      • 2015-04-24 11418, 2015
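
The removal procedure rvedotrc describes (drop the IP's line from haresources, then take the address down by hand via the IPaddr2 resource script) might look roughly like this in Python. Assumptions, all hypothetical: each failover IP sits on its own haresources line such as `carl IPaddr2::72.29.167.148/28/eth0`, and the resource agent follows the usual OCF convention of reading its parameters from `OCF_RESKEY_*` environment variables (`ip`, `cidr_netmask`, `nic`) with the action as its only argument. Check the agent's `meta-data` output before relying on those parameter names.

```python
import os
import re
from typing import Dict, List, Tuple

# Path quoted in the channel.
IPADDR2 = "/usr/lib/ocf/resource.d/heartbeat/IPaddr2"


def drop_ip_from_haresources(text: str, ip: str) -> Tuple[str, List[str]]:
    """Return (new_text, removed_lines) with every non-comment line naming ip removed."""
    # Whole-address match, so 72.29.167.14 does not also hit 72.29.167.148.
    pattern = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?!\d)")
    kept: List[str] = []
    removed: List[str] = []
    for line in text.splitlines(keepends=True):
        if not line.lstrip().startswith("#") and pattern.search(line):
            removed.append(line.rstrip("\n"))
        else:
            kept.append(line)
    return "".join(kept), removed


def ipaddr2_invocation(
    action: str, ip: str, cidr_netmask: int, nic: str
) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and environment for running the IPaddr2 agent by hand (assumed OCF style)."""
    env = dict(os.environ)
    env.update({
        "OCF_ROOT": "/usr/lib/ocf",
        "OCF_RESKEY_ip": ip,
        "OCF_RESKEY_cidr_netmask": str(cidr_netmask),
        "OCF_RESKEY_nic": nic,
    })
    return [IPADDR2, action], env


# Usage (needs root; not executed here):
#   argv, env = ipaddr2_invocation("stop", "72.29.167.148", 28, "eth0")
#   subprocess.run(argv, env=env, check=True)
```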

      • ruaok
        me bloody careful?
      • 2015-04-24 11419, 2015

      • ruaok
        houston, we have a problem.
      • 2015-04-24 11450, 2015

      • ruaok
        kepstin-laptop: sadly this is turning into an emergency.
      • 2015-04-24 11455, 2015

      • ruaok
        our gateway is super flaky.
      • 2015-04-24 11456, 2015

      • rvedotrc
        Specifically about the difference between ip addr del x.x.x.x/24 vs ip addr del x.x.x.x/32 vs ip addr del x.x.x.x
      • 2015-04-24 11414, 2015

      • rvedotrc
        One removes just that one IP address. One removes that, plus the others on the same subnet.
      • 2015-04-24 11420, 2015

      • rvedotrc
        Can't remember which is which.
      • 2015-04-24 11421, 2015

      • rvedotrc
        :-/
      • 2015-04-24 11426, 2015
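
On the question of which `ip addr del` form takes out more than one address: as far as we understand it, one likely source of the "plus the others on the same subnet" behaviour is that Linux, by default (`net.ipv4.conf.*.promote_secondaries` off), deletes a subnet's secondary addresses together with its primary address. A cheap precaution before any delete is to list what else shares the target's subnet. A sketch using only Python's standard `ipaddress` module; `same_subnet` is a hypothetical helper and the `configured` list would come from `ip addr show`:

```python
import ipaddress
from typing import List


def same_subnet(configured: List[str], target: str) -> List[str]:
    """Return the other configured addresses (ip/prefix strings) in target's subnet."""
    t = ipaddress.ip_interface(target)
    return [
        cidr
        for cidr in configured
        if ipaddress.ip_interface(cidr) != t
        and ipaddress.ip_interface(cidr).network == t.network
    ]
```

For example, 72.29.167.148/28 lives in 72.29.167.144/28 (.144 to .159), so any other /28 address in that range would be listed.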

      • ruaok is doomed
      • 2015-04-24 11431, 2015

      • kepstin-laptop
        ruaok, I just want to confirm - gateway machines are 12.04?
      • 2015-04-24 11403, 2015

      • ruaok
        Ubuntu 14.04.2 LTS
      • 2015-04-24 11405, 2015

      • kepstin-laptop
        to remove a specific ip, use the exact same ip/netmask as shown in the output of "ip addr show"
      • 2015-04-24 11427, 2015

      • ruaok
        ok, that helps. :)
      • 2015-04-24 11437, 2015

      • zas
        images i uploaded yesterday still don't have associated json on CAA
      • 2015-04-24 11409, 2015

      • ruaok
        rvedotrc: if I need to get the IP back into haresources, what do I do beyond adding it to the file?
      • 2015-04-24 11418, 2015

      • ruaok
        zas: ping bitmap or ianmcorvidae about that.
      • 2015-04-24 11448, 2015

      • zas
        done i guess ;)
      • 2015-04-24 11402, 2015

      • ijabz2 joined the channel
      • 2015-04-24 11418, 2015

      • kepstin-laptop
        i think recent versions of 'ip addr del' print a warning if you don't include the netmask - not including netmask can do weird things. so always include netmask ;)
      • 2015-04-24 11424, 2015

      • ruaok
        kepstin-laptop: so when I see: "inet 72.29.167.148/28" in the "ip addr show" command, then use "ip addr del inet 72.29.167.148/28" ?
      • 2015-04-24 11430, 2015

      • kepstin-laptop
        yes
      • 2015-04-24 11433, 2015

      • ruaok
        got it.
      • 2015-04-24 11459, 2015
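
kepstin's rule (copy the exact ip/netmask from `ip addr show`) is easy to script. In our reading of iproute2, the `inet` word at the start of each address line in that output is the address family label rather than part of the argument, and `ip addr del` also wants a `dev`; so for the line quoted above the command would be `ip addr del 72.29.167.148/28 dev <interface>`. A minimal Python sketch; `find_cidr` and `del_command` are hypothetical helpers:

```python
import re
from typing import List, Optional, Tuple

# Interface header, e.g. "2: eth0: <BROADCAST,...>"; "@..." suffixes are dropped.
IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)")
# IPv4 address line, e.g. "    inet 72.29.167.148/28 brd ... eth0" (inet6 lines don't match).
INET_RE = re.compile(r"^\s+inet\s+(\S+)")


def find_cidr(ip_addr_show: str, ip: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (ip/prefix exactly as shown, interface) for ip, or None if absent."""
    dev: Optional[str] = None
    for line in ip_addr_show.splitlines():
        header = IFACE_RE.match(line)
        if header:
            dev = header.group(1)
            continue
        addr = INET_RE.match(line)
        if addr and addr.group(1).split("/")[0] == ip:
            return addr.group(1), dev
    return None


def del_command(cidr: str, dev: str) -> List[str]:
    """argv for removing exactly one address; note there is no 'inet' keyword."""
    return ["ip", "addr", "del", cidr, "dev", dev]
```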

      • rvedotrc
        Like removal, in reverse: either work out how to run /usr/lib/ocf/resource.d/heartbeat/IPaddr2 by hand, or flip to the peer (doh), or restart heartbeat (downtime), or /sbin/ip addr add x.x.x.x/y dev foo
      • 2015-04-24 11455, 2015

      • rvedotrc
        (except that the resource script also does arp sending, so... manually adding with "ip addr add" is likely to result in a little gap before that IP is fully serviceable.)
      • 2015-04-24 11411, 2015

      • ruaok
        define little gap.
      • 2015-04-24 11429, 2015

      • rvedotrc
        30 seconds? /me guesses. ARP syncing.
      • 2015-04-24 11447, 2015

      • rvedotrc
        Time for a switch to work out that the IP has moved to a different port.
      • 2015-04-24 11417, 2015

      • rvedotrc wanders, back in a few min.
      • 2015-04-24 11433, 2015

      • kepstin-laptop
        you could probably run arping manually to speed that up
      • 2015-04-24 11420, 2015

      • kepstin-laptop
        'arping -I interface -U 72.29.167.148'
      • 2015-04-24 11422, 2015

      • ruaok
        what destination would I give it? new_ip?
      • 2015-04-24 11432, 2015

      • ruaok
        thx
      • 2015-04-24 11449, 2015

      • kepstin-laptop
        yeah, the ip address you just added is the destination
      • 2015-04-24 11453, 2015
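
Putting an address back by hand, as rvedotrc and kepstin describe (`ip addr add`, then an unsolicited ARP so the switch learns the new port sooner), could be wrapped like this. A sketch under assumptions: `readd_commands` is a hypothetical helper, the `arping` flags are those of the iputils version (`-I` interface, `-U` unsolicited ARP, `-c` count), and both commands need root.

```python
import ipaddress
from typing import List


def readd_commands(cidr: str, dev: str, count: int = 3) -> List[List[str]]:
    """Commands to put an IP back by hand and announce it with unsolicited ARP."""
    ip = str(ipaddress.ip_interface(cidr).ip)  # arping takes the bare address
    return [
        ["ip", "addr", "add", cidr, "dev", dev],
        # -U: unsolicited ARP to update neighbours' caches; -c: send only `count` packets
        ["arping", "-I", dev, "-U", "-c", str(count), ip],
    ]


# Usage (needs root; not executed here):
#   for argv in readd_commands("72.29.167.148/28", "eth0"):
#       subprocess.run(argv, check=True)
```

The `-c` flag simply bounds how many announcements are sent.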

      • ruaok
        rvedotrc, kepstin-laptop: https://gist.github.com/mayhem/023a9aa93f2ec39be5… is what I have so far.
      • 2015-04-24 11400, 2015

      • kepstin-laptop
        in the 're-add' section, I assume you mean *add* the ip to /usr/local/gateway/etc/ha.d/haresources ?
      • 2015-04-24 11418, 2015

      • ruaok
        reload
      • 2015-04-24 11403, 2015

      • ruaok
        kepstin-laptop: I'm currently cribbing from: http://www.tokiwinter.com/clustering-with-drbd-co…
      • 2015-04-24 11418, 2015

      • rvedotrc
        ruaok: no HUP.
      • 2015-04-24 11420, 2015

      • ruaok
        and I'm going to update the chef config to issue this command:
      • 2015-04-24 11422, 2015

      • ruaok
        crm configure property stonith-enabled=false
      • 2015-04-24 11429, 2015

      • ruaok
        rvedotrc: ok, good to know.
      • 2015-04-24 11441, 2015

      • ruaok
        and it suggests that I do:
      • 2015-04-24 11442, 2015

      • ruaok
        crm configure property no-quorum-policy=ignore
      • 2015-04-24 11453, 2015

      • ruaok
        kepstin-laptop: thoughts on that?
      • 2015-04-24 11456, 2015

      • rvedotrc
        IIRC, heartbeat only reads its resources file when it starts, stops, or swaps hosts.
      • 2015-04-24 11410, 2015

      • ruaok
        k
      • 2015-04-24 11410, 2015

      • rvedotrc
        (that, old, version of heartbeat anyway).
      • 2015-04-24 11424, 2015

      • kepstin-laptop
        ruaok, that's from the doc regarding a 2-node cluster, right?
      • 2015-04-24 11430, 2015

      • ruaok
        gist updated.
      • 2015-04-24 11433, 2015

      • ruaok
        kepstin-laptop: yes.
      • 2015-04-24 11453, 2015

      • ruaok
        kepstin-laptop: and I don't see any issue with issuing "crm configure property stonith-enabled=false" during a chef deploy. do you?
      • 2015-04-24 11451, 2015

      • kepstin-laptop
        ruaok, hmm. it'll probably do a configuration commit and epoch update, but I don't think it should hurt anything.
      • 2015-04-24 11411, 2015

      • ruaok
        k.
      • 2015-04-24 11438, 2015

      • ruaok
        the ha recipe has been added to do those steps at deploy.
      • 2015-04-24 11454, 2015

      • ruaok
        and that kinda sorta makes it complete. I think.
      • 2015-04-24 11458, 2015

      • kepstin-laptop
        it might just be a no-op if it's already configured, i'm having some weirdness in my test server where the crm config updates aren't working :/
      • 2015-04-24 11453, 2015

      • kepstin-laptop
        you don't have a quorum section in your corosync.conf
      • 2015-04-24 11420, 2015

      • ruaok
        what do I need?
      • 2015-04-24 11437, 2015

      • kepstin-laptop
      • 2015-04-24 11420, 2015

      • ruaok
        add after totem?
      • 2015-04-24 11451, 2015

      • kepstin-laptop
        order of sections doesn't appear to matter. after totem is fine
      • 2015-04-24 11448, 2015

      • ruaok
        ok, added. waiting to hear confirmation before I commit.
      • 2015-04-24 11401, 2015

      • kepstin-laptop
        ok, that's close, but expected_votes should be 2 for a 2-node cluster
      • 2015-04-24 11438, 2015
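
Why expected_votes matters: under corosync's votequorum, as we understand it, quorum is a simple majority, expected_votes / 2 + 1 with integer division. With expected_votes set to 2, a lone surviving node holds 1 of the 2 votes needed, so it is inquorate; that is the situation the `no-quorum-policy=ignore` property mentioned earlier addresses (pacemaker's default policy is to stop resources without quorum). The arithmetic as a tiny Python sketch:

```python
def quorum(expected_votes: int) -> int:
    """Votes needed for quorum: a simple majority of expected_votes."""
    return expected_votes // 2 + 1


def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """True if the nodes still present hold enough votes."""
    return votes_present >= quorum(expected_votes)
```

For 2 expected votes the quorum is 2, so one node alone is inquorate; a 3-node cluster can lose one node and keep quorum.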

      • kepstin-laptop
        i've updated the gist, and confirmed that with that config change corosync and pacemaker are starting now
      • 2015-04-24 11428, 2015

      • ruaok
        changes pushed.
      • 2015-04-24 11403, 2015

      • kepstin-laptop
        probably also want to add crm configure rsc_defaults resource-stickiness=100
      • 2015-04-24 11417, 2015

      • kepstin-laptop
        otherwise it might needlessly move the ips when a node comes back up
      • 2015-04-24 11420, 2015
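
A toy model of what resource-stickiness does: the node already running a resource gets the stickiness value added to its score, so a returning node has to beat that bonus before the resource moves back. This is a deliberate simplification, not pacemaker's actual placement algorithm; `place` and the preference scores (standing in for, say, a location constraint favouring carl) are hypothetical.

```python
from typing import Dict, Optional


def place(scores: Dict[str, int], current: Optional[str], stickiness: int) -> str:
    """Pick the node with the highest score; the current node gets +stickiness.

    Ties go to the alphabetically first node, purely to keep the toy deterministic.
    """
    def total(node: str) -> int:
        return scores[node] + (stickiness if node == current else 0)

    return max(sorted(scores), key=total)
```

With carl preferred by 50 points and the IP currently on ernie, stickiness 0 sends it back to carl, while stickiness 100 leaves it on ernie.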

      • ruaok
        added, verified, pushed.
      • 2015-04-24 11401, 2015
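
For reference, the cluster settings discussed for the deploy step are `stonith-enabled=false`, `no-quorum-policy=ignore` and `rsc_defaults resource-stickiness=100`. The real step lives in the Chef ha recipe, which isn't shown here; the Python below only illustrates running the same crm commands in order, defaulting to a dry run, and assumes all three ended up in the recipe. Per kepstin, re-running them on an already configured cluster should at most cause a configuration commit and epoch update.

```python
import subprocess
from typing import List

# The crm commands discussed in the channel, in the order they came up.
CRM_SETTINGS: List[List[str]] = [
    ["crm", "configure", "property", "stonith-enabled=false"],
    ["crm", "configure", "property", "no-quorum-policy=ignore"],
    ["crm", "configure", "rsc_defaults", "resource-stickiness=100"],
]


def apply_crm_settings(dry_run: bool = True) -> List[str]:
    """Run (or, by default, only list) each crm command; raise on the first failure."""
    ran: List[str] = []
    for argv in CRM_SETTINGS:
        if not dry_run:
            subprocess.run(argv, check=True)
        ran.append(" ".join(argv))
    return ran
```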

      • kepstin-laptop
        right now the cluster software configuration is looking alright to me.
      • 2015-04-24 11432, 2015

      • kepstin-laptop
        I'm just gonna try bringing up another node on my openstack and bouncing back and forth a couple times.
      • 2015-04-24 11414, 2015

      • ruaok
        awesome. :)
      • 2015-04-24 11421, 2015

      • kepstin-laptop
        bah, good time to find out the network on one of my openstack nodes is misconfigured
      • 2015-04-24 11429, 2015

      • kepstin-laptop fixes that, only took a minute :/
      • 2015-04-24 11440, 2015

      • kepstin-laptop
        openstack 'pause instance' feature is fun
      • 2015-04-24 11447, 2015

      • kepstin-laptop
        simulates the node just disappearing completely ;)
      • 2015-04-24 11428, 2015

      • kepstin-laptop
        hmm. it moved the ip back when the node came back, even with the stickiness setting.
      • 2015-04-24 11446, 2015

      • kepstin-laptop
        ah, worked on a reboot, i guess the pause acted strangely
      • 2015-04-24 11452, 2015

      • ruaok
        slick. :)
      • 2015-04-24 11401, 2015

      • kepstin-laptop
        ... let me just confirm that pacemaker restarts properly on reboot
      • 2015-04-24 11440, 2015

      • ruaok
        rvedotrc: another Q for you: how do I test if dnscache is working?
      • 2015-04-24 11411, 2015
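
The dnscache question goes unanswered in the channel. One quick check is to send a single DNS query straight to it and look at the reply code. The sketch below assumes dnscache answers on 127.0.0.1 port 53 (its usual local setup) and uses only Python's standard library; `query` and `build_query` are hypothetical helpers, and `musicbrainz.org` is just an example name. If `dig` is installed, `dig @127.0.0.1 musicbrainz.org` gives the same information.

```python
import random
import socket
import struct
from typing import Tuple


def build_query(name: str, qid: int) -> bytes:
    """A minimal DNS query for the A record of name, recursion desired."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # RD bit set, 1 question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    )
    return header + qname + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def query(server: str, name: str, timeout: float = 2.0) -> Tuple[int, int]:
    """Send one UDP query; return (RCODE, answer count) from the reply header."""
    qid = random.getrandbits(16)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_query(name, qid), (server, 53))
        data, _ = s.recvfrom(512)
    rid, flags, _qdcount, ancount = struct.unpack("!HHHH", data[:8])
    if rid != qid:
        raise ValueError("reply ID does not match query ID")
    return flags & 0x000F, ancount
```

A reply code of 0 with at least one answer means the cache resolved the name; a `socket.timeout` means nothing answered on that address.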

      • ruaok
        kepstin-laptop: what is your favorite SSL checking service?
      • 2015-04-24 11403, 2015

      • kepstin-laptop
        ssllabs
      • 2015-04-24 11408, 2015

      • ruaok
        perfect.
      • 2015-04-24 11418, 2015

      • kepstin-laptop
        ok, for some reason pacemaker isn't set up to be started at boot
      • 2015-04-24 11430, 2015

      • kepstin-laptop
        "update-rc.d pacemaker defaults" should fix that, let me confirm
      • 2015-04-24 11453, 2015

      • kepstin-laptop
        yes, that fixes it
      • 2015-04-24 11459, 2015

      • kepstin-laptop
        add that to the chef deploy script
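
After `update-rc.d pacemaker defaults`, one way for the deploy script (or a human) to confirm that pacemaker will start at boot is to look for its start link in the default runlevel's directory. This assumes Ubuntu 14.04's sysv-rc layout, where runlevel 2 is the default and start links are named `S<two digits><service>` in `/etc/rc2.d`; `starts_at_boot` is a hypothetical helper.

```python
import glob
import os


def starts_at_boot(service: str, rc_dir: str = "/etc/rc2.d") -> bool:
    """True if rc_dir holds a start link such as S20pacemaker for the service."""
    pattern = os.path.join(rc_dir, "S[0-9][0-9]" + service)
    return bool(glob.glob(pattern))
```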