#metabrainz

      • D4RK-PH0ENiX joined the channel
      • 2018-10-12 28545, 2018

      • ruaok is in 515
      • 2018-10-12 28543, 2018

      • yvanzo at 461, not even an HTTP status code
      • 2018-10-12 28559, 2018

      • Leo_Verto_ joined the channel
      • 2018-10-12 28548, 2018

      • Leo_Verto has quit
      • 2018-10-12 28548, 2018

      • Leo_Verto_ is now known as Leo_Verto
      • 2018-10-12 28504, 2018

      • Freso checked in to his hotel now
      • 2018-10-12 28508, 2018

      • charley joined the channel
      • 2018-10-12 28502, 2018

      • SothoTalKer has quit
      • 2018-10-12 28511, 2018

      • SothoTalKer joined the channel
      • 2018-10-12 28538, 2018

      • outsidecontext joined the channel
      • 2018-10-12 28548, 2018

      • outsidecontext has quit
      • 2018-10-12 28502, 2018

      • outsidecontext joined the channel
      • 2018-10-12 28559, 2018

      • outsidecontext has quit
      • 2018-10-12 28512, 2018

      • outsidecontext joined the channel
      • 2018-10-12 28524, 2018

      • D4RK-PH0ENiX has quit
      • 2018-10-12 28538, 2018

      • UmkaDK joined the channel
      • 2018-10-12 28514, 2018

      • outsidecontext has quit
      • 2018-10-12 28536, 2018

      • outsidecontext joined the channel
      • 2018-10-12 28539, 2018

      • outsidecontext has quit
      • 2018-10-12 28510, 2018

      • outsidecontext joined the channel
      • 2018-10-12 28559, 2018

      • hibiscuskazeneko joined the channel
      • 2018-10-12 28508, 2018

      • hibiscuskazeneko
        Apparently Solr is out of commission
      • 2018-10-12 28534, 2018

      • hibiscuskazeneko
        nvm, must have been a momentary hiccup
      • 2018-10-12 28545, 2018

      • zas
        hibiscuskazeneko: in fact, there was an issue with the mb-solr-6 node
      • 2018-10-12 28510, 2018

      • zas
        i had to restart solr process
      • 2018-10-12 28548, 2018

      • zas
        the server was shut down at 6:33 UTC, and came back online 10 minutes later, i suppose that's due to a Hetzner maintenance task, but i can't find anything about it, so i asked them. The issue was caused by the solr process, which didn't recover well and, for some reason, was returning 500 errors. i'll have to tune haproxy health checks to take this case into account
      • 2018-10-12 28549, 2018

      • zas
        basically it was online, but generating errors, which disappeared after a simple restart of the process, which rejoined the solr cloud without issue.
      • 2018-10-12 28508, 2018

      • hibiscuskazeneko has quit
      • 2018-10-12 28511, 2018

      • zas
        solr was erroring with "o.a.s.s.SolrDispatchFilter Error processing the request. CoreContainer is either not initialized or shutting down." on every request, filled the logs with that, the exact cause is yet to be determined.
      • 2018-10-12 28507, 2018

      • hibiscuskazeneko joined the channel
      • 2018-10-12 28530, 2018

      • zas
        ok, i modified health checks so haproxy runs an actual query, and if it returns anything but 200, declares the node unhealthy. It should prevent sending errors to users (but it doesn't address the core issue, which is still undetermined)
      • 2018-10-12 28555, 2018
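
      [Editor's sketch] The check zas describes (run a real query, treat anything other than 200 as unhealthy) might look roughly like the Python below. This is not the actual haproxy configuration, which does not appear in the log; the node names and query URLs in the usage note are hypothetical placeholders.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=2.0):
    """Return the HTTP status of a GET on url, or None if there was no response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None


def node_is_healthy(status):
    """Only a 200 on a real query counts as healthy; 500 or no answer does not."""
    return status == 200


def healthy_nodes(query_urls, fetch=fetch_status):
    """Return the names of nodes whose test query came back with 200.

    query_urls maps a (hypothetical) node name to a query URL on that node.
    """
    return [name for name, url in query_urls.items() if node_is_healthy(fetch(url))]
```

      Called with something like healthy_nodes({"solr-a": "http://solr-a.example/select?q=test"}), a node that is up but answering 500 is left out, which is the case a plain "is the port open" check would miss.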

      • zas
        another issue, maybe related: the sir-beta container on queen spawns a lot of python -m sir amqp_watch processes
      • 2018-10-12 28541, 2018

      • zas
        samj1912: is sir still 'beta' and why is a solr container still running on queen ? https://github.com/metabrainz/docker-server-confi… ?
      • 2018-10-12 28534, 2018

      • zas
        ok, solr container removed from queen, and sir is still beta
      • 2018-10-12 28506, 2018

      • zas
        i had an answer from hetzner, it was an urgent maintenance operation: https://www.hetzner-status.de/en.html#10432
      • 2018-10-12 28526, 2018

      • zas
        only solr-6 node was impacted
      • 2018-10-12 28526, 2018

      • rdswift has quit
      • 2018-10-12 28508, 2018

      • code_master5 joined the channel
      • 2018-10-12 28514, 2018

      • D4RK-PH0ENiX joined the channel
      • 2018-10-12 28528, 2018

      • code_master5
        hey! I'm looking forward to contributing to CB. I'm done with setting up the server (hopefully!). I can't change the language in the test server. Why?
      • 2018-10-12 28508, 2018

      • hibiscuskazeneko has quit
      • 2018-10-12 28523, 2018

      • hibiscuskazeneko joined the channel
      • 2018-10-12 28509, 2018

      • outsidecontext has left the channel
      • 2018-10-12 28531, 2018

      • outsidecontext joined the channel
      • 2018-10-12 28509, 2018

      • kartikeyaSh joined the channel
      • 2018-10-12 28516, 2018

      • Freso
        code_master5: You're in the right place. Stick around and someone will hopefully be able to get back to you soon. :)
      • 2018-10-12 28556, 2018

      • kartikeyaSh
      • 2018-10-12 28534, 2018

      • kartikeyaSh
        it has something about adding support for new languages
      • 2018-10-12 28528, 2018

      • hibiscuskazeneko has quit
      • 2018-10-12 28550, 2018

      • zas
        ignore telegram alerts, just restarted grafana server
      • 2018-10-12 28532, 2018

      • ruaok
        mooooin.
      • 2018-10-12 28559, 2018

      • ruaok
        hmmm. starbucks coffee to get the day started. what could possibly go 'rong?
      • 2018-10-12 28513, 2018

      • zas
        good morning SF
      • 2018-10-12 28504, 2018

      • rdswift joined the channel
      • 2018-10-12 28535, 2018

      • Freso
        Morning. :)
      • 2018-10-12 28515, 2018

      • ruaok
        zas: what server was taken out by hetzner?
      • 2018-10-12 28538, 2018

      • zas
        solr-cloud-6 VM, but it was somehow planned, i didn't get the info, because Robot notices for cloud machines weren't checked (i subscribed before hetzner cloud was a thing)
      • 2018-10-12 28558, 2018

      • zas
        that's the drawback i was talking about during summit
      • 2018-10-12 28506, 2018

      • ruaok
        are you getting the notifications now?
      • 2018-10-12 28517, 2018

      • zas
        i should
      • 2018-10-12 28546, 2018

      • zas
        for some reason, solr on this node didn't restart properly, i couldn't reproduce the issue (i tried hard)
      • 2018-10-12 28536, 2018

      • ruaok
        I noticed a message about solr not being available on #musicbrainz, but that it fixed itself.
      • 2018-10-12 28537, 2018

      • zas
        but the core problem was haproxy not taking out this node, because it was healthy enough according to its checks
      • 2018-10-12 28543, 2018

      • ruaok
        I guess those two were related.
      • 2018-10-12 28553, 2018

      • ruaok
        ah!
      • 2018-10-12 28502, 2018

      • zas
        25% of search requests failed for 2 hours
      • 2018-10-12 28531, 2018

      • zas
        i fixed health checks, everything is back to normal
      • 2018-10-12 28558, 2018

      • ruaok
        what was wrong with the checks?
      • 2018-10-12 28507, 2018

      • ruaok
        alive, but not answering queries?
      • 2018-10-12 28511, 2018

      • zas
        well, solr was answering.... 500
      • 2018-10-12 28520, 2018

      • zas
        yup
      • 2018-10-12 28538, 2018

      • zas
        i changed it to check 200 on actual query
      • 2018-10-12 28557, 2018

      • ruaok
        I didn't see a nagios notification for it. are those nodes nagios monitored?
      • 2018-10-12 28528, 2018

      • zas
        nope
      • 2018-10-12 28553, 2018

      • ruaok
        why not?
      • 2018-10-12 28538, 2018

      • zas
        not done yet
      • 2018-10-12 28548, 2018

      • zas
        and nagios checks are a pain to maintain
      • 2018-10-12 28517, 2018

      • zas
        plus this problem wouldn't have been noticed by a standard nagios check
      • 2018-10-12 28507, 2018

      • zas
        alerts based on grafana/influxdb are there to fill gaps left by nagios
      • 2018-10-12 28506, 2018

      • ruaok
        I'm glad we don't have to pay for hetzner's chaos monkey services.
      • 2018-10-12 28516, 2018

      • ruaok
        but it sure helps us find SPoFs. :)
      • 2018-10-12 28522, 2018

      • zas
        yup :)
      • 2018-10-12 28546, 2018

      • zas
        btw, there were 100+ alerts on telegram
      • 2018-10-12 28500, 2018

      • zas
        i noticed as soon as i woke up
      • 2018-10-12 28513, 2018

      • ruaok
        ah, yes that massive spam firehose that monkey and I ignore.
      • 2018-10-12 28555, 2018

      • ruaok
        do we need to make yet another one that filters 99% and just sends a "yo, shit is on fire, go look at the other ones"?
      • 2018-10-12 28533, 2018

      • ruaok
        sheraton peeps: stuffing two coffee bags into the machine in one go makes a reliable cup of coffee.
      • 2018-10-12 28508, 2018

      • ruaok
        also sunnyvale peeps: I am planning on getting a lyft to breakfast, a weed shop and then trader joes, the eccentric food shop, then back to the hotel. anyone game?
      • 2018-10-12 28544, 2018

      • outsidecontext has quit
      • 2018-10-12 28500, 2018

      • zas
        wait for me, booking a flight ;)
      • 2018-10-12 28526, 2018

      • zas
        ruaok: the alert system in grafana is quite new, but it improves with each release, i guess things will improve over time regarding spammy alerts (and i refine them over time too)
      • 2018-10-12 28500, 2018

      • zas
        it has no "flapping" status like nagios has yet
      • 2018-10-12 28556, 2018
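
      [Editor's sketch] A "flapping" status is about how often a check changes state across its recent results, rather than what its current state is. The Python below is a simplified, unweighted illustration of that idea, not Nagios's or Grafana's actual algorithm; the 50% threshold and the state labels are assumptions chosen for the example.

```python
def percent_state_change(states):
    """Percentage of consecutive result pairs in which the state changed."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


def is_flapping(states, threshold=50.0):
    """Treat a check as flapping when it changes state in at least
    `threshold` percent of its recent transitions (assumed threshold)."""
    return percent_state_change(states) >= threshold
```

      An alerting layer could use a test like this to send one "flapping" notice instead of a separate alert for every ok/alerting transition.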

      • Freso
        ruaok: I'm up for going to Trader Joe's. Just finishing up breakfast and have no interest in weed shops though. 🙈
      • 2018-10-12 28517, 2018

      • ruaok
        can't pick and choose.
      • 2018-10-12 28535, 2018

      • ruaok
        the new route is in and out burger, trader joes, weed, back to the hotel.
      • 2018-10-12 28514, 2018

      • iliekcomputers
        When? I just got up, can join in a half hour.
      • 2018-10-12 28547, 2018

      • ruaok
        10:45 in the lobby then. :)
      • 2018-10-12 28527, 2018

      • iliekcomputers
        Woo, cool. :)
      • 2018-10-12 28512, 2018

      • yvanzo
        will be there too :)
      • 2018-10-12 28503, 2018

      • Freso
        If you're willing to swing by Aloft and pick me up, I'm down as well. Otherwise, I'll see you tonight.
      • 2018-10-12 28546, 2018

      • iliekcomputers
        I'm in the lobby
      • 2018-10-12 28526, 2018

      • UmkaDK_ joined the channel
      • 2018-10-12 28554, 2018

      • UmkaDK has quit
      • 2018-10-12 28559, 2018

      • ruaok
        If you really want to, Freso. Two places are of no interest to you, but hey.
      • 2018-10-12 28526, 2018

      • ruaok
        If you really want to, we can swing by in about 10 mins.
      • 2018-10-12 28531, 2018

      • Freso
        I'd like to meet up with you guys and hang out, activity in question is secondary. I'll be outside Aloft in 10. :)
      • 2018-10-12 28553, 2018

      • ruaok
        K
      • 2018-10-12 28545, 2018

      • UmkaDK_ has quit
      • 2018-10-12 28533, 2018

      • code_master5 has quit
      • 2018-10-12 28552, 2018

      • kartikeyaSh has quit
      • 2018-10-12 28517, 2018

      • adhawkins has quit
      • 2018-10-12 28549, 2018

      • adhawkins joined the channel
      • 2018-10-12 28552, 2018

      • charley has quit
      • 2018-10-12 28546, 2018

      • hibiscuskazeneko joined the channel
      • 2018-10-12 28546, 2018

      • hibiscuskazeneko has quit