#metabrainz

      • yvanzo
        zas: I guess sir had to catch up with reindex messages after the move to pink, it should use less cpu now.
      • 2019-09-11 25424, 2019

      • zas
        yvanzo: ok, perhaps it needs to be nicer (or we need to enable cpu cgroup which requires specific kernel and/or grub options)
      • 2019-09-11 25445, 2019

      • zas
        imho you can just renice it to lower priority
      • 2019-09-11 25401, 2019

      • yvanzo
        right, I had to renice it on test vm already
      • 2019-09-11 25430, 2019
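
The renice yvanzo and zas discuss can also be done programmatically; a minimal Python sketch (the delta of 10 is illustrative, not what was actually used on the test VM, and `os.nice` only adjusts the current process — reniceing another process would use `renice -n 10 -p <pid>` or psutil):

```python
import os

before = os.nice(0)   # passing 0 just reads the current niceness
after = os.nice(10)   # add 10: higher niceness = lower CPU priority
print(f"niceness went from {before} to {after}")
```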

      • travis-ci joined the channel
      • 2019-09-11 25430, 2019

      • travis-ci
        metabrainz/picard#4916 (master - c21449f : Philipp Wolfer): The build passed.
      • 2019-09-11 25430, 2019

      • travis-ci has left the channel
      • 2019-09-11 25400, 2019

      • zas
        bitmap: can I take down queen?
      • 2019-09-11 25450, 2019

      • bitmap
        zas: +1 I don't have anything I need there
      • 2019-09-11 25451, 2019

      • ruaok now imagines zas as an american football player ready to tackle queen
      • 2019-09-11 25459, 2019

      • zas
        :D
      • 2019-09-11 25446, 2019

      • zas
        ruaok: I just cancelled queen and did various clean-up related to it (admin tools, nagios, dns, ...), plus I reinstalled a fresh linux on it (re-format)
      • 2019-09-11 25456, 2019

      • zas
        ruaok: so you can order floyd !
      • 2019-09-11 25434, 2019

      • zas
        now I'm going to prepare dinner for my hungry monsters...
      • 2019-09-11 25458, 2019

      • CallerNo6 joined the channel
      • 2019-09-11 25458, 2019

      • CallerNo6 has quit
      • 2019-09-11 25458, 2019

      • CallerNo6 joined the channel
      • 2019-09-11 25432, 2019

      • ruaok
        ok to do that in the morning?
      • 2019-09-11 25409, 2019

      • yvanzo now imagines zas as a space pioneer feeding tamed aliens
      • 2019-09-11 25410, 2019

      • ruaok
        ohhh, almost like a vision of Calvin. :)
      • 2019-09-11 25429, 2019

      • zas
        ruaok: whenever you want
      • 2019-09-11 25403, 2019

      • zas
        yvanzo: those 👽 are untamed for sure...
      • 2019-09-11 25458, 2019

      • chaban joined the channel
      • 2019-09-11 25425, 2019

      • BrainzGit
        [musicbrainz-server] yvanzo closed pull request #1183 (master…i): Add support for expanding tag <i> to React https://github.com/metabrainz/musicbrainz-server/…
      • 2019-09-11 25402, 2019

      • oknozor joined the channel
      • 2019-09-11 25406, 2019

      • oknozor has quit
      • 2019-09-11 25442, 2019

      • djinni` has quit
      • 2019-09-11 25413, 2019

      • djinni` joined the channel
      • 2019-09-11 25419, 2019

      • chaban has quit
      • 2019-09-11 25427, 2019

      • chaban joined the channel
      • 2019-09-11 25442, 2019

      • chaban has quit
      • 2019-09-11 25451, 2019

      • chaban joined the channel
      • 2019-09-11 25427, 2019

      • zas
        yvanzo, bitmap : ping
      • 2019-09-11 25435, 2019

      • zas
        any idea about why website response time is so bad and number of 302s dropped a lot ? See https://stats.metabrainz.org/d/000000061/mbstats?…
      • 2019-09-11 25457, 2019

      • zas
        It started a few hours ago
      • 2019-09-11 25429, 2019

      • zas
        and it seems it concerns only the website (not the webservice)
      • 2019-09-11 25435, 2019

      • zas
        number of queries to solr dropped significantly, it seems to work as it should though
      • 2019-09-11 25425, 2019

      • bitmap
        huh, not sure
      • 2019-09-11 25449, 2019

      • bitmap
        I'll check the containers but I don't know what'd cause 302s specifically to drop
      • 2019-09-11 25403, 2019

      • zas
        I think the drop in 302s explains the increase in the mean response time (because 302s are fast answers, lowering the mean a lot)
      • 2019-09-11 25440, 2019
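
zas's point can be checked with back-of-the-envelope numbers (all figures below are made up for illustration, not actual MusicBrainz measurements):

```python
# Fast redirects dominate the count, so removing most of them raises the
# mean even though no remaining request got any slower.
fast_302_ms = [5] * 5000     # hypothetical ~5 ms redirects, 5k per window
slow_200_ms = [500] * 7000   # hypothetical ~500 ms page renders

mean_before = sum(fast_302_ms + slow_200_ms) / (5000 + 7000)
mean_after = sum(fast_302_ms[:600] + slow_200_ms) / (600 + 7000)
print(f"mean before: {mean_before:.0f} ms, mean after: {mean_after:.0f} ms")
# → mean before: 294 ms, mean after: 461 ms
```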

      • zas
        I just checked loading times in browser, and main html documents are slow as usual, but everything seems ok
      • 2019-09-11 25456, 2019

      • zas
        also solr is receiving much less requests than usual
      • 2019-09-11 25421, 2019

      • bitmap
        did it start when you took queen down maybe?
      • 2019-09-11 25435, 2019

      • zas
        and response times from it are very low (each server handling only 20 reqs/s instead of 50)
      • 2019-09-11 25444, 2019

      • zas
        queen was removed 2 hours later
      • 2019-09-11 25450, 2019

      • zas
        I checked that ofc
      • 2019-09-11 25413, 2019

      • zas
        did anything change on the mb website (deployment)?
      • 2019-09-11 25422, 2019

      • bitmap
        I didn't deploy anything
      • 2019-09-11 25437, 2019

      • zas
        nginx traffic is the same as usual (network bandwidth-wise)
      • 2019-09-11 25419, 2019

      • zas
        there was a peak in 503s around 14:04 UTC then it started
      • 2019-09-11 25403, 2019

      • zas
        I'll reload openresty
      • 2019-09-11 25450, 2019

      • zas
        upstream traffic is the same as usual, and I don't see anything weird in response codes
      • 2019-09-11 25442, 2019

      • zas
        traffic from/to bowie didn't change
      • 2019-09-11 25425, 2019

      • zas
        there were changes on frank
      • 2019-09-11 25428, 2019

      • zas
        but no container was restarted on it or anything
      • 2019-09-11 25416, 2019

      • bitmap
        what were all the previous 302s from? http to https redirects?
      • 2019-09-11 25414, 2019

      • zas
        as is, no idea, we have to look at logs to know
      • 2019-09-11 25428, 2019

      • bitmap
        looking at the nginx logs it looks like the majority of 302s are /search pages, which may explain the connection with search
      • 2019-09-11 25441, 2019
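
A rough sketch of the kind of tally bitmap describes; it assumes nginx's combined log format, and the sample lines are made up, not real MusicBrainz traffic:

```python
from collections import Counter
import re

# Pattern for the request and status fields of nginx's combined log format.
LINE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[^"]+" (?P<status>\d{3})')

def count_302_paths(lines):
    """Tally which paths (query strings stripped) answered with a 302."""
    hits = Counter()
    for line in lines:
        m = LINE.search(line)
        if m and m.group("status") == "302":
            hits[m.group("path").split("?")[0]] += 1
    return hits

# Hypothetical sample lines:
sample = [
    '1.2.3.4 - - [11/Sep/2019:14:00:01 +0000] "GET /search?query=abba HTTP/1.1" 302 0',
    '5.6.7.8 - - [11/Sep/2019:14:00:02 +0000] "GET /artist/xyz HTTP/1.1" 200 512',
]
print(count_302_paths(sample).most_common())   # → [('/search', 1)]
```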

      • bitmap
        but I don't know how those would redirect other than http->https
      • 2019-09-11 25446, 2019

      • bitmap
        so I assume it's that
      • 2019-09-11 25414, 2019

      • zas
        but we didn't change anything on gateways or solr (afaik)
      • 2019-09-11 25428, 2019

      • bitmap
        right
      • 2019-09-11 25439, 2019

      • zas
        can it be a significant change in incoming traffic?
      • 2019-09-11 25432, 2019

      • bitmap
        I mean if the number of /search requests just dropped significantly I guess that'd explain the drop in 302s and solr traffic
      • 2019-09-11 25431, 2019

      • bitmap
        but idk why that would happen
      • 2019-09-11 25441, 2019

      • zas
        that's the question ;)
      • 2019-09-11 25401, 2019

      • zas
        before 14:00 utc we had 5k 302s per 2 minutes
      • 2019-09-11 25415, 2019

      • zas
        after it dropped to ~600
      • 2019-09-11 25426, 2019

      • zas
        number of 200s decreased too
      • 2019-09-11 25448, 2019

      • zas
        we lost 3k 200s per 2 minutes
      • 2019-09-11 25458, 2019

      • bitmap
        there are definitely a lot less 302s in the logs for 13 vs 14 vs 15 utc
      • 2019-09-11 25438, 2019

      • zas
        yes, and a lot less 200s too (more or less the same drop)
      • 2019-09-11 25409, 2019

      • bitmap
        ah
      • 2019-09-11 25421, 2019

      • bitmap
        so it's just more noticeable for 302s since there are less
      • 2019-09-11 25441, 2019

      • zas
        but we had a peak of 503s
      • 2019-09-11 25426, 2019

      • zas
        starting around 14:08, ending around 14:34
      • 2019-09-11 25448, 2019

      • bitmap
        yeah, and it didn't recover after that
      • 2019-09-11 25415, 2019

      • zas
        have a look at those, 8k per 2 minutes difference (usually ~2k, during this period > 10k)
      • 2019-09-11 25448, 2019

      • zas
        not sure if it's a cause or consequence though
      • 2019-09-11 25441, 2019

      • bitmap
        I'm going through each website container and seeing if there's anything weird in the logs, then restarting them
      • 2019-09-11 25429, 2019

      • zas
        k
      • 2019-09-11 25438, 2019

      • SothoTalKer has quit
      • 2019-09-11 25410, 2019

      • Gazooo joined the channel
      • 2019-09-11 25435, 2019

      • SothoTalKer joined the channel
      • 2019-09-11 25434, 2019

      • zas
        bitmap and I found what happened: a nasty distributed bot suddenly stopped hitting us, it was responsible for 2/3 of the traffic apparently ... that's good news and bad news
      • 2019-09-11 25459, 2019

      • zas
        the bad one: we waste a lot of resources for nothing...
      • 2019-09-11 25431, 2019

      • zas
        the good one: legit users have now faster search responses
      • 2019-09-11 25422, 2019

      • zas
        conclusion: we need much stricter policy and better tools
      • 2019-09-11 25446, 2019

      • zas
        our investigation concerns only mb website, for this case. But it is very likely same shit hit our other fans....
      • 2019-09-11 25400, 2019

      • zas
        ruaok: ^^
      • 2019-09-11 25419, 2019

      • zas
        I'm too tired to continue on this now, cya later
      • 2019-09-11 25416, 2019

      • yvanzo
        !m zas
      • 2019-09-11 25416, 2019

      • BrainzBot
        You're doing good work, zas!
      • 2019-09-11 25447, 2019

      • ruaok
        zas: please add points to the agenda for the summit: https://wiki.musicbrainz.org/MusicBrainz_Summit/1…
      • 2019-09-11 25412, 2019

      • zas
        ruaok: I'll do, but tomorrow
      • 2019-09-11 25414, 2019

      • ruaok
        I'm quite keen to work out a post-react timeline for MB.
      • 2019-09-11 25455, 2019

      • zas
        this fake traffic was hiding a lot of things
      • 2019-09-11 25416, 2019

      • ruaok
        odd. normally it exposes things.
      • 2019-09-11 25440, 2019

      • zas
        in our case, nope, for example, impact of react changes was buried in noise
      • 2019-09-11 25449, 2019

      • ruaok
        OH!
      • 2019-09-11 25404, 2019

      • zas
        huge noise
      • 2019-09-11 25407, 2019

      • ruaok can't wait to hear more
      • 2019-09-11 25422, 2019

      • zas
        I'll do in-depth analysis and re-check things before any conclusion, we may have missed things
      • 2019-09-11 25438, 2019

      • zas
        I'm astonished we didn't see that sooner
      • 2019-09-11 25450, 2019

      • zas
        reminds me of the QNAP shit
      • 2019-09-11 25407, 2019

      • ruaok
        does this also explain why we needed so many instances of the web server?
      • 2019-09-11 25411, 2019

      • zas
        yes
      • 2019-09-11 25420, 2019

      • zas
        and since a long time
      • 2019-09-11 25420, 2019

      • ruaok
        wow.
      • 2019-09-11 25430, 2019

      • zas
        but nothing is solved, it can start again
      • 2019-09-11 25440, 2019

      • zas
        we don't know why it suddenly stopped
      • 2019-09-11 25455, 2019

      • zas
        we don't even know why it existed
      • 2019-09-11 25450, 2019

      • ruaok
        sounds like something Confucius could say.
      • 2019-09-11 25422, 2019

      • zas
        for sure, I'm a bit angry: I should have detected this much sooner
      • 2019-09-11 25449, 2019

      • ruaok
        we need analytics.
      • 2019-09-11 25405, 2019

      • zas
        yes, and tools to block
      • 2019-09-11 25423, 2019

      • ruaok
        /ws/2.5
      • 2019-09-11 25452, 2019

      • ruaok
        add api keys to /ws/2 and throttle /ws/2 without API keys
      • 2019-09-11 25433, 2019
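
ruaok's api-key idea could look roughly like this token-bucket sketch; every name and rate here is hypothetical, nothing reflects an existing MusicBrainz interface:

```python
import time

class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

keyed_buckets = {}                                  # one generous bucket per API key
anonymous_bucket = TokenBucket(rate=1, capacity=5)  # strict shared bucket, no key

def throttle(api_key=None):
    """Return True if the request may proceed (hypothetical rates)."""
    if api_key:
        bucket = keyed_buckets.setdefault(api_key, TokenBucket(rate=50, capacity=100))
    else:
        bucket = anonymous_bucket
    return bucket.allow()

# Keyless clients burn through their shared budget quickly:
print([throttle() for _ in range(6)])   # → [True, True, True, True, True, False]
```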

      • zas
        yup, but in this case it wouldn't help: it concerns mainly the website
      • 2019-09-11 25400, 2019

      • ruaok nods
      • 2019-09-11 25416, 2019

      • ruaok
        now, wtf are the motivations for what happened?
      • 2019-09-11 25439, 2019

      • zas
        frankly: no idea
      • 2019-09-11 25413, 2019

      • zas
        but for sure we need to fight back, because it costs us a lot
      • 2019-09-11 25423, 2019

      • zas
        wasted resources
      • 2019-09-11 25456, 2019

      • zas
        for now, I need to relax and sleep
      • 2019-09-11 25401, 2019

      • zas
        cya tomorrow
      • 2019-09-11 25407, 2019

      • ruaok
        !m zas
      • 2019-09-11 25407, 2019

      • BrainzBot
        You're doing good work, zas!
      • 2019-09-11 25411, 2019

      • ruaok
        sleep well!
      • 2019-09-11 25441, 2019

      • zas
        thanks :)
      • 2019-09-11 25433, 2019

      • SothoTalKer
        maybe some bot mirroring the site
      • 2019-09-11 25431, 2019

      • D4RK-PH0ENiX has quit
      • 2019-09-11 25409, 2019

      • D4RK-PH0ENiX joined the channel
      • 2019-09-11 25448, 2019

      • thomasross_ joined the channel
      • 2019-09-11 25448, 2019

      • thomasross has quit
      • 2019-09-11 25448, 2019

      • thomasross_ is now known as thomasross
      • 2019-09-11 25431, 2019

      • melodee_ joined the channel
      • 2019-09-11 25404, 2019

      • melodee has quit
      • 2019-09-11 25418, 2019

      • eharris has quit
      • 2019-09-11 25404, 2019

      • Besnik_b has quit
      • 2019-09-11 25455, 2019

      • eharris joined the channel
      • 2019-09-11 25437, 2019

      • Lotheric_ is now known as Lotheric
      • 2019-09-11 25420, 2019

      • Besnik_b joined the channel
      • 2019-09-11 25429, 2019

      • Besnik_b has quit
      • 2019-09-11 25421, 2019

      • Besnik_b joined the channel