#metabrainz

      • samj1912
        oh sorry
      • 2018-06-15 16618, 2018

      • samj1912
        running from another node
      • 2018-06-15 16621, 2018

      • samj1912
        wait let me stop that too
      • 2018-06-15 16623, 2018

      • zas
        we reached 400 ops
      • 2018-06-15 16638, 2018

      • samj1912
        that was from the gcloud machine
      • 2018-06-15 16648, 2018

      • zas
        now let's see if solr cloud recovers
      • 2018-06-15 16652, 2018

      • ruaok
        I'm looking at that page, but I can't make out which graph, zas.
      • 2018-06-15 16656, 2018

      • samj1912
      • 2018-06-15 16612, 2018

      • samj1912
        from the diff. test
      • 2018-06-15 16615, 2018

      • samj1912
        concurrency was 500
      • 2018-06-15 16618, 2018
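
As a point of reference only, a minimal sketch of that kind of concurrent hammering is below, assuming a plain HTTP select handler on the cloud endpoint; the URL, core name, query terms and request counts are illustrative, not the actual test tool that was used here.

```python
# Minimal concurrency sketch (illustrative only, not the actual test harness).
# The endpoint, core name and query terms below are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://solr-cloud.musicbrainz.org/solr/recording/select"  # assumed endpoint
CONCURRENCY = 500
TOTAL_REQUESTS = 10_000

def one_query(i):
    start = time.monotonic()
    r = requests.get(URL, params={"q": f"test {i}", "wt": "json"}, timeout=10)
    return r.status_code, time.monotonic() - start

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    results = list(pool.map(one_query, range(TOTAL_REQUESTS)))

latencies = sorted(latency for _, latency in results)
print("worst latency: %.0f ms" % (latencies[-1] * 1000))
print("5xx responses:", sum(1 for status, _ in results if status >= 500))
```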

      • zas
      • 2018-06-15 16635, 2018

      • zas
        samj1912: ok, enough, let it cool down
      • 2018-06-15 16652, 2018

      • zas
        we are above our target of 270 req/s
      • 2018-06-15 16656, 2018

      • samj1912
        yay :D
      • 2018-06-15 16618, 2018

      • samj1912
        this should improve with live logs
      • 2018-06-15 16624, 2018

      • zas
        and if we need more, we can just add a node...
      • 2018-06-15 16624, 2018

      • samj1912
        more cache hits
      • 2018-06-15 16651, 2018

      • samj1912
        what was the worst latency during stress?
      • 2018-06-15 16652, 2018

      • zas
        cx11 for haproxy are perfect, cpu load was still low
      • 2018-06-15 16659, 2018

      • zas
        and memory is enough
      • 2018-06-15 16616, 2018

      • zas
        dunno, i'll analyze graphs in a few
      • 2018-06-15 16632, 2018

      • zas
        we had a weird peak on solr3
      • 2018-06-15 16642, 2018

      • samj1912
        ?
      • 2018-06-15 16659, 2018

      • zas
        but the worst figure is ~500ms
      • 2018-06-15 16606, 2018

      • samj1912
        nice
      • 2018-06-15 16615, 2018

      • samj1912
        that too when it was HAMMERED
      • 2018-06-15 16615, 2018

      • zas
        which is acceptable for that load ;)
      • 2018-06-15 16653, 2018

      • zas
        anyway, we are good, now we have to secure this stuff
      • 2018-06-15 16602, 2018

      • ruaok
        phew.
      • 2018-06-15 16608, 2018

      • ruaok
        sign off from the boss. wooo!
      • 2018-06-15 16612, 2018

      • zas
        please document everything, i'll update docs with last changes i made to haproxy conf
      • 2018-06-15 16643, 2018

      • zas
        but first, cofffffeee
      • 2018-06-15 16655, 2018

      • yvanzo
        Leo__Verto: not yet. Note that even email domains cannot go public, except for email hosting domains.
      • 2018-06-15 16605, 2018

      • Leo__Verto
        is it okay if I filter those manually or should I find a list and have the script filter by that?
      • 2018-06-15 16619, 2018

      • zas
        samj1912: can you target sir at solr-cloud.musicbrainz.org ?
      • 2018-06-15 16630, 2018

      • zas
        i set it up to forward requests to solr1 only for now, i want to see if it works as expected (on POST)
      • 2018-06-15 16653, 2018

      • samj1912
        zas: instead I will try posting from my local machine
      • 2018-06-15 16605, 2018

      • samj1912
        and just annotations
      • 2018-06-15 16607, 2018

      • zas
        whatever ;)
      • 2018-06-15 16602, 2018
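
A rough sketch of what that local test post could look like, assuming pysolr is used to push a couple of throwaway documents at the annotation core through the solr-cloud.musicbrainz.org gateway; the document fields here are invented, the real documents come from sir.

```python
# Rough sketch of a manual test post through the gateway (invented documents;
# the real indexing is done by sir). Core name and fields are assumptions.
import pysolr

solr = pysolr.Solr("http://solr-cloud.musicbrainz.org/solr/annotation", timeout=30)
solr.add([
    {"id": "test-annotation-1", "text": "throwaway annotation one"},
    {"id": "test-annotation-2", "text": "throwaway annotation two"},
])
solr.commit()
```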

      • zas
        all valid requests should start with the /solr/ path, right?
      • 2018-06-15 16645, 2018

      • yvanzo
        zas: your change to test mb json is perfect, the same will do for beta.
      • 2018-06-15 16621, 2018

      • samj1912
        zas: yes
      • 2018-06-15 16628, 2018

      • samj1912
        zas: done
      • 2018-06-15 16639, 2018

      • samj1912
        there should have been 4 requests
      • 2018-06-15 16652, 2018

      • zas
        it works, all went to solr1
      • 2018-06-15 16608, 2018

      • samj1912
        zas: what happens when solr1 is down?
      • 2018-06-15 16624, 2018

      • ruaok sends a contract to QNAP
      • 2018-06-15 16628, 2018

      • ruaok
        I never thought that would happen
      • 2018-06-15 16633, 2018

      • zas
        :)
      • 2018-06-15 16635, 2018

      • samj1912
        lol
      • 2018-06-15 16622, 2018

      • yvanzo
        great!
      • 2018-06-15 16643, 2018

      • samj1912
        zas: retried with a bigger collection
      • 2018-06-15 16649, 2018

      • samj1912
        zas: how was it?
      • 2018-06-15 16600, 2018

      • zas
        good, try again
      • 2018-06-15 16608, 2018

      • zas
        i changed a few things
      • 2018-06-15 16645, 2018

      • zas
        solr2 and 3 are set as backups, so if solr1 is down, they'll be used for POST
      • 2018-06-15 16608, 2018
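
Expressed as client-side logic for comparison, the primary/backup behaviour described above is roughly this; the real routing lives in the haproxy config, and the node addresses below are placeholders.

```python
# Conceptual equivalent of the primary/backup routing for POSTs: use solr1,
# fall back to solr2/solr3 only when it is unreachable. Hostnames/ports are
# placeholders; the real behaviour is implemented in haproxy, not client code.
import requests

NODES = [
    "http://solr1.example:8983",  # primary
    "http://solr2.example:8983",  # backup
    "http://solr3.example:8983",  # backup
]

def post_update(core, payload):
    last_error = None
    for node in NODES:
        try:
            response = requests.post(f"{node}/solr/{core}/update", json=payload, timeout=30)
            response.raise_for_status()
            return response
        except requests.RequestException as exc:
            last_error = exc  # node down or erroring, try the next one
    raise last_error
```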

      • samj1912
        okay
      • 2018-06-15 16624, 2018

      • samj1912
        reposted
      • 2018-06-15 16658, 2018

      • zas
        ok it still works, now stop solr1 and retry
      • 2018-06-15 16603, 2018

      • samj1912
        okay
      • 2018-06-15 16631, 2018

      • samj1912
        solr1 stopped
      • 2018-06-15 16657, 2018

      • samj1912
        zas: how was it reposted?
      • 2018-06-15 16604, 2018

      • zas
        yes, on solr2
      • 2018-06-15 16622, 2018

      • zas
        so it works as expected, you can restart solr1
      • 2018-06-15 16643, 2018

      • samj1912
        I wanna see what happens when I stop it in between
      • 2018-06-15 16624, 2018

      • ruaok
        like two kids in a sandbox trying to break their toys.
      • 2018-06-15 16625, 2018

      • ruaok
        <3
      • 2018-06-15 16652, 2018

      • zas
        better now; once it's in prod that won't be as fun ;)
      • 2018-06-15 16600, 2018

      • ruaok
        true
      • 2018-06-15 16602, 2018

      • samj1912
        okay, currently solr2 is leader
      • 2018-06-15 16637, 2018

      • samj1912
        reposting URL (4.5 million large), let's see how it handles url changing
      • 2018-06-15 16613, 2018

      • samj1912
        zas: rather can you power solr2 down when I ask?
      • 2018-06-15 16634, 2018

      • zas
        sure
      • 2018-06-15 16637, 2018

      • samj1912
        I think that stopping it manually will let it play some replication packets and clear the index before shutting down
      • 2018-06-15 16644, 2018

      • samj1912
        okay
      • 2018-06-15 16609, 2018

      • zas
        tell me when
      • 2018-06-15 16623, 2018

      • samj1912
        okay zas, as soon as you see reqs to solr-2
      • 2018-06-15 16625, 2018

      • samj1912
        stop it
      • 2018-06-15 16650, 2018

      • samj1912
        now
      • 2018-06-15 16653, 2018

      • samj1912
        zas:
      • 2018-06-15 16656, 2018

      • zas
        they all go to solr1 due to config
      • 2018-06-15 16607, 2018

      • samj1912
        but let's see if it works
      • 2018-06-15 16614, 2018

      • samj1912
        and if they recover properly
      • 2018-06-15 16628, 2018

      • zas
        solr2 stopped
      • 2018-06-15 16634, 2018

      • samj1912
        oh, they are not going to solr2?
      • 2018-06-15 16632, 2018

      • zas
        nope, due to config, solr2/3 are set as backups of solr1 for now
      • 2018-06-15 16643, 2018

      • samj1912
        ah
      • 2018-06-15 16658, 2018

      • samj1912
        wait 1 second then
      • 2018-06-15 16637, 2018

      • samj1912
        I stopped solr3 as well
      • 2018-06-15 16640, 2018

      • samj1912
        let's see what happens
      • 2018-06-15 16647, 2018

      • zas
        solr1 returns 503
      • 2018-06-15 16612, 2018

      • zas
      • 2018-06-15 16617, 2018

      • Slurpee joined the channel
      • 2018-06-15 16638, 2018

      • samj1912
        that's because solr1 forwards it to solr3, which was the current leader
      • 2018-06-15 16655, 2018

      • zas
        so it didn't take the leadership
      • 2018-06-15 16603, 2018

      • zas
        we need at least 2 servers up
      • 2018-06-15 16616, 2018

      • zas
        (as said, over and over;)
      • 2018-06-15 16629, 2018
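
(The "at least 2 servers" rule is the usual majority quorum: assuming ZooKeeper runs alongside the three Solr nodes, a majority of 3 is floor(3/2) + 1 = 2, so a lone surviving node cannot win leader election and the cluster answers 503, which is what solr1 did above.)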

      • zas
        starting solr2
      • 2018-06-15 16603, 2018

      • samj1912
        solr3 is back
      • 2018-06-15 16608, 2018

      • samj1912
        solr1 has leadership
      • 2018-06-15 16631, 2018

      • samj1912
        solr2 returning
      • 2018-06-15 16640, 2018

      • zas
        yes, all 3 are up
      • 2018-06-15 16641, 2018

      • samj1912
        let me retry with solr1 now that it has leadership
      • 2018-06-15 16647, 2018

      • zas
        ok
      • 2018-06-15 16649, 2018

      • samj1912
        solr2 is recovering
      • 2018-06-15 16602, 2018

      • samj1912
        url still recovering
      • 2018-06-15 16625, 2018

      • zas
        i see no query atm
      • 2018-06-15 16635, 2018

      • samj1912
        for solr2, no
      • 2018-06-15 16640, 2018

      • samj1912
        others will reply
      • 2018-06-15 16645, 2018

      • zas
        i mean for all
      • 2018-06-15 16653, 2018

      • samj1912
        oh, no, the others do reply
      • 2018-06-15 16605, 2018

      • zas
        they do??
      • 2018-06-15 16607, 2018

      • samj1912
        if solr2 is recovering, in case a query comes, it sends it to other nodes
      • 2018-06-15 16610, 2018

      • samj1912
        yeah
      • 2018-06-15 16616, 2018

      • samj1912
        solr1 and solr3 are up
      • 2018-06-15 16630, 2018

      • zas
        but no query on lb
      • 2018-06-15 16639, 2018

      • samj1912
        because no one is querying? :P
      • 2018-06-15 16643, 2018

      • zas
        :)
      • 2018-06-15 16654, 2018

      • yvanzo
        Leo__Verto: I blanked email domains appearing less than 10 times.
      • 2018-06-15 16606, 2018

      • samj1912
        okay recovered
      • 2018-06-15 16611, 2018

      • samj1912
        now let me play it to solr1
      • 2018-06-15 16612, 2018

      • Leo__Verto
        ah yeah, that works too
      • 2018-06-15 16614, 2018
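
A sketch of that kind of threshold filtering, blanking any email domain seen fewer than 10 times so rare domains cannot single out individual users; the one-address-per-line input format is an assumption.

```python
# Sketch of blanking rare email domains (fewer than 10 occurrences).
# Assumes one email address per line on stdin; prints one domain (or a
# blank line) per input address.
import sys
from collections import Counter

MIN_OCCURRENCES = 10

emails = [line.strip() for line in sys.stdin if "@" in line]
domain_counts = Counter(email.rsplit("@", 1)[1].lower() for email in emails)

for email in emails:
    domain = email.rsplit("@", 1)[1].lower()
    # keep only domains common enough to not identify individual users
    print(domain if domain_counts[domain] >= MIN_OCCURRENCES else "")
```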

      • samj1912
        and bring solr1 down
      • 2018-06-15 16620, 2018

      • samj1912
        zas: shutdown solr1 please
      • 2018-06-15 16623, 2018

      • samj1912
        when I say
      • 2018-06-15 16640, 2018

      • zas
        samj1912: sure
      • 2018-06-15 16618, 2018

      • samj1912
        zas: now
      • 2018-06-15 16619, 2018

      • zas
        samj1912: i have an idea for that, we could write a special health check script
      • 2018-06-15 16634, 2018

      • zas
        done, solr1 going down
      • 2018-06-15 16610, 2018

      • samj1912
        zas: did reqs go to any other node?
      • 2018-06-15 16613, 2018

      • samj1912
        or all end up on solr1?
      • 2018-06-15 16619, 2018

      • zas
        basically if we can get the current leader, we could just mark it as healthy, and others as unhealthy
      • 2018-06-15 16629, 2018
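
A sketch of that health-check idea: ask Solr's collections API for CLUSTERSTATUS and report healthy only on the node that currently leads the collection. The collection name, port and hostname matching are assumptions about this deployment; the response fields are stock Solr.

```python
#!/usr/bin/env python3
# Sketch of the proposed leader-aware health check: exit 0 only if this node
# is the current leader of the given collection. Collection name, port and
# the hostname matching are assumptions about this particular deployment.
import socket
import sys

import requests

COLLECTION = "recording"  # hypothetical collection to check
LOCAL_HOST = socket.gethostname()

status = requests.get(
    "http://localhost:8983/solr/admin/collections",
    params={"action": "CLUSTERSTATUS", "collection": COLLECTION, "wt": "json"},
    timeout=5,
).json()

shards = status["cluster"]["collections"][COLLECTION]["shards"]
for shard in shards.values():
    for replica in shard["replicas"].values():
        if replica.get("leader") == "true" and LOCAL_HOST in replica["node_name"]:
            sys.exit(0)  # this node leads -> healthy
sys.exit(1)  # not the leader -> unhealthy, so haproxy sends POSTs elsewhere
```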

      • zas
        solr2
      • 2018-06-15 16615, 2018

      • samj1912
        okay, so partly to solr1 and partly to solr2?
      • 2018-06-15 16620, 2018

      • zas
        yup
      • 2018-06-15 16625, 2018

      • samj1912
        cool
      • 2018-06-15 16638, 2018

      • samj1912
        solr3 was the leader after solr1 went down
      • 2018-06-15 16642, 2018

      • samj1912
        let's bring solr1 back up
      • 2018-06-15 16648, 2018

      • zas
      • 2018-06-15 16607, 2018

      • zas
        there were a few 504s
      • 2018-06-15 16608, 2018

      • samj1912
        any 5xx?
      • 2018-06-15 16612, 2018

      • zas
        4
      • 2018-06-15 16613, 2018

      • samj1912
        okay
      • 2018-06-15 16618, 2018

      • samj1912
        hmm
      • 2018-06-15 16638, 2018

      • zas
        expected since that's not instant
      • 2018-06-15 16653, 2018

      • samj1912
        I will add some retries to pysolr
      • 2018-06-15 16658, 2018
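
A hedged sketch of what those retries could look like, wrapping the pysolr update call in a simple backoff loop; the retry count, delays and endpoint are illustrative, and where exactly this lands in sir is left open.

```python
# Sketch of retrying pysolr updates with exponential backoff. Retry count,
# backoff and the endpoint are illustrative; a brief 5xx usually just means
# a node is mid-failover, as seen above.
import time

import pysolr

solr = pysolr.Solr("http://solr-cloud.musicbrainz.org/solr/annotation", timeout=30)

def add_with_retries(docs, attempts=5, backoff=0.5):
    for attempt in range(attempts):
        try:
            return solr.add(docs)
        except pysolr.SolrError:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * 2 ** attempt)
```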

      • samj1912
        let's tackle security next
      • 2018-06-15 16608, 2018

      • zas
        samj1912: i think we should just round robin POSTs across the nodes, and not bother with the leader thing
      • 2018-06-15 16623, 2018

      • ruaok
        sounds sane.
      • 2018-06-15 16626, 2018

      • zas
        after all, it's a solr cloud matter
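
For contrast with the primary/backup setup used in the test above, the round-robin idea would spread POSTs evenly and let SolrCloud forward updates to the current leader itself; in haproxy that is just the balance setting, but as client-side logic it is roughly this (node addresses are placeholders).

```python
# Conceptual round-robin over all three nodes for POSTs, relying on SolrCloud
# to route updates to the current leader internally. In practice this would be
# haproxy's balance algorithm, not client code; hostnames are placeholders.
from itertools import cycle

import requests

NODES = cycle([
    "http://solr1.example:8983",
    "http://solr2.example:8983",
    "http://solr3.example:8983",
])

def post_update(core, payload):
    response = requests.post(f"{next(NODES)}/solr/{core}/update", json=payload, timeout=30)
    response.raise_for_status()
    return response
```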