-
samj1912
oh sorry
-
running from another node
-
wait let me stop that too
-
zas
we reached 400 ops
-
samj1912
that was from the gcloud machine
-
zas
now let's see if solr cloud recovers
-
ruaok
I'm looking at that page, but I can't make out which graph, zas.
-
samj1912
from the diff. test
-
concurrency was 500
-
zas
samj1912: ok, enough, let it cool down
-
we are above our target of 270 req/s
-
samj1912
yay :D
-
this should improve with live logs
-
zas
and if we need more, we can just add a node...
-
samj1912
more cache hits
-
what was the worst latency during stress?
-
zas
cx11 for haproxy are perfect, cpu load was still low
-
and memory is enough
-
dunno, i'll analyze graphs in a few
-
we had a weird peak on solr3
-
samj1912
?
-
zas
but the worst figure is ~500ms
-
samj1912
nice
-
that too when it was HAMMERED
-
zas
which is acceptable for that load ;)
-
anyway, we are good, now we have to secure this stuff
-
ruaok
phew.
-
sign off from the boss. wooo!
-
zas
please document everything, i'll update docs with last changes i made to haproxy conf
-
but first, cofffffeee
-
yvanzo
Leo__Verto: not yet. Note that even email domains cannot go public, except for email hosting domains.
-
Leo__Verto
is it okay if I filter those manually or should I find a list and have the script filter by that?
-
zas
i set it up to forward requests to solr1 only for now, i want to see if it works as expected (on POST)
-
samj1912
zas: instead I will try posting from my local machine
-
and just annotations
-
zas
whatever ;)
-
all valid requests should start with the /solr/ path, right?
-
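The path restriction zas asks about could be enforced in HAProxy with a `path_beg` ACL; the frontend/backend names and port below are illustrative, not the actual config:

```
frontend solr_in
    bind *:8983
    # only let requests under /solr/ through; deny everything else
    acl valid_path path_beg /solr/
    http-request deny if !valid_path
    default_backend solr_nodes
```
-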
yvanzo
zas: your change to test mb json is perfect, the same will do for beta.
-
samj1912
zas: yes
-
zas: done
-
there should have been 4 requests
-
zas
it works, all went to solr1
-
samj1912
zas: what happens when solr1 is down?
-
ruaok sends a contract to QNAP
-
ruaok
I never thought that would happen
-
zas
:)
-
samj1912
lol
-
yvanzo
great!
-
samj1912
zas: retried with a bigger collection
-
zas: how was it?
-
zas
good, try again
-
i changed a few things
-
solr2 and 3 are set as backups, so if solr1 is down, they'll be used for POST
-
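A minimal sketch of the backup setup zas describes, assuming hostnames solr1-3 on port 8983 and a generic health-check endpoint: with the `backup` flag, HAProxy sends POSTs to solr2/solr3 only while solr1's check is failing.

```
backend solr_post
    option httpchk GET /solr/admin/info/system
    server solr1 solr1:8983 check
    server solr2 solr2:8983 check backup
    server solr3 solr3:8983 check backup
```
-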
samj1912
okay
-
reposted
-
zas
ok it still works, now stop solr1 and retry
-
samj1912
okay
-
solr1 stopped
-
zas: did it get reposted?
-
zas
yes, on solr2
-
so it works as expected, you can restart solr1
-
samj1912
I wanna see what happens when I stop it in between
-
ruaok
like two kids in a sandbox trying to break their toys.
-
<3
-
zas
better now; when it's in prod that won't be as fun ;)
-
ruaok
true
-
samj1912
okay, currently solr2 is the leader
-
reposting the url collection (4.5 million docs), let's see how it handles the leader changing
-
zas: rather can you power solr2 down when I ask?
-
zas
sure
-
samj1912
I think that stopping it manually will let it play some replication packets and clear the index before shutting down
-
okay
-
zas
tell me when
-
samj1912
okay zas, as soon as you see reqs to solr2
-
stop it
-
now
-
zas:
-
zas
they all go to solr1 due to config
-
samj1912
but lets see if it works
-
and if they recover properly
-
zas
solr2 stopped
-
samj1912
oh, they are not going to solr2?
-
zas
nope, due to config, solr2/3 are set as backups of solr1 for now
-
samj1912
ah
-
wait 1 second then
-
I stopped solr3 as well
-
lets see what happens
-
zas
solr1 returns 503
-
samj1912
that's because solr1 forwards it to solr3, which was the current leader
-
zas
so it didn't take the leadership
-
we need at least 2 servers up
-
(as said over and over ;))
-
starting solr2
-
samj1912
solr3 is back
-
solr1 has leadership
-
solr2 returning
-
zas
yes, all 3 are up
-
samj1912
let me retry with solr1 now that it has leadership
-
zas
ok
-
samj1912
solr2 is recovering
-
url still recovering
-
zas
i see no query atm
-
samj1912
for solr2 . no
-
others will reply
-
zas
i mean for all
-
samj1912
oh no, others reply
-
zas
they do??
-
samj1912
if solr2 is recovering and a query comes in, it forwards it to the other nodes
-
yeah
-
solr1 and solr3 are up
-
zas
but no query on lb
-
samj1912
because no one is querying? :P
-
zas
:)
-
yvanzo
Leo__Verto: I blanked email domains appearing less than 10 times.
-
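The blanking rule yvanzo mentions can be sketched as a pure function; the threshold of 10 comes from the chat, while the function name and data shape are assumptions:

```python
from collections import Counter

# Sketch of the rule described above: keep an email domain only if it
# occurs at least `threshold` times across the dataset; blank rare ones
# so they cannot identify individual users.
def blank_rare_domains(domains, threshold=10):
    counts = Counter(domains)
    return [d if counts[d] >= threshold else "" for d in domains]

sample = ["gmail.com"] * 10 + ["rare.example"]
cleaned = blank_rare_domains(sample)  # rare.example gets blanked
```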
samj1912
okay recovered
-
now let me point it at solr1
-
Leo__Verto
ah yeah, that works too
-
samj1912
and bring solr1 down
-
zas: shutdown solr1 please
-
when I say
-
zas
samj1912: sure
-
samj1912
zas: now
-
zas
samj1912: i have an idea for that, we could write a special health check script
-
done, solr1 going down
-
samj1912
zas: did reqs go to any other node?
-
or all end up on solr1?
-
zas
basically if we can get the current leader, we could just mark it as healthy, and others as unhealthy
-
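The health check zas proposes could be sketched like this: parse a CLUSTERSTATUS-style response and report whether a given node hosts the shard leader. The dict layout follows Solr's Collections API; the function name, node names, and sample data are assumptions.

```python
# Sketch of the proposed check: given a CLUSTERSTATUS-style dict, return
# True only if `node_name` hosts a shard leader. A real HAProxy external
# check would fetch /solr/admin/collections?action=CLUSTERSTATUS from the
# node and exit 0 (healthy) or 1 (unhealthy) based on this result.
def node_is_leader(cluster_state, collection, node_name):
    shards = cluster_state["cluster"]["collections"][collection]["shards"]
    for shard in shards.values():
        for replica in shard["replicas"].values():
            if replica.get("leader") == "true" and replica["node_name"].startswith(node_name):
                return True
    return False

# Hypothetical cluster state: solr1 leads the single shard of "url".
sample = {"cluster": {"collections": {"url": {"shards": {"shard1": {"replicas": {
    "core_node1": {"node_name": "solr1:8983_solr", "leader": "true"},
    "core_node2": {"node_name": "solr2:8983_solr"},
}}}}}}}
```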
solr2
-
samj1912
okay, so partly to solr1 and partly to solr2?
-
zas
yup
-
samj1912
cool
-
solr3 was the leader after solr1 went down
-
let's bring solr1 back up
-
zas
there were a few 504s
-
samj1912
any 5xx?
-
zas
4
-
samj1912
okay
-
hmm
-
zas
expected, since failover isn't instant
-
samj1912
I will add some retries to pysolr
-
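The retries samj1912 mentions could be a small wrapper around whatever pysolr call posts the documents; the wrapper below is generic (pysolr itself isn't imported here) and all names are illustrative:

```python
import time

# Retry a callable a few times before giving up, e.g. to ride out the
# brief 504s seen while leadership moves between Solr nodes.
def with_retries(fn, attempts=3, delay=0.0, exceptions=(Exception,)):
    def wrapper(*args, **kwargs):
        for i in range(attempts):
            try:
                return fn(*args, **kwargs)
            except exceptions:
                if i == attempts - 1:
                    raise
                time.sleep(delay)
    return wrapper

# Demo with a stand-in for a pysolr call that fails twice, then succeeds.
calls = {"n": 0}
def flaky_post():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("504 from lb")
    return "ok"

result = with_retries(flaky_post)()  # succeeds on the third attempt
```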
let's tackle security next
-
zas
samj1912: i think we should just round robin POSTs across each node, and not bother with the leader thing
-
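zas's simpler alternative, sketched in HAProxy terms (hostnames and port assumed): drop the `backup` flags and let the balancer spread POSTs evenly, leaving leader routing to SolrCloud itself.

```
backend solr_post
    balance roundrobin
    server solr1 solr1:8983 check
    server solr2 solr2:8983 check
    server solr3 solr3:8983 check
```
-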
ruaok
sounds sane.
-
zas
after all, it's a solr cloud matter