-
samj1912
oh sorry
2018-06-15 16618, 2018
-
samj1912
running from another node
2018-06-15 16621, 2018
-
samj1912
wait let me stop that too
2018-06-15 16623, 2018
-
zas
we reached 400 ops
2018-06-15 16638, 2018
-
samj1912
that was from the gcloud machine
2018-06-15 16648, 2018
-
zas
now let's see if solr cloud recovers
2018-06-15 16652, 2018
-
ruaok
I'm looking at that page, but I can't make out which graph, zas.
2018-06-15 16656, 2018
-
samj1912
2018-06-15 16612, 2018
-
samj1912
from the diff. test
2018-06-15 16615, 2018
-
samj1912
concurrency was 500
2018-06-15 16618, 2018
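The log doesn't name the load-testing tool; purely as an illustration, a 500-client run against the search load balancer with ApacheBench could look like the following (host and query are assumptions):

    # hypothetical: 500 concurrent clients hammering the LB, endpoint assumed
    ab -n 50000 -c 500 "http://<search-lb>:8983/solr/recording/select?q=test&wt=json"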
-
zas
2018-06-15 16635, 2018
-
zas
samj1912: ok, enough, let it cool down
2018-06-15 16652, 2018
-
zas
we are above our target of 270 req/s
2018-06-15 16656, 2018
-
samj1912
yay :D
2018-06-15 16618, 2018
-
samj1912
this should improve with live logs
2018-06-15 16624, 2018
-
zas
and if we need more, we can just add a node...
2018-06-15 16624, 2018
-
samj1912
more cache hits
2018-06-15 16651, 2018
-
samj1912
what was the worst latency during stress?
2018-06-15 16652, 2018
-
zas
cx11 for haproxy are perfect, cpu load was still low
2018-06-15 16659, 2018
-
zas
and memory is enough
2018-06-15 16616, 2018
-
zas
dunno, i'll analyze graphs in a few
2018-06-15 16632, 2018
-
zas
we had a weird peak on solr3
2018-06-15 16642, 2018
-
samj1912
?
2018-06-15 16659, 2018
-
zas
but the worst figure is ~500ms
2018-06-15 16606, 2018
-
samj1912
nice
2018-06-15 16615, 2018
-
samj1912
that too when it was HAMMERED
2018-06-15 16615, 2018
-
zas
which is acceptable for that load ;)
2018-06-15 16653, 2018
-
zas
anyway, we are good, now we have to secure this stuff
2018-06-15 16602, 2018
-
ruaok
phew.
2018-06-15 16608, 2018
-
ruaok
sign off from the boss. wooo!
2018-06-15 16612, 2018
-
zas
please document everything, i'll update docs with last changes i made to haproxy conf
2018-06-15 16643, 2018
-
zas
but first, cofffffeee
2018-06-15 16655, 2018
-
yvanzo
Leo__Verto: not yet. Note that even email domains cannot go public, except for email hosting domains.
2018-06-15 16605, 2018
-
Leo__Verto
is it okay if I filter those manually or should I find a list and have the script filter by that?
2018-06-15 16619, 2018
-
zas
2018-06-15 16630, 2018
-
zas
i set it up to forward requests to solr1 only for now, i want to see if it works as expected (on POST)
2018-06-15 16653, 2018
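A minimal haproxy.cfg sketch of what such a POST-to-solr1 rule could look like; backend names and addresses here are assumptions, not the actual config zas deployed:

    frontend solr_in
        bind *:8983
        acl is_post method POST
        # write traffic goes to solr1 only for now
        use_backend solr_post if is_post
        default_backend solr_query

    backend solr_post
        server solr1 10.0.0.1:8983 check

    backend solr_query
        balance roundrobin
        server solr1 10.0.0.1:8983 check
        server solr2 10.0.0.2:8983 check
        server solr3 10.0.0.3:8983 check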
-
samj1912
zas: instead I will try posting from my local machine
2018-06-15 16605, 2018
-
samj1912
and just annotations
2018-06-15 16607, 2018
-
zas
whatever ;)
2018-06-15 16602, 2018
-
zas
all valid requests should start with the /solr/ path, right?
2018-06-15 16645, 2018
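In HAProxy terms that check could be a simple path ACL in the frontend; a sketch, assuming the frontend from the earlier snippet:

    # hypothetical: reject anything not under /solr/
    acl valid_solr_path path_beg /solr/
    http-request deny if !valid_solr_path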
-
yvanzo
zas: your change to test mb json is perfect, the same will do for beta.
2018-06-15 16621, 2018
-
samj1912
zas: yes
2018-06-15 16628, 2018
-
samj1912
zas: done
2018-06-15 16639, 2018
-
samj1912
there should have been 4 requests
2018-06-15 16652, 2018
-
zas
it works, all went to solr1
2018-06-15 16608, 2018
-
samj1912
zas: what happens when solr1 is down?
2018-06-15 16624, 2018
-
ruaok sends a contract to QNAP
2018-06-15 16628, 2018
-
ruaok
I never thought that would happen
2018-06-15 16633, 2018
-
zas
:)
2018-06-15 16635, 2018
-
samj1912
lol
2018-06-15 16622, 2018
-
yvanzo
great!
2018-06-15 16643, 2018
-
samj1912
zas: retried with a bigger collection
2018-06-15 16649, 2018
-
samj1912
zas: how was it?
2018-06-15 16600, 2018
-
zas
good, try again
2018-06-15 16608, 2018
-
zas
i changed a few things
2018-06-15 16645, 2018
-
zas
solr2 and 3 are set as backups, so if solr1 is down, they'll be used for POST
2018-06-15 16608, 2018
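That behaviour maps to HAProxy's backup keyword; a sketch with assumed addresses:

    backend solr_post
        server solr1 10.0.0.1:8983 check
        # only used when solr1 fails its health check
        server solr2 10.0.0.2:8983 check backup
        server solr3 10.0.0.3:8983 check backup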
-
samj1912
okay
2018-06-15 16624, 2018
-
samj1912
reposted
2018-06-15 16658, 2018
-
zas
ok it still works, now stop solr1 and retry
2018-06-15 16603, 2018
-
samj1912
okay
2018-06-15 16631, 2018
-
samj1912
solr1 stopped
2018-06-15 16657, 2018
-
samj1912
zas: how was it reposted?
2018-06-15 16604, 2018
-
zas
yes, on solr2
2018-06-15 16622, 2018
-
zas
so it works as expected, you can restart solr1
2018-06-15 16643, 2018
-
samj1912
I wanna see what happens when I stop it in between
2018-06-15 16624, 2018
-
ruaok
like two kids in a sandbox trying to break their toys.
2018-06-15 16625, 2018
-
ruaok
<3
2018-06-15 16652, 2018
-
zas
better now; when it's in prod that will not be as fun ;)
2018-06-15 16600, 2018
-
ruaok
true
2018-06-15 16602, 2018
-
samj1912
okay, currently solr2 is leader
2018-06-15 16637, 2018
-
samj1912
reposting URL (4.5 million docs), let's see how it handles the url collection changing
2018-06-15 16613, 2018
-
samj1912
zas: rather can you power solr2 down when I ask?
2018-06-15 16634, 2018
-
zas
sure
2018-06-15 16637, 2018
-
samj1912
I think that stopping it manually will let it replay some replication packets and clear the index before shutting down
2018-06-15 16644, 2018
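A manual, graceful stop of the kind described here would normally go through Solr's own control script rather than killing the process; the port is an assumption:

    # graceful shutdown, gives the node a chance to finish replication and close the index
    bin/solr stop -p 8983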
-
samj1912
okay
2018-06-15 16609, 2018
-
zas
tell me when
2018-06-15 16623, 2018
-
samj1912
okay zas, as soon as you see reqs to solr-2
2018-06-15 16625, 2018
-
samj1912
stop it
2018-06-15 16650, 2018
-
samj1912
now
2018-06-15 16653, 2018
-
samj1912
zas:
2018-06-15 16656, 2018
-
zas
they all go to solr1 due to config
2018-06-15 16607, 2018
-
samj1912
but let's see if it works
2018-06-15 16614, 2018
-
samj1912
and if they recover properly
2018-06-15 16628, 2018
-
zas
solr2 stopped
2018-06-15 16634, 2018
-
samj1912
oh, they are not going to solr2?
2018-06-15 16632, 2018
-
zas
nope, due to config, solr2/3 are set as backups of solr1 for now
2018-06-15 16643, 2018
-
samj1912
ah
2018-06-15 16658, 2018
-
samj1912
wait 1 second then
2018-06-15 16637, 2018
-
samj1912
I stopped solr3 as well
2018-06-15 16640, 2018
-
samj1912
let's see what happens
2018-06-15 16647, 2018
-
zas
solr1 returns 503
2018-06-15 16612, 2018
-
zas
2018-06-15 16617, 2018
-
Slurpee joined the channel
2018-06-15 16638, 2018
-
samj1912
that's because solr1 forwards it to solr3, which was the current leader
2018-06-15 16655, 2018
-
zas
so it didn't take the leadership
2018-06-15 16603, 2018
-
zas
we need at least 2 servers up
2018-06-15 16616, 2018
-
zas
(as said, over and over;)
2018-06-15 16629, 2018
-
zas
starting solr2
2018-06-15 16603, 2018
-
samj1912
solr3 is back
2018-06-15 16608, 2018
-
samj1912
solr1 has leadership
2018-06-15 16631, 2018
-
samj1912
solr2 returning
2018-06-15 16640, 2018
-
zas
yes, all 3 are up
2018-06-15 16641, 2018
-
samj1912
let me retry with solr1 now that it has leadership
2018-06-15 16647, 2018
-
zas
ok
2018-06-15 16649, 2018
-
samj1912
solr2 is recovering
2018-06-15 16602, 2018
-
samj1912
url still recovering
2018-06-15 16625, 2018
-
zas
i see no query atm
2018-06-15 16635, 2018
-
samj1912
for solr2, no
2018-06-15 16640, 2018
-
samj1912
others will reply
2018-06-15 16645, 2018
-
zas
i mean for all
2018-06-15 16653, 2018
-
samj1912
oh no, others reply
2018-06-15 16605, 2018
-
zas
they do??
2018-06-15 16607, 2018
-
samj1912
if solr2 is recovering and a query comes in, it sends it to the other nodes
2018-06-15 16610, 2018
-
samj1912
yeah
2018-06-15 16616, 2018
-
samj1912
solr1 and solr3 are up
2018-06-15 16630, 2018
-
zas
but no query on lb
2018-06-15 16639, 2018
-
samj1912
because no one is querying? :P
2018-06-15 16643, 2018
-
zas
:)
2018-06-15 16654, 2018
-
yvanzo
Leo__Verto: I blanked email domains appearing less than 10 times.
2018-06-15 16606, 2018
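A minimal Python sketch of that kind of blanking, assuming a CSV with an email_domain column; the file and column names are hypothetical, not yvanzo's actual script:

    # hypothetical: blank out email domains that appear fewer than 10 times
    import csv
    from collections import Counter

    with open("editors.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    counts = Counter(row["email_domain"] for row in rows)
    for row in rows:
        if counts[row["email_domain"]] < 10:
            row["email_domain"] = ""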
-
samj1912
okay recovered
2018-06-15 16611, 2018
-
samj1912
now let me repost it to solr1
2018-06-15 16612, 2018
-
Leo__Verto
ah yeah, that works too
2018-06-15 16614, 2018
-
samj1912
and bring solr1 down
2018-06-15 16620, 2018
-
samj1912
zas: shutdown solr1 please
2018-06-15 16623, 2018
-
samj1912
when I say
2018-06-15 16640, 2018
-
zas
samj1912: sure
2018-06-15 16618, 2018
-
samj1912
zas: now
2018-06-15 16619, 2018
-
zas
samj1912: i have an idea for that, we could write a special health check script
2018-06-15 16634, 2018
-
zas
done, solr1 going down
2018-06-15 16610, 2018
-
samj1912
zas: did reqs go to any other node?
2018-06-15 16613, 2018
-
samj1912
or all end up on solr1?
2018-06-15 16619, 2018
-
zas
basically if we can get the current leader, we could just mark it as healthy, and others as unhealthy
2018-06-15 16629, 2018
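A sketch of that health-check idea: a small script that takes the node to check as an argument and asks Solr's Collections API (CLUSTERSTATUS) whether that node currently holds a shard leadership. The collection name, node naming and how HAProxy would invoke it are all assumptions:

    #!/usr/bin/env python3
    # hypothetical check: exit 0 (healthy) only if the checked node
    # currently leads a shard of the collection
    import json
    import sys
    from urllib.request import urlopen

    node = sys.argv[1]            # e.g. "solr1.example.org:8983"
    collection = "musicbrainz"    # assumed collection name

    url = "http://%s/solr/admin/collections?action=CLUSTERSTATUS&wt=json" % node
    state = json.load(urlopen(url, timeout=5))
    shards = state["cluster"]["collections"][collection]["shards"]

    for shard in shards.values():
        for replica in shard["replicas"].values():
            if replica.get("leader") == "true" and node in replica["node_name"]:
                sys.exit(0)   # this node is a leader -> healthy
    sys.exit(1)               # not a leader -> report unhealthy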
-
zas
solr2
2018-06-15 16615, 2018
-
samj1912
okay, so partly to solr1 and partly to solr2?
2018-06-15 16620, 2018
-
zas
yup
2018-06-15 16625, 2018
-
samj1912
cool
2018-06-15 16638, 2018
-
samj1912
solr3 was the leader after solr1 went down
2018-06-15 16642, 2018
-
samj1912
let's bring solr1 back up
2018-06-15 16648, 2018
-
zas
2018-06-15 16607, 2018
-
zas
there were a few 504s
2018-06-15 16608, 2018
-
samj1912
any 5xx?
2018-06-15 16612, 2018
-
zas
4
2018-06-15 16613, 2018
-
samj1912
okay
2018-06-15 16618, 2018
-
samj1912
hmm
2018-06-15 16638, 2018
-
zas
expected since that's not instant
2018-06-15 16653, 2018
-
samj1912
I will add some retries to pysolr
2018-06-15 16658, 2018
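A sketch of what a retry wrapper around pysolr could look like; the LB URL, core name and retry policy are assumptions, not the actual change:

    # hypothetical: retry a pysolr add a few times so brief leader elections
    # don't surface as hard failures
    import time
    import pysolr

    solr = pysolr.Solr("http://<search-lb>:8983/solr/url", timeout=60)

    def add_with_retries(docs, attempts=3, delay=5):
        for attempt in range(1, attempts + 1):
            try:
                return solr.add(docs)
            except pysolr.SolrError:
                if attempt == attempts:
                    raise
                time.sleep(delay)  # give the cluster time to elect a new leader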
-
samj1912
let's tackle security next
2018-06-15 16608, 2018
-
zas
samj1912: i think we should just round-robin on each node for POST, and not bother with the leader thing
2018-06-15 16623, 2018
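What zas proposes would reduce to a plain round-robin backend, leaving leader routing to SolrCloud itself; a sketch with assumed addresses:

    backend solr_post
        balance roundrobin
        server solr1 10.0.0.1:8983 check
        server solr2 10.0.0.2:8983 check
        server solr3 10.0.0.3:8983 check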
-
ruaok
sounds sane.
2018-06-15 16626, 2018
-
zas
after all, it's a solr cloud matter