-
samj1912
oh sorry
-
running from another node
-
wait let me stop that too
-
zas
we reached 400 ops
-
samj1912
that was from the gcloud machine
-
zas
now let's see if solr cloud recovers
-
ruaok
I'm looking at that page, but I can't make out which graph, zas.
-
samj1912
from the diff. test
-
concurrency was 500
-
zas
samj1912: ok, enough, let it cool down
-
we are above our target of 270 req/s
-
samj1912
yay :D
-
this should improve with live logs
-
zas
and if we need more, we can just add a node...
-
samj1912
more cache hits
-
what was the worst latency during stress?
-
zas
cx11 for haproxy are perfect, cpu load was still low
-
and memory is enough
-
dunno, i'll analyze graphs in a few
-
we had a weird peak on solr3
-
samj1912
?
-
zas
but the worst figure is ~500ms
-
samj1912
nice
-
that too when it was HAMMERED
-
zas
which is acceptable for that load ;)
-
anyway, we are good, now we have to secure this stuff
-
ruaok
phew.
-
sign off from the boss. wooo!
-
zas
please document everything, i'll update docs with last changes i made to haproxy conf
-
but first, cofffffeee
-
yvanzo
Leo__Verto: not yet. Note that even email domains cannot go public, except for email hosting domains.
-
Leo__Verto
is it okay if I filter those manually or should I find a list and have the script filter by that?
-
zas
i set it up to forward requests to solr1 only for now, i want to see if it works as expected (on POST)
-
samj1912
zas: instead I will try posting from my local machine
-
and just annotations
-
zas
whatever ;)
-
all valid requests should start with the /solr/ path, right?
-
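The path restriction zas asks about could be enforced in HAProxy with a `path_beg` ACL; the frontend/backend names and port below are illustrative, not the actual config:

```
frontend solr_in
    bind *:8983
    # only let requests under /solr/ through; deny everything else
    acl valid_path path_beg /solr/
    http-request deny if !valid_path
    default_backend solr_nodes
```
-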
yvanzo
zas: your change to test mb json is perfect, the same will do for beta.
-
samj1912
zas: yes
-
zas: done
-
there should have been 4 requests
-
zas
it works, all went to solr1
-
samj1912
zas: what happens when solr1 is down?
-
ruaok sends a contract to QNAP
-
ruaok
I never thought that would happen
-
zas
:)
-
samj1912
lol
-
yvanzo
great!
-
samj1912
zas: retried with a bigger collection
-
zas: how was it?
-
zas
good, try again
-
i changed a few things
-
solr2 and 3 are set as backups, so if solr1 is down, they'll be used for POST
-
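A minimal sketch of the backup setup zas describes, assuming hostnames solr1-3 on port 8983 and a generic health-check endpoint: with the `backup` flag, HAProxy sends POSTs to solr2/solr3 only while solr1's check is failing.

```
backend solr_post
    option httpchk GET /solr/admin/info/system
    server solr1 solr1:8983 check
    server solr2 solr2:8983 check backup
    server solr3 solr3:8983 check backup
```
-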
samj1912
okay
-
reposted
-
zas
ok it still works, now stop solr1 and retry
-
samj1912
okay
-
solr1 stopped
-
zas: did it get reposted?
-
zas
yes, on solr2
-
so it works as expected, you can restart solr1
-
samj1912
I wanna see what happens when I stop it in between
-
ruaok
like two kids in a sandbox trying to break their toys.
-
<3
-
zas
better now; when it's in prod that won't be as fun ;)
-
ruaok
true
-
samj1912
okay, currently solr2 is the leader
-
reposting the url collection (4.5 million docs), let's see how it handles the leader changing
-
zas: rather can you power solr2 down when I ask?
-
zas
sure
-
samj1912
I think that stopping it manually will let it play some replication packets and clear the index before shutting down
-
okay
-
zas
tell me when
-
samj1912
okay zas, as soon as you see reqs to solr2
-
stop it
-
now
-
zas:
-
zas
they all go to solr1 due to config
-
samj1912
but lets see if it works
-
and if they recover properly
-
zas
solr2 stopped
-
samj1912
oh, they are not going to solr2?
-
zas
nope, due to config, solr2/3 are set as backups of solr1 for now
-
samj1912
ah
-
wait 1 second then
-
I stopped solr3 as well
-
lets see what happens
-
zas
solr1 returns 503
-
samj1912
that's because solr1 forwards it to solr3, which was the current leader
-
zas
so it didn't take the leadership
-
we need at least 2 servers up
-
(as said over and over ;))
-
starting solr2
-
samj1912
solr3 is back
-
solr1 has leadership
-
solr2 returning
-
zas
yes, all 3 are up
-
samj1912
let me retry with solr1 now that it has leadership
-
zas
ok
-
samj1912
solr2 is recovering
-
url still recovering
-
zas
i see no query atm
-
samj1912
for solr2 . no
-
others will reply
-
zas
i mean for all
-
samj1912
oh no, others reply
-
zas
they do??
-
samj1912
if solr2 is recovering and a query comes in, it forwards it to the other nodes
-
yeah
-
solr1 and solr3 are up
-
zas
but no query on lb
-
samj1912
because no one is querying? :P
-
zas
:)
-
yvanzo
Leo__Verto: I blanked email domains appearing less than 10 times.
-
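The blanking rule yvanzo mentions can be sketched as a pure function; the threshold of 10 comes from the chat, while the function name and data shape are assumptions:

```python
from collections import Counter

# Sketch of the rule described above: keep an email domain only if it
# occurs at least `threshold` times across the dataset; blank rare ones
# so they cannot identify individual users.
def blank_rare_domains(domains, threshold=10):
    counts = Counter(domains)
    return [d if counts[d] >= threshold else "" for d in domains]

sample = ["gmail.com"] * 10 + ["rare.example"]
cleaned = blank_rare_domains(sample)  # rare.example gets blanked
```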
samj1912
okay recovered
-
now let me point it at solr1
-
Leo__Verto
ah yeah, that works too
-
samj1912
and bring solr1 down
-
zas: shutdown solr1 please
-
when I say
-
zas
samj1912: sure
-
samj1912
zas: now
-
zas
samj1912: i have an idea for that, we could write a special health check script
-
done, solr1 going down
-
samj1912
zas: did reqs go to any other node?
-
or all end up on solr1?
-
zas
basically if we can get the current leader, we could just mark it as healthy, and others as unhealthy
-
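The health check zas proposes could be sketched like this: parse a CLUSTERSTATUS-style response and report whether a given node hosts the shard leader. The dict layout follows Solr's Collections API; the function name, node names, and sample data are assumptions.

```python
# Sketch of the proposed check: given a CLUSTERSTATUS-style dict, return
# True only if `node_name` hosts a shard leader. A real HAProxy external
# check would fetch /solr/admin/collections?action=CLUSTERSTATUS from the
# node and exit 0 (healthy) or 1 (unhealthy) based on this result.
def node_is_leader(cluster_state, collection, node_name):
    shards = cluster_state["cluster"]["collections"][collection]["shards"]
    for shard in shards.values():
        for replica in shard["replicas"].values():
            if replica.get("leader") == "true" and replica["node_name"].startswith(node_name):
                return True
    return False

# Hypothetical cluster state: solr1 leads the single shard of "url".
sample = {"cluster": {"collections": {"url": {"shards": {"shard1": {"replicas": {
    "core_node1": {"node_name": "solr1:8983_solr", "leader": "true"},
    "core_node2": {"node_name": "solr2:8983_solr"},
}}}}}}}
```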
solr2
-
samj1912
okay, so partly to solr1 and partly to solr2?
-
zas
yup
-
samj1912
cool
-
solr3 was the leader after solr1 went down
-
let's bring solr1 back up
-
zas
there were a few 504s
-
samj1912
any 5xx?
-
zas
4
-
samj1912
okay
-
hmm
-
zas
expected, since failover isn't instant
-
samj1912
I will add some retries to pysolr
-
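The retries samj1912 mentions could be a small wrapper around whatever pysolr call posts the documents; the wrapper below is generic (pysolr itself isn't imported here) and all names are illustrative:

```python
import time

# Retry a callable a few times before giving up, e.g. to ride out the
# brief 504s seen while leadership moves between Solr nodes.
def with_retries(fn, attempts=3, delay=0.0, exceptions=(Exception,)):
    def wrapper(*args, **kwargs):
        for i in range(attempts):
            try:
                return fn(*args, **kwargs)
            except exceptions:
                if i == attempts - 1:
                    raise
                time.sleep(delay)
    return wrapper

# Demo with a stand-in for a pysolr call that fails twice, then succeeds.
calls = {"n": 0}
def flaky_post():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("504 from lb")
    return "ok"

result = with_retries(flaky_post)()  # succeeds on the third attempt
```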
let's tackle security next
-
zas
samj1912: i think we should just round robin POSTs across each node, and not bother with the leader thing
-
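zas's simpler alternative, sketched in HAProxy terms (hostnames and port assumed): drop the `backup` flags and let the balancer spread POSTs evenly, leaving leader routing to SolrCloud itself.

```
backend solr_post
    balance roundrobin
    server solr1 solr1:8983 check
    server solr2 solr2:8983 check
    server solr3 solr3:8983 check
```
-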
ruaok
sounds sane.
-
zas
after all, it's a solr cloud matter