in #metabrainz

0:59 AM
LupinIII has quit
1:19 AM
minimal has quit
3:37 AM
pite joined the channel
4:04 AM
\- has quit
4:47 AM
pite has quit
4:55 AM
texke` joined the channel
4:55 AM
texke has quit
6:12 AM
Kladky joined the channel
7:54 AM
BrainzGit

[musicbrainz-server] reosarevok opened pull request #3388 (master…flow-249): Update Flow to 0.250.0 https://github.com/metabrainz/musicbrainz-serve...
8:03 AM
\- joined the channel
8:08 AM
zas[m] joined the channel
8:08 AM
zas[m]

yvanzo, atj: we have had a lot of 504s on the search cluster since yesterday -> https://stats.metabrainz.org/goto/wihuixiNg?org...
8:10 AM
We also had a lot of transient zombie processes on rakim; it seems they were related to the sir-solr9-prod container. I just restarted the containers there to be sure
8:12 AM
Also check https://stats.metabrainz.org/goto/u2u4ZbmNg?org...
8:13 AM
SigHunter has quit
8:14 AM
yvanzo[m]

Hi zas, the last link gives me "Dashboard not found".
8:14 AM
Access denied?
8:16 AM
SigHunter joined the channel
8:17 AM
zas[m]

are you logged in?
8:17 AM
That's the SolrCloud 9 dashboard
8:18 AM
yvanzo[m]

yes
8:20 AM
zas[m]

It seems there is some instability on the SolrCloud side; the number of threads is pretty high on 6 nodes out of 8
8:20 AM
and response times were quite slow for a while on certain nodes
8:27 AM
I'll restart the Solr nodes with a very high number of threads one by one. I just did solr1 and it seems to be back to normal
8:32 AM
yvanzo[m]

What's the name of the dashboard?
8:33 AM
zas[m]

SolrCloud 9
8:44 AM
Restarting the Solr nodes seems to work: the number of threads goes down from 3k+ to ~650 after the restart (which is the usual number, as before the incident). Not sure what happened though.
8:45 AM
I restarted 1, 2, 3 and 4 already. The first two took a long time (about 5 minutes), but now they restart much faster. I'm restarting 5 atm, and 7 is next (and last).
8:45 AM
6 & 8 were running as normal
8:48 AM
weird, 1 & 2 took ~5 minutes to restart, 3 & 4 ~1 minute, and 5 & 7 just a few seconds
8:53 AM
All nodes are now on par regarding cpu/mem/threads
8:53 AM
Let's see if we still get those 504s
9:07 AM
yvanzo[m]

resolved so far
9:08 AM
please give me access to the SolrCloud 9 dashboard when you have time
9:08 AM
zas[m]

Yes, everything's back to normal after I restarted the last node
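
For reference, the rolling restart described above (pick the nodes whose thread count is far above the usual ~650, then restart them one at a time) could be sketched roughly as follows. This is only an illustration, not the procedure actually used: the function names, the `factor` threshold, and the `restart` / `is_healthy` callbacks are hypothetical, and the per-node health check between restarts is an assumption.

```python
from typing import Callable, Dict, Iterable, List

# Usual per-node thread count mentioned in the log (~650).
BASELINE_THREADS = 650


def nodes_to_restart(threads: Dict[str, int], factor: float = 2.0) -> List[str]:
    """Return nodes whose thread count exceeds factor * baseline, sorted by name.

    `factor` is a hypothetical threshold chosen for illustration.
    """
    limit = BASELINE_THREADS * factor
    return sorted(name for name, count in threads.items() if count > limit)


def rolling_restart(
    nodes: Iterable[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Restart nodes one at a time; stop early if a node does not recover.

    `restart` and `is_healthy` are placeholders for whatever tooling
    actually restarts a node and checks that it is back to normal.
    """
    done: List[str] = []
    for node in nodes:
        restart(node)
        if not is_healthy(node):
            break  # do not touch further nodes while one is unhealthy
        done.append(node)
    return done
```

Restarting one node at a time and checking it before moving on keeps the remaining nodes available to serve searches while each restart is in progress.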