-
D4RK-PH0ENiX has quit
-
BrainzGit
-
-
D4RK-PH0ENiX joined the channel
-
shivam-kapila has quit
-
sampsyo has quit
-
supersandro20005 has quit
-
sampsyo joined the channel
-
sbvkrishna joined the channel
-
Gore has quit
-
Gore joined the channel
-
amCap1712
great Freso
-
rahul24 joined the channel
-
shivam-kapila joined the channel
-
darkstardev13 joined the channel
-
raven__ has quit
-
sbvkrishna has quit
-
rahul24 has quit
-
rahul24 joined the channel
-
Darkloke joined the channel
-
rahul24 has left the channel
-
shivam-kapila has quit
-
yvanzo
mo’’in’
-
updating beta.mb.o
-
VxJasonxV
-
-
outsidecontext
VxJasonxV: oh well, we need to update the homepage, the file name changed slightly. thanks for reporting. please use the links on the download page
-
VxJasonxV
done!
-
outsidecontext
-
yvanzo
updating main mb.o
-
CatQuest has quit
-
done
-
ruaok
-
I'm taking a closer look at why the queue is getting backed up.
-
Gazooo has quit
-
-
iliekcomputers: zas: this is super interesting.
-
cb delta is the amount of time that passes between callbacks from rabbitmq.
-
Gazooo joined the channel
-
300-400ms. our code uses 5ms of that. the system is sitting idle for a good long chunk of time, doing fuck all.
-
zas: I could use your help, please.
-
our instance of rabbitmq was started *20* months ago.
-
which on its own is amazing, given hetzner fans. lol.
-
I don't know what the procedure is for restarting rabbitmq, but I think we may want to try that.
-
trille is neither IO bound nor CPU bound.
-
yvanzo
upgrading mb sitemaps, json-dump, cron
-
iliekcomputers
ruaok: what do you mean by callbacks specifically here?
-
The time it takes pika to call our handler function?
-
MusicbrainzB0T joined the channel
-
MusicbrainzB0T1 has quit
-
ruaok
the time between calls to our callback from pika/rabbitmq
-
in theory it should be near instant, maybe 1ms. but it is ~300ms.
-
maybe a bug in pika?
-
ruaok upgrades to pika 1.1.0
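-
The "cb delta" figure discussed above can be measured with a thin wrapper around the handler. The following is only a minimal sketch, not the actual ListenBrainz code: the name `CallbackTimer`, the `clock` parameter, and the `samples` list are hypothetical, and "cb delta" is assumed to mean the time from the start of one callback to the start of the next.

```python
import time


class CallbackTimer:
    """Wrap a message handler and record the gap between successive calls.

    Hypothetical helper for illustration; not taken from the real consumer.
    """

    def __init__(self, handler, clock=time.monotonic):
        self.handler = handler
        self.clock = clock        # injectable so the timing can be tested
        self.last_start = None    # start time of the previous callback
        self.samples = []         # list of (cb_delta, handler_time) pairs

    def __call__(self, *args, **kwargs):
        start = self.clock()
        # cb delta: time elapsed since the previous callback started
        # (0.0 for the very first call, where there is nothing to compare).
        cb_delta = 0.0 if self.last_start is None else start - self.last_start
        self.last_start = start

        result = self.handler(*args, **kwargs)

        # handler_time: how much of that interval our own code used.
        handler_time = self.clock() - start
        self.samples.append((cb_delta, handler_time))
        return result
```

With pika 1.x, a wrapper like this could presumably be passed as the `on_message_callback` argument of `channel.basic_consume`. A cb delta of ~300ms alongside a handler time of ~5ms would point at time spent waiting for delivery rather than in the handler.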
-
shivam-kapila joined the channel
-
yvanzo
-
ruaok
\O/
-
reosarevok
yvanzo: edited the blog post, does it look good?
-
ruaok
zas: plz read the posts from the last few hours.
-
zas
Ok
-
ruaok
having strange latency issues with rabbitmq
-
zas
You mean network-related latency?
-
ruaok
not sure what kind of latency, but I don't think it is network related.
-
the way we read data from rabbitmq is via a callback.
-
and the callback sits idle for up to 300ms between calls -- that is the reason why the queue is backed up so much.
-
we're just not getting called to process data faster.
-
I noticed that rabbitmq has been running for *20* months.
-
do you know how/if we can safely restart rabbitmq?
-
zas
Unsure
-
ruaok
well, help me find out, please.
-
zas
Let me check what is using it
-
BTW rebooting the whole server would be a good thing too
-
ruaok
yes, for sure.
-
CatQuest joined the channel
-
CatQuest has quit
-
CatQuest joined the channel
-
zas
I think it is ok to restart it (rabbitmq), at worst apps using it will reveal how they suck by not handling this case
-
other docker containers should also be ok
-
yvanzo: can you check what's running on trille?
-
ruaok
zas: agreed.
-
zas
I suggest restarting rabbitmq alone, to start with
-
ruaok
ok, wanna do the honors?
-
iliekcomputers
>at worst apps using it will reveal how they suck by not handling this case
-
pretty bad worst case
-
ruaok
it's only the CAA using it, no?
-
iliekcomputers
does sir use its own instance?
-
ruaok
oh, right. that.
-
totally forgot about that little detail.
-
zas
we can stop sir & caa
-
iliekcomputers
i'm pretty sure we have good handling in LB.
-
ruaok
iliekcomputers: yeah, we're fine.
-
zas: let's do that.
-
no need to stop LB. we have loads of retry and resiliency built in.
-
iliekcomputers
famous last words
-
(kidding)
-
zas
ok, rabbitmq restart (caa & sir stopped)
-
let's see if it has any impact on lb
-
ruaok
that was it!!
-
2020-02-18 11:43:53,974 INFO cb delta: 0.002 parse: 0.000 write: 0.002 stats: 0.000
-
2ms between calls now!
-
woot!
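-
A timing line like the one pasted above can be pulled apart to see which stage is slow. This is a rough sketch under assumptions: the helper names `parse_timings` and `slow_fields` and the 0.1s threshold are made up for illustration, and only "cb delta" is defined in this log; the meaning of the parse, write and stats fields is guessed from their names.

```python
import re

# The log line exactly as it appears in the channel.
LINE = ("2020-02-18 11:43:53,974 INFO cb delta: 0.002 parse: 0.000 "
        "write: 0.002 stats: 0.000")

# Match each "<name>: <seconds>" pair; the timestamp is not matched
# because none of the field names precede it.
FIELD_RE = re.compile(r"(cb delta|parse|write|stats): (\d+\.\d+)")


def parse_timings(line):
    """Return a dict mapping each timing field name to seconds as a float."""
    return {name: float(value) for name, value in FIELD_RE.findall(line)}


def slow_fields(timings, threshold=0.1):
    """Return the names of fields whose time exceeds threshold (seconds).

    The 0.1s default is an arbitrary illustrative cut-off.
    """
    return [name for name, secs in timings.items() if secs > threshold]


if __name__ == "__main__":
    timings = parse_timings(LINE)
    print(timings)
    print(slow_fields(timings))
```

Running this over successive log lines would show whether a slowdown sits in the gap between callbacks or in one of the handler's own stages.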
-
iliekcomputers
🎉
-
ruaok
times are wildly fluctuating
-
back to 400ms now.
-
zas
hmmm
-
I'd reboot the whole server (to upgrade kernel)
-
ruaok
ok, good plan.
-
iliekcomputers
is this trille?
-
ruaok
y
-
iliekcomputers
CB will have downtime then (if we wanna tweet)
-
zas
yup, CB on it too
-
only one instance?
-
iliekcomputers
i think so, yes.
-
zas
ruaok: tweet about it please
-
ruaok
ok.
-
zas
-
but it's been the same for a year
-
ruaok
tweeted.
-
zas
ok, proceeding, time to pray ;)
-
ruaok fetches pasta
-
I hate having no ping to a server... makes me anxious....
-
ruaok
you n' me both.
-
zas
trille's back :)
-
yvanzo is back too
-
ruaok
still waiting for rmq to come back
-
zas
rabbitmq failed to restart
-
not sure why yet
-
-
ok started
-
ruaok
verified.
-
fast now.
-
annnnd slow again.
-
zas
can it be an issue with the data volume?
-
ruaok
not sure.
-
can we tell if sir has been moving slow as well?
-
oh!
-
yvanzo
zas: checking
-
ruaok
our writes are slow now.
-
zas
btw, we still run rmq 3.6.5
-
-
ruaok
ok, seems that the rabbitmq slowdown was fixed -- now the problem has moved to influxdb. :)
-
ok, make a ticket for that, zas.
-
yvanzo
should I start sir to see?
-
D4RK-PH0ENiX has quit
-
ruaok
and maybe work with bitmap or yvanzo to upgrade it.
-
yvanzo: fine by me.
-
zas
-
BrainzBot
MBH-534: Upgrade rabbitmq to 3.8.x on trille
-
ruaok
yvanzo: bitmap: can one of you two please work on that?
-
yvanzo
sir is back, not much to mention, its rate is limited to 100 msg in a batch.