0:38 AM
Nyanko-sensei has quit
2021-02-01 03239, 2021
0:39 AM
Nyanko-sensei joined the channel
2021-02-01 03209, 2021
2:37 AM
yokel has quit
2021-02-01 03240, 2021
2:44 AM
yokel joined the channel
2021-02-01 03226, 2021
4:17 AM
Nyanko-sensei has quit
2021-02-01 03212, 2021
4:28 AM
Nyanko-sensei joined the channel
2021-02-01 03250, 2021
4:40 AM
Nyanko-sensei has quit
2021-02-01 03235, 2021
4:45 AM
Nyanko-sensei joined the channel
2021-02-01 03202, 2021
5:03 AM
Nyanko-sensei has quit
2021-02-01 03235, 2021
5:13 AM
Nyanko-sensei joined the channel
2021-02-01 03202, 2021
5:57 AM
AmandeeKumar joined the channel
2021-02-01 03206, 2021
6:01 AM
AmandeeKumar is now known as AmandeepKumar
2021-02-01 03207, 2021
6:09 AM
Nyanko-sensei has quit
2021-02-01 03208, 2021
6:17 AM
Nyanko-sensei joined the channel
2021-02-01 03232, 2021
6:36 AM
AmandeepKumar has quit
2021-02-01 03243, 2021
6:44 AM
_lucifer has quit
2021-02-01 03256, 2021
6:44 AM
_lucifer joined the channel
2021-02-01 03218, 2021
6:45 AM
revi has quit
2021-02-01 03210, 2021
6:47 AM
D4RK-PH0_ has quit
2021-02-01 03200, 2021
6:48 AM
revi joined the channel
2021-02-01 03232, 2021
7:20 AM
AmandeeKumar joined the channel
2021-02-01 03253, 2021
7:27 AM
AmandeeKumar has quit
2021-02-01 03240, 2021
7:29 AM
yvanzo
mo’’in’
2021-02-01 03258, 2021
8:03 AM
sumedh joined the channel
2021-02-01 03214, 2021
8:21 AM
rdswift has quit
2021-02-01 03212, 2021
8:27 AM
rdswift joined the channel
2021-02-01 03247, 2021
8:34 AM
sumedh has quit
2021-02-01 03224, 2021
8:53 AM
Nyanko-sensei has quit
2021-02-01 03224, 2021
8:58 AM
Nyanko-sensei joined the channel
2021-02-01 03238, 2021
9:25 AM
ruaok
mo'in!
2021-02-01 03227, 2021
9:26 AM
ruaok
yvanzo: zas: shall we put some thought behind fixing trille today?
2021-02-01 03257, 2021
9:26 AM
yvanzo
ruaok: is listenbrainz using rabbitmq too?
2021-02-01 03205, 2021
9:27 AM
ruaok
yes
2021-02-01 03207, 2021
9:29 AM
yvanzo
MB uses it to update search indexes, it should preferably not be stopped or search indexes won’t be up-to-date.
2021-02-01 03247, 2021
9:29 AM
yvanzo
I don’t think that this is what takes the most resources on trille, but we could move this queue to PostgreSQL.
2021-02-01 03212, 2021
9:30 AM
ruaok
LB is less sensitive. if a user cannot submit a listen, clients must re-try. so, restarts are ok.
2021-02-01 03228, 2021
9:32 AM
ruaok
2021-02-01 03254, 2021
9:32 AM
ruaok
something is amiss. load on trille keeps growing, traffic in rabbitmq hasn't grown.
2021-02-01 03205, 2021
9:34 AM
yvanzo
IMHO, CB probably has a lot of margin for reducing its footprint on resources.
2021-02-01 03254, 2021
9:34 AM
ruaok
Does CB use RMQ, or is CB on trille as well?
2021-02-01 03214, 2021
9:35 AM
yvanzo
it is on trille as well
2021-02-01 03255, 2021
9:35 AM
ruaok
have we ascertained if the resource hog is CB or RMQ?
2021-02-01 03248, 2021
9:37 AM
ruaok
telegraf is the top process on trille? that feels odd to me.
2021-02-01 03222, 2021
9:41 AM
yvanzo
zas: Is it possible to monitor trille’s containers from grafana more closely, for example using cadvisor?
2021-02-01 03257, 2021
9:41 AM
zas
well, we already have reports for containers on trille
2021-02-01 03258, 2021
9:41 AM
ruaok
you read my mind, we need to have a % usage per container graph....
2021-02-01 03236, 2021
9:42 AM
zas
2021-02-01 03237, 2021
9:42 AM
ruaok
zas: got link?
2021-02-01 03247, 2021
9:42 AM
ruaok
heh
2021-02-01 03250, 2021
9:42 AM
yvanzo
2021-02-01 03251, 2021
9:42 AM
zas
ignore empty graphs, scroll down
2021-02-01 03227, 2021
9:43 AM
zas
guys, we already have 2 suspects: rabbitmq and critiquebrainz-redis
2021-02-01 03248, 2021
9:43 AM
zas
first one is known to eat cpu for nothing in certain cases
2021-02-01 03203, 2021
9:44 AM
zas
second one is actually having huge write to disk spikes
2021-02-01 03210, 2021
9:44 AM
ruaok
I just don't see rabbitmq as the culprit. but I see redis. those peaks in BlkIo are worrying.
2021-02-01 03218, 2021
9:44 AM
zas
yes^^
2021-02-01 03228, 2021
9:44 AM
zas
it writes far too much data
2021-02-01 03234, 2021
9:45 AM
zas
yesterday I reduced the share of trille mbs to almost nothing, so only a few queries go to it, and even with that, we still have very slow queries (read: seconds instead of milliseconds)
2021-02-01 03238, 2021
9:45 AM
ruaok
2021-02-01 03252, 2021
9:45 AM
zas
on some ws queries (usually < 100ms) we can reach > 10s
2021-02-01 03252, 2021
9:45 AM
ruaok
that looks like we need to investigate wtf is happening here.
2021-02-01 03258, 2021
9:46 AM
ruaok
what did we do on 10/9, for instance?
2021-02-01 03204, 2021
9:48 AM
yvanzo
2021-02-01 03217, 2021
9:49 AM
ruaok
yvanzo: thank you. that helps.
2021-02-01 03227, 2021
9:49 AM
ruaok
so, rabbitmq is not the problem. agreed?
2021-02-01 03232, 2021
9:50 AM
yvanzo
+1
2021-02-01 03208, 2021
9:51 AM
ruaok
2021-02-01 03223, 2021
9:51 AM
ruaok
that coincides with a CB release.
2021-02-01 03236, 2021
9:51 AM
ruaok
and presumably a CB container restart.
2021-02-01 03242, 2021
9:52 AM
ruaok
though the release on 10.26 didn't cause the same drop. perhaps redis was not restarted then?
2021-02-01 03258, 2021
9:52 AM
ruaok
zas,yvanzo has redis been restarted recently?
2021-02-01 03231, 2021
9:53 AM
yvanzo
last time 2 months ago
2021-02-01 03245, 2021
9:53 AM
zas
this instance of redis doesn't run with --appendonly=yes, like most instances, so it doesn't use aof
2021-02-01 03200, 2021
9:54 AM
ruaok
any objections to restarting it to see what happens to the graph?
2021-02-01 03213, 2021
9:54 AM
zas
but imho beam.smp cannot be excluded yet
2021-02-01 03227, 2021
9:54 AM
ruaok
I could see a situation where CB is keeping a list in redis that keeps growing. and it is written over and over again.
2021-02-01 03236, 2021
9:54 AM
ruaok
a bug, for sure.
2021-02-01 03257, 2021
9:54 AM
ruaok
zas: I didn't. I'm trying to exclude on clear trouble maker to get another data point.
2021-02-01 03210, 2021
9:55 AM
ruaok
*one
2021-02-01 03218, 2021
9:55 AM
ruaok
_lucifer: ping
2021-02-01 03207, 2021
9:56 AM
zas
2273 process (beam.smp) is writing a lot
2021-02-01 03238, 2021
9:56 AM
ruaok
can you tell where the data goes, zas?
2021-02-01 03252, 2021
9:56 AM
zas
wait
2021-02-01 03208, 2021
9:57 AM
zas
I caught redis write ops
2021-02-01 03215, 2021
9:57 AM
ruaok
beam is deffo the highest disk user. didn't CB add more telegraf logging? could it be overdoing it?
2021-02-01 03217, 2021
9:57 AM
zas
it goes up to 50mb/s
2021-02-01 03234, 2021
9:57 AM
ruaok
yes, that is why I am focusing on redis.
2021-02-01 03234, 2021
9:57 AM
zas
while beam.smp doesn't go over 400kb/s
2021-02-01 03247, 2021
9:57 AM
ruaok
beam.smp is an issue too, but it's less spiky.
2021-02-01 03203, 2021
9:58 AM
zas
yes, so unlikely to cause the huge delays we see
2021-02-01 03205, 2021
9:58 AM
ruaok
I'm going to restart redis, ok?
2021-02-01 03215, 2021
9:58 AM
yvanzo
It seems to be due to CB usage of redis.
2021-02-01 03238, 2021
9:58 AM
zas
not sure restarting it will help, but you can try
2021-02-01 03208, 2021
9:59 AM
ruaok
it might drop the traffic back to 0 and then start growing again. but that would clearly indicate CB is doing something bad.
2021-02-01 03209, 2021
10:00 AM
yvanzo
we will probably see the same graph as from September.
2021-02-01 03222, 2021
10:00 AM
_lucifer
ruaok: pong
2021-02-01 03236, 2021
10:00 AM
yvanzo
At least, it should be lower CPU/Mem usage first.
2021-02-01 03252, 2021
10:00 AM
ruaok
peaks are 5-6 mins apart.
2021-02-01 03254, 2021
10:00 AM
ruaok
hi _lucifer !
2021-02-01 03209, 2021
10:01 AM
ruaok
can you please follow the scroll back for the last 20 minutes?
2021-02-01 03215, 2021
10:01 AM
_lucifer
sure
2021-02-01 03223, 2021
10:01 AM
ruaok
we're seeing strange redis use coming from CB.
2021-02-01 03246, 2021
10:01 AM
ruaok
I'm curious if redis use in CB has recently changed. is there anything that gets processed every 5 minutes or so?
2021-02-01 03205, 2021
10:02 AM
ruaok
2021-02-01 03247, 2021
10:03 AM
ruaok
zas, yvanzo : as expected the disk io for redis has dropped to nothing.
2021-02-01 03229, 2021
10:04 AM
ruaok
beam.smp too. which might suggest that whatever the redis bug is it might be logging info to telegraf.
2021-02-01 03201, 2021
10:05 AM
Gazooo7949440 has quit
2021-02-01 03235, 2021
10:05 AM
_lucifer
there were a couple of bug fixes regarding that, a key mismatch but nothing comes to mind that happens at a regular interval
2021-02-01 03202, 2021
10:06 AM
ruaok
ok, the regular interval might be a redis behaviour due to increased use.
2021-02-01 03211, 2021
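One way the regular interval could come from redis itself is RDB snapshotting, which rewrites the whole dataset whenever a configured save point is met. The sketch below is only an illustration: it assumes the stock `save 900 1`, `save 300 10`, `save 60 10000` lines from pre-7 redis.conf are in effect (not verified for this instance), and the write rate passed in is a made-up figure, not a measurement.

```python
def seconds_until_snapshot(save_points, changes_per_second):
    """Estimate how long after the last snapshot the next one would start.

    save_points: (seconds, changes) pairs, as in redis.conf `save` lines.
    A save point fires once at least `seconds` have elapsed AND at least
    `changes` writes have accumulated since the last snapshot.
    """
    if changes_per_second <= 0:
        return None  # nothing is written, so no save point is ever met
    candidates = [
        max(seconds, changes / changes_per_second)
        for seconds, changes in save_points
    ]
    return min(candidates)


# Assumed defaults (pre-Redis 7 redis.conf); the real config may differ.
ASSUMED_SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]

# Illustrative rate of a few changes per second, chosen for the example only.
print(seconds_until_snapshot(ASSUMED_SAVE_POINTS, 6.0))
```

With a steady trickle of a few writes per second, the `300 10` save point is the first one met, which would give a full-dataset write roughly every five minutes.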
10:06 AM
_lucifer
all data served by CB is cached
2021-02-01 03247, 2021
10:06 AM
Gazooo7949440 joined the channel
2021-02-01 03248, 2021
10:06 AM
ruaok
what is weird is that we are seeing a lot of data being written to redis, but with very little read. that is fully upside-down of what it should be.
2021-02-01 03208, 2021
10:07 AM
_lucifer
yeah right
2021-02-01 03237, 2021
10:07 AM
_lucifer
can we like a sample of the latest read/writes?
2021-02-01 03237, 2021
10:07 AM
ruaok
so, the primary traffic for CB comes from MB hitting its API.
2021-02-01 03206, 2021
10:08 AM
ruaok
/ws/1/review/?limit=1&offset=0&release_group=ee9b6cad-ee58-3529-81ba-cc204769459c&sort=rating
2021-02-01 03235, 2021
10:08 AM
_lucifer
*view a sample
2021-02-01 03257, 2021
10:08 AM
ruaok
is the endpoint that gets all the traffic. can you please review the entire code chain of this endpoint and review the redis use in great detail to see if we can find something where redis might be used incorrectly?
2021-02-01 03210, 2021
10:09 AM
_lucifer
sure, i'll do that
2021-02-01 03231, 2021
10:09 AM
ruaok
_lucifer: that is what I am hoping to do next. what is being written and read. let me dig
2021-02-01 03249, 2021
10:10 AM
ruaok
wow.
2021-02-01 03206, 2021
10:11 AM
ruaok
an incredible number of new keys are being generated in redis.
2021-02-01 03250, 2021
10:11 AM
_lucifer
my preliminary guess is that, if there is no bug, many different MB entities are being queried (different release groups are being viewed, but the same page is not viewed frequently). hence, the writes are frequent but the reads are not.
2021-02-01 03216, 2021
10:12 AM
ruaok
6 keys a second are being created. that would be the problem.
2021-02-01 03213, 2021
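A figure like "keys per second" can be estimated by sampling the total key count twice. The sketch below is a minimal version of that idea, not the method used in the channel; `key_creation_rate` and its parameters are hypothetical names. With redis-py, the counter passed in could be the `dbsize` method of a `redis.Redis` client.

```python
import time


def key_creation_rate(get_key_count, interval_seconds=10.0, sleep=time.sleep):
    """Return the net number of keys added per second over one interval.

    get_key_count: zero-argument callable returning the current key count.
    The result is a net figure: keys that expire or are deleted during the
    interval reduce it.
    """
    before = get_key_count()
    sleep(interval_seconds)
    after = get_key_count()
    return (after - before) / interval_seconds


# Demo with canned counts and a no-op sleep, so it runs without a redis server.
fake_counts = iter([1000, 1060])
rate = key_creation_rate(lambda: next(fake_counts), 10.0, sleep=lambda s: None)
print(rate)
```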
10:15 AM
ruaok
there must be a problem with the page cache.
2021-02-01 03202, 2021
10:16 AM
ruaok
worst case, each page fetched from MB would cause 1 fetch and 1 write to redis.
2021-02-01 03220, 2021
10:16 AM
ruaok
but I would expect some cache hits, so there should be fewer writes than reads.
2021-02-01 03242, 2021
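The expectation that writes should not outnumber reads follows from the usual cache-aside pattern: every request reads the cache once and writes only on a miss. The sketch below uses an in-memory dict as a stand-in for redis, and `CountingCache` and `replay` are hypothetical names, not CB or brainzutils code. It also shows the degenerate case where every request is for a different key, so every read is followed by a write.

```python
class CountingCache:
    """Dict-backed cache-aside store that counts reads, hits, and writes."""

    def __init__(self):
        self.store = {}
        self.reads = 0
        self.hits = 0
        self.writes = 0

    def get_or_compute(self, key, compute):
        self.reads += 1
        if key in self.store:
            self.hits += 1
            return self.store[key]
        value = compute(key)      # cache miss: build the value...
        self.store[key] = value   # ...and write it back
        self.writes += 1
        return value


def replay(keys):
    """Run a sequence of requested keys through a fresh cache."""
    cache = CountingCache()
    for key in keys:
        cache.get_or_compute(key, lambda k: "page for " + k)
    return cache


repeated = replay(["a", "b", "a", "a", "b"])       # popular pages: mostly hits
all_unique = replay("rg-%d" % i for i in range(5))  # long tail: all misses
print(repeated.reads, repeated.hits, repeated.writes)
print(all_unique.reads, all_unique.hits, all_unique.writes)
```

In both runs the write count is bounded by the read count; it only reaches it when nothing is ever requested twice.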
10:16 AM
ruaok
do you know where the cache keys are generated in CB, _lucifer?
2021-02-01 03254, 2021
10:16 AM
_lucifer
yes a sec
2021-02-01 03215, 2021
10:18 AM
_lucifer
2021-02-01 03236, 2021
10:19 AM
ruaok
was caching in brainzutils changed recently?
2021-02-01 03243, 2021
10:20 AM
_lucifer
no doesn't seem so, the last commit to brainzutils cache was 2 years ago
2021-02-01 03250, 2021
10:20 AM
_lucifer
2021-02-01 03256, 2021
10:20 AM
ruaok
_lucifer: could you do me a quick favor? can you disable caching from that function and make a small PR?
2021-02-01 03218, 2021
10:21 AM
ruaok
then we can deploy that and observe.
2021-02-01 03223, 2021
10:21 AM
_lucifer
sure, on it
2021-02-01 03236, 2021
10:21 AM
ruaok
because right now caching is creating load problems, not solving them.
2021-02-01 03239, 2021
10:21 AM
ruaok
thx
2021-02-01 03216, 2021
10:36 AM
BrainzGit
2021-02-01 03237, 2021
10:36 AM
ruaok
thx
2021-02-01 03247, 2021
10:42 AM
_lucifer
1 test is failing but that is expected.
2021-02-01 03240, 2021
10:43 AM
ruaok
agreed.
2021-02-01 03247, 2021
10:43 AM
ruaok
let me see about deploying.
2021-02-01 03211, 2021
10:44 AM
BrainzGit
2021-02-01 03242, 2021
10:54 AM
_lucifer
I think MB only requires the review text and review ratings but CB provides additional entity data which MB already has. The number of reviews is much smaller than the number of entities. So for most entities, we are just caching the entity data, which is not useful to MB. I can add an MB mode to cache only the review text and ratings. I think that would reduce the cache writes to a large extent.
2021-02-01 03222, 2021
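The proposed MB mode amounts to trimming each review down to the fields MB needs before it is cached. The sketch below illustrates the idea only; the field names (`text`, `rating`, `entity`) and the helper `trim_review_for_cache` are assumptions for the example, not the actual CB schema or API.

```python
# Hypothetical names for the fields MB needs; the real CB payload may differ.
REVIEW_FIELDS_FOR_MB = ("text", "rating")


def trim_review_for_cache(review):
    """Return a copy of `review` holding only the fields MB needs."""
    return {
        field: review[field]
        for field in REVIEW_FIELDS_FOR_MB
        if field in review
    }


# Made-up sample payload, for illustration only.
full_review = {
    "text": "Great album.",
    "rating": 5,
    "entity": {"title": "Example Release Group", "artists": ["Example Artist"]},
}
print(trim_review_for_cache(full_review))
```

The entity block, which MB already has, never reaches the cache, so each cached value is smaller.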
10:55 AM
yvanzo
+1
2021-02-01 03230, 2021
10:55 AM
ruaok
oh, that is interesting. maybe make a separate endpoint for that?
2021-02-01 03253, 2021
10:56 AM
alastairp
hello. reading backlog
2021-02-01 03258, 2021
10:56 AM
alastairp
need help with CB release?
2021-02-01 03217, 2021
10:57 AM
ruaok
hopefully not. 🤞