0:38 AM 
     
        
        Nyanko-sensei has quit
     
      2021-02-01 03239, 2021 
    
        0:39 AM 
     
        
        Nyanko-sensei joined the channel
     
      2021-02-01 03209, 2021 
    
        2:37 AM 
     
        
        yokel has quit
     
      2021-02-01 03240, 2021 
    
        2:44 AM 
     
        
        yokel joined the channel
     
      2021-02-01 03226, 2021 
    
        4:17 AM 
     
        
        Nyanko-sensei has quit
     
      2021-02-01 03212, 2021 
    
        4:28 AM 
     
        
        Nyanko-sensei joined the channel
     
      2021-02-01 03250, 2021 
    
        4:40 AM 
     
        
        Nyanko-sensei has quit
     
      2021-02-01 03235, 2021 
    
        4:45 AM 
     
        
        Nyanko-sensei joined the channel
     
      2021-02-01 03202, 2021 
    
        5:03 AM 
     
        
        Nyanko-sensei has quit
     
      2021-02-01 03235, 2021 
    
        5:13 AM 
     
        
        Nyanko-sensei joined the channel
     
      2021-02-01 03202, 2021 
    
        5:57 AM 
     
        
        AmandeeKumar joined the channel
     
      2021-02-01 03206, 2021 
    
        6:01 AM 
     
        
        AmandeeKumar is now known as AmandeepKumar
     
      2021-02-01 03207, 2021 
    
        6:09 AM 
     
        
        Nyanko-sensei has quit
     
      2021-02-01 03208, 2021 
    
        6:17 AM 
     
        
        Nyanko-sensei joined the channel
     
      2021-02-01 03232, 2021 
    
        6:36 AM 
     
        
        AmandeepKumar has quit
     
      2021-02-01 03243, 2021 
    
        6:44 AM 
     
        
        _lucifer has quit
     
      2021-02-01 03256, 2021 
    
        6:44 AM 
     
        
        _lucifer joined the channel
     
      2021-02-01 03218, 2021 
    
        6:45 AM 
     
        
        revi has quit
     
      2021-02-01 03210, 2021 
    
        6:47 AM 
     
        
        D4RK-PH0_ has quit
     
      2021-02-01 03200, 2021 
    
        6:48 AM 
     
        
        revi joined the channel
     
      2021-02-01 03232, 2021 
    
        7:20 AM 
     
        
        AmandeeKumar joined the channel
     
      2021-02-01 03253, 2021 
    
        7:27 AM 
     
        
        AmandeeKumar has quit
     
      2021-02-01 03240, 2021 
    
        7:29 AM 
     
                    yvanzo
                mo’in’
     
      2021-02-01 03258, 2021 
    
        8:03 AM 
     
        
        sumedh joined the channel
     
      2021-02-01 03214, 2021 
    
        8:21 AM 
     
        
        rdswift has quit
     
      2021-02-01 03212, 2021 
    
        8:27 AM 
     
        
        rdswift joined the channel
     
      2021-02-01 03247, 2021 
    
        8:34 AM 
     
        
        sumedh has quit
     
      2021-02-01 03224, 2021 
    
        8:53 AM 
     
        
        Nyanko-sensei has quit
     
      2021-02-01 03224, 2021 
    
        8:58 AM 
     
        
        Nyanko-sensei joined the channel
     
      2021-02-01 03238, 2021 
    
        9:25 AM 
     
                    ruaok
                mo'in!
     
      2021-02-01 03227, 2021 
    
        9:26 AM 
     
                    ruaok
                yvanzo: zas: shall we put some thought behind fixing trille today?
     
      2021-02-01 03257, 2021 
    
        9:26 AM 
     
                    yvanzo
                ruaok: is listenbrainz using rabbitmq too?
     
      2021-02-01 03205, 2021 
    
        9:27 AM 
     
                    ruaok
                yes
     
      2021-02-01 03207, 2021 
    
        9:29 AM 
     
                    yvanzo
                MB uses it to update search indexes, it should preferably not be stopped or search indexes won’t be up-to-date.
     
      2021-02-01 03247, 2021 
    
        9:29 AM 
     
                    yvanzo
                I don’t think that this is what takes the most resources on trille, but we could move this queue to PostgreSQL.
     
      2021-02-01 03212, 2021 
    
        9:30 AM 
     
                    ruaok
                LB is less sensitive. if a user cannot submit a listen, clients must re-try. so, restarts are ok.
     
      2021-02-01 03228, 2021 
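A minimal sketch of the client-side behaviour ruaok describes for LB: if a submission fails (for example while a service is restarting), the client tries again later. The function name, the backoff schedule, and the `submit` callable are all hypothetical; nothing here is taken from an actual ListenBrainz client.

```python
import time
from typing import Callable


def submit_with_retry(
    submit: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call submit() until it succeeds; return the number of attempts used.

    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if submit():
            return attempt
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"submission failed after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a real client would likely also persist unsent listens between runs.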
    
        9:32 AM 
     
                    ruaok
                
     
      2021-02-01 03254, 2021 
    
        9:32 AM 
     
                    ruaok
                something is amiss. load on trille keeps growing, traffic in rabbitmq hasn't grown.
     
      2021-02-01 03205, 2021 
    
        9:34 AM 
     
                    yvanzo
                IMHO, CB probably has a lot of room for reducing its resource footprint.
     
      2021-02-01 03254, 2021 
    
        9:34 AM 
     
                    ruaok
                Does CB use RMQ, or is CB on trille as well?
     
      2021-02-01 03214, 2021 
    
        9:35 AM 
     
                    yvanzo
                it is on trille as well
     
      2021-02-01 03255, 2021 
    
        9:35 AM 
     
                    ruaok
                have we ascertained if the resource hog is CB or RMQ?
     
      2021-02-01 03248, 2021 
    
        9:37 AM 
     
                    ruaok
                telegraf is the top process on trille? that feels odd to me.
     
      2021-02-01 03222, 2021 
    
        9:41 AM 
     
                    yvanzo
                zas: Is it possible to monitor trille’s containers from grafana more closely, for example using cadvisor?
     
      2021-02-01 03257, 2021 
    
        9:41 AM 
     
                    zas
                well, we already have reports for containers on trille
     
      2021-02-01 03258, 2021 
    
        9:41 AM 
     
                    ruaok
                you read my mind, we need to have a % usage per container graph....
     
      2021-02-01 03236, 2021 
    
        9:42 AM 
     
                    zas
                
     
      2021-02-01 03237, 2021 
    
        9:42 AM 
     
                    ruaok
                zas: got link?
     
      2021-02-01 03247, 2021 
    
        9:42 AM 
     
                    ruaok
                heh
     
      2021-02-01 03250, 2021 
    
        9:42 AM 
     
                    yvanzo
                
     
      2021-02-01 03251, 2021 
    
        9:42 AM 
     
                    zas
                ignore empty graphs, scroll down
     
      2021-02-01 03227, 2021 
    
        9:43 AM 
     
                    zas
                guys, we already have 2 suspects: rabbitmq and critiquebrainz-redis
     
      2021-02-01 03248, 2021 
    
        9:43 AM 
     
                    zas
                first one is known to eat cpu for nothing in certain cases
     
      2021-02-01 03203, 2021 
    
        9:44 AM 
     
                    zas
                second one is actually having huge write to disk spikes
     
      2021-02-01 03210, 2021 
    
        9:44 AM 
     
                    ruaok
                I just don't see rabbitmq as the culprit. but I see redis. those peaks in BlkIo are worrying.
     
      2021-02-01 03218, 2021 
    
        9:44 AM 
     
                    zas
                yes^^
     
      2021-02-01 03228, 2021 
    
        9:44 AM 
     
                    zas
                it writes far too much data
     
      2021-02-01 03234, 2021 
    
        9:45 AM 
     
                    zas
                yesterday I reduced the share of trille mbs to almost nothing, so only a few queries go to it, and even with that, we still have very slow queries (read: seconds instead of milliseconds)
     
      2021-02-01 03238, 2021 
    
        9:45 AM 
     
                    ruaok
                
     
      2021-02-01 03252, 2021 
    
        9:45 AM 
     
                    zas
                on some ws queries (usually < 100ms) we can reach > 10s
     
      2021-02-01 03252, 2021 
    
        9:45 AM 
     
                    ruaok
                that looks like we need to investigate wtf is happening here.
     
      2021-02-01 03258, 2021 
    
        9:46 AM 
     
                    ruaok
                what did we do on 10/9, for instance?
     
      2021-02-01 03204, 2021 
    
        9:48 AM 
     
                    yvanzo
                
     
      2021-02-01 03217, 2021 
    
        9:49 AM 
     
                    ruaok
                yvanzo: thank you. that helps.
     
      2021-02-01 03227, 2021 
    
        9:49 AM 
     
                    ruaok
                so, rabbitmq is not the problem. agreed?
     
      2021-02-01 03232, 2021 
    
        9:50 AM 
     
                    yvanzo
                +1
     
      2021-02-01 03208, 2021 
    
        9:51 AM 
     
                    ruaok
                
     
      2021-02-01 03223, 2021 
    
        9:51 AM 
     
                    ruaok
                that coincides with a CB release.
     
      2021-02-01 03236, 2021 
    
        9:51 AM 
     
                    ruaok
                and presumably a CB container restart.
     
      2021-02-01 03242, 2021 
    
        9:52 AM 
     
                    ruaok
                though the release on 10.26 didn't cause the same drop. perhaps redis was not restarted then?
     
      2021-02-01 03258, 2021 
    
        9:52 AM 
     
                    ruaok
                zas, yvanzo: has redis been restarted recently?
     
      2021-02-01 03231, 2021 
    
        9:53 AM 
     
                    yvanzo
                last time 2 months ago
     
      2021-02-01 03245, 2021 
    
        9:53 AM 
     
                    zas
                this instance of redis doesn't run with --appendonly=yes, like most instances, so it doesn't use aof
     
      2021-02-01 03200, 2021 
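For context on zas's point: when the append-only file (AOF) is off, any persistence Redis does is through RDB snapshots, governed by the `save` setting, and each snapshot writes the whole dataset to disk. The sketch below summarises those two settings from values shaped like the output of `CONFIG GET appendonly` and `CONFIG GET save`. The function name and the sample values are illustrative assumptions, not the configuration actually running on trille.

```python
def describe_persistence(config: dict) -> dict:
    """Summarise Redis persistence settings from CONFIG GET-style values."""
    aof_enabled = config.get("appendonly", "no") == "yes"
    # "save" is a flat string of "<seconds> <changes>" pairs, e.g. "300 10".
    fields = config.get("save", "").split()
    rdb_rules = [
        (int(fields[i]), int(fields[i + 1]))
        for i in range(0, len(fields) - 1, 2)
    ]
    return {"aof": aof_enabled, "rdb_rules": rdb_rules}


# Illustrative values only: AOF off, snapshot every 300 s if >= 10 keys changed.
print(describe_persistence({"appendonly": "no", "save": "300 10"}))
```

If a rule with a 300-second window were in effect and the dataset kept changing, full snapshots roughly every five minutes would be one possible source of periodic write spikes; that is a hypothesis to check, not something the log confirms.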
    
        9:54 AM 
     
                    ruaok
                any objections to restarting it to see what happens to the graph?
     
      2021-02-01 03213, 2021 
    
        9:54 AM 
     
                    zas
                but imho beam.smp cannot be excluded yet
     
      2021-02-01 03227, 2021 
    
        9:54 AM 
     
                    ruaok
                I could see a situation where CB is keeping a list in redis that keeps growing. and it is written over and over again.
     
      2021-02-01 03236, 2021 
    
        9:54 AM 
     
                    ruaok
                a bug, for sure.
     
      2021-02-01 03257, 2021 
    
        9:54 AM 
     
                    ruaok
                zas: I didn't. I'm trying to exclude on clear trouble maker to get another data point.
     
      2021-02-01 03210, 2021 
    
        9:55 AM 
     
                    ruaok
                *one
     
      2021-02-01 03218, 2021 
    
        9:55 AM 
     
                    ruaok
                _lucifer: ping
     
      2021-02-01 03207, 2021 
    
        9:56 AM 
     
                    zas
                process 2273 (beam.smp) is writing a lot
     
      2021-02-01 03238, 2021 
    
        9:56 AM 
     
                    ruaok
                can you tell where the data goes, zas?
     
      2021-02-01 03252, 2021 
    
        9:56 AM 
     
                    zas
                wait
     
      2021-02-01 03208, 2021 
    
        9:57 AM 
     
                    zas
                I caught redis write ops
     
      2021-02-01 03215, 2021 
    
        9:57 AM 
     
                    ruaok
                beam is deffo the highest disk user. didn't CB add more telegraf logging? could it be overdoing it?
     
      2021-02-01 03217, 2021 
    
        9:57 AM 
     
                    zas
                it goes up to 50mb/s
     
      2021-02-01 03234, 2021 
    
        9:57 AM 
     
                    ruaok
                yes, that is why I am focusing on redis.
     
      2021-02-01 03234, 2021 
    
        9:57 AM 
     
                    zas
                while beam.smp doesn't go over 400kb/s
     
      2021-02-01 03247, 2021 
    
        9:57 AM 
     
                    ruaok
                beam.smp is an issue too, but it's less spiky.
     
      2021-02-01 03203, 2021 
    
        9:58 AM 
     
                    zas
                yes, so unlikely to cause the huge delays we see
     
      2021-02-01 03205, 2021 
    
        9:58 AM 
     
                    ruaok
                I'm going to restart redis, ok?
     
      2021-02-01 03215, 2021 
    
        9:58 AM 
     
                    yvanzo
                It seems to be due to CB's usage of redis.
     
      2021-02-01 03238, 2021 
    
        9:58 AM 
     
                    zas
                not sure restarting it will help, but you can try
     
      2021-02-01 03208, 2021 
    
        9:59 AM 
     
                    ruaok
                it might drop the traffic back to 0 and then start growing again. but that would clearly indicate CB is doing something bad.
     
      2021-02-01 03209, 2021 
    
        10:00 AM 
     
                    yvanzo
                we will probably get the same graph as from September.
     
      2021-02-01 03222, 2021 
    
        10:00 AM 
     
                    _lucifer
                ruaok: pong
     
      2021-02-01 03236, 2021 
    
        10:00 AM 
     
                    yvanzo
                At least, CPU/Mem usage should be lower at first.
     
      2021-02-01 03252, 2021 
    
        10:00 AM 
     
                    ruaok
                peaks are 5-6 mins apart.
     
      2021-02-01 03254, 2021 
    
        10:00 AM 
     
                    ruaok
                hi _lucifer !
     
      2021-02-01 03209, 2021 
    
        10:01 AM 
     
                    ruaok
                can you please follow the scroll back for the last 20 minutes?
     
      2021-02-01 03215, 2021 
    
        10:01 AM 
     
                    _lucifer
                sure
     
      2021-02-01 03223, 2021 
    
        10:01 AM 
     
                    ruaok
                we're seeing strange redis use coming from CB.
     
      2021-02-01 03246, 2021 
    
        10:01 AM 
     
                    ruaok
                I'm curious if redis use in CB has recently changed. is there anything that gets processed every 5 minutes or so?
     
      2021-02-01 03205, 2021 
    
        10:02 AM 
     
                    ruaok
                
     
      2021-02-01 03247, 2021 
    
        10:03 AM 
     
                    ruaok
                zas, yvanzo : as expected the disk io for redis has dropped to nothing.
     
      2021-02-01 03229, 2021 
    
        10:04 AM 
     
                    ruaok
                beam.smp too. which might suggest that whatever the redis bug is it might be logging info to telegraf.
     
      2021-02-01 03201, 2021 
    
        10:05 AM 
     
        
        Gazooo7949440 has quit
     
      2021-02-01 03235, 2021 
    
        10:05 AM 
     
                    _lucifer
                there were a couple of bug fixes regarding that, e.g. a key mismatch, but nothing comes to mind that happens at a regular interval
     
      2021-02-01 03202, 2021 
    
        10:06 AM 
     
                    ruaok
                ok, the regular interval might be a redis behaviour due to increased use.
     
      2021-02-01 03211, 2021 
    
        10:06 AM 
     
                    _lucifer
                all data served by CB is cached
     
      2021-02-01 03247, 2021 
    
        10:06 AM 
     
        
        Gazooo7949440 joined the channel
     
      2021-02-01 03248, 2021 
    
        10:06 AM 
     
                    ruaok
                what is weird is that we are seeing a lot of data being written to redis, but with very little read. that is fully upside-down of what it should be.
     
      2021-02-01 03208, 2021 
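One way to put a number on the "many writes, few reads" observation is the cache hit ratio. Redis reports `keyspace_hits` and `keyspace_misses` in the stats section of `INFO`; the sketch below computes the ratio from a dict shaped like that output. The helper name and the sample numbers are made up for illustration.

```python
def cache_hit_ratio(stats: dict) -> float:
    """Return hits / (hits + misses) from INFO-style stats, or 0.0 if no lookups."""
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0


# Made-up sample: 5 hits against 95 misses means the cache is barely helping.
print(cache_hit_ratio({"keyspace_hits": 5, "keyspace_misses": 95}))
```

A ratio close to zero would match what ruaok describes: nearly every lookup misses and is followed by a write.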
    
        10:07 AM 
     
                    _lucifer
                yeah right
     
      2021-02-01 03237, 2021 
    
        10:07 AM 
     
                    _lucifer
                can we like a sample of the latest read/writes?
     
      2021-02-01 03237, 2021 
    
        10:07 AM 
     
                    ruaok
                so, the primary traffic for CB comes from MB hitting its API.
     
      2021-02-01 03206, 2021 
    
        10:08 AM 
     
                    ruaok
                /ws/1/review/?limit=1&offset=0&release_group=ee9b6cad-ee58-3529-81ba-cc204769459c&sort=rating
     
      2021-02-01 03235, 2021 
    
        10:08 AM 
     
                    _lucifer
                *view a sample
     
      2021-02-01 03257, 2021 
    
        10:08 AM 
     
                    ruaok
                is the endpoint that gets all the traffic. can you please review the entire code chain of this endpoint and review the redis use in great detail to see if we can find something where redis might be used incorrectly?
     
      2021-02-01 03210, 2021 
    
        10:09 AM 
     
                    _lucifer
                sure, i'll do that
     
      2021-02-01 03231, 2021 
    
        10:09 AM 
     
                    ruaok
                _lucifer: that is what I am hoping to do next. what is being written and read. let me dig
     
      2021-02-01 03249, 2021 
    
        10:10 AM 
     
                    ruaok
                wow.
     
      2021-02-01 03206, 2021 
    
        10:11 AM 
     
                    ruaok
                an incredible number of new keys are being generated in redis.
     
      2021-02-01 03250, 2021 
    
        10:11 AM 
     
                    _lucifer
                my preliminary guess is that, if there is no bug, many different MB entities are being queried (different release groups are being viewed), but each one is not viewed that often. hence, the writes are frequent but the reads are not.
     
      2021-02-01 03216, 2021 
    
        10:12 AM 
     
                    ruaok
                6 keys a second are being created. that would be the problem.
     
      2021-02-01 03213, 2021 
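The log does not say how the 6 keys per second figure was measured. One simple approach, sketched here as an assumption, is to sample the key count (for example with `DBSIZE`) twice and divide the difference by the elapsed time. The function names and sample counts are hypothetical.

```python
def key_growth_rate(count_start: int, count_end: int, seconds: float) -> float:
    """Average number of new keys per second between two key-count samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (count_end - count_start) / seconds


def projected_new_keys(rate_per_second: float, seconds: float) -> int:
    """Number of keys added over a period at a constant creation rate."""
    return round(rate_per_second * seconds)


rate = key_growth_rate(1000, 1360, 60)       # two hypothetical samples, 60 s apart
five_minutes = projected_new_keys(rate, 300)  # keys added between 5-minute peaks
one_day = projected_new_keys(rate, 86400)     # keys added per day
print(rate, five_minutes, one_day)
```

At a steady 6 keys per second this works out to 1,800 new keys every five minutes and 518,400 per day, which shows why a cache that is almost never read back still keeps growing.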
    
        10:15 AM 
     
                    ruaok
                there must be a problem with the page cache.
     
      2021-02-01 03202, 2021 
    
        10:16 AM 
     
                    ruaok
                worst case, each page fetched from MB would cause 1 fetch and 1 write to redis.
     
      2021-02-01 03220, 2021 
    
        10:16 AM 
     
                    ruaok
                but I would expect some cache hits, so there should be fewer writes than reads.
     
      2021-02-01 03242, 2021 
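ruaok's expectation can be illustrated with a small read-through cache that counts its own operations: every request does one cache read, and only a miss triggers a write. This is a self-contained toy using an in-memory dict, not CB's or brainzutils' actual caching code; the class, function, and key format are invented for the example.

```python
class CountingCache:
    """In-memory stand-in for a cache that counts reads and writes."""

    def __init__(self):
        self.store = {}
        self.reads = 0
        self.writes = 0

    def get(self, key):
        self.reads += 1
        return self.store.get(key)

    def set(self, key, value):
        self.writes += 1
        self.store[key] = value


def fetch_reviews(cache, release_group, load):
    """Read-through lookup: one read per call, one write only on a miss."""
    key = f"reviews:{release_group}"  # hypothetical key format
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load(release_group)
    cache.set(key, value)
    return value


cache = CountingCache()
for rg in ["a", "b", "a", "a"]:
    fetch_reviews(cache, rg, lambda rg: [f"review for {rg}"])
print(cache.reads, cache.writes)
```

With repeated release groups the writes stay below the reads (4 reads, 2 writes here). If every request is for a different release group, writes equal reads, which is the worst case ruaok mentions; seeing more writes than reads would point to something else.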
    
        10:16 AM 
     
                    ruaok
                do you know where the cache keys are generated in CB, _lucifer?
     
      2021-02-01 03254, 2021 
    
        10:16 AM 
     
                    _lucifer
                yes a sec
     
      2021-02-01 03215, 2021 
    
        10:18 AM 
     
                    _lucifer
                
     
      2021-02-01 03236, 2021 
    
        10:19 AM 
     
                    ruaok
                was caching in brainzutils changed recently?
     
      2021-02-01 03243, 2021 
    
        10:20 AM 
     
                    _lucifer
                no, doesn't seem so, the last commit to brainzutils cache was 2 years ago
     
      2021-02-01 03250, 2021 
    
        10:20 AM 
     
                    _lucifer
                
     
      2021-02-01 03256, 2021 
    
        10:20 AM 
     
                    ruaok
                _lucifer: could you do me a quick favor? can you disable caching from that function and make a small PR?
     
      2021-02-01 03218, 2021 
    
        10:21 AM 
     
                    ruaok
                then we can deploy that and observe.
     
      2021-02-01 03223, 2021 
    
        10:21 AM 
     
                    _lucifer
                sure, on it
     
      2021-02-01 03236, 2021 
    
        10:21 AM 
     
                    ruaok
                because right now caching is creating load problems, not solving them.
     
      2021-02-01 03239, 2021 
    
        10:21 AM 
     
                    ruaok
                thx
     
      2021-02-01 03216, 2021 
    
        10:36 AM 
     
                    BrainzGit
                
     
      2021-02-01 03237, 2021 
    
        10:36 AM 
     
                    ruaok
                thx
     
      2021-02-01 03247, 2021 
    
        10:42 AM 
     
                    _lucifer
                1 test is failing but that is expected.
     
      2021-02-01 03240, 2021 
    
        10:43 AM 
     
                    ruaok
                agreed.
     
      2021-02-01 03247, 2021 
    
        10:43 AM 
     
                    ruaok
                let me see about deploying.
     
      2021-02-01 03211, 2021 
    
        10:44 AM 
     
                    BrainzGit
                
     
      2021-02-01 03242, 2021 
    
        10:54 AM 
     
                    _lucifer
                I think MB only requires the review text and review ratings but CB provides additional entity data which MB already has. The number of reviews is much smaller than the number of entities. So for most entities, we are just caching the entity data, which is not useful to MB. I can add an MB mode to cache only the review text and ratings. I think that would reduce the cache writes to a large extent.
     
      2021-02-01 03222, 2021 
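A rough sketch of one reading of _lucifer's proposal: in an "MB mode", keep only the review text and rating, and cache nothing at all for entities that have no reviews. The field names (`text`, `rating`, `entity`) and both helper functions are hypothetical and do not reflect CB's actual schema or code.

```python
from typing import Optional


def trim_for_mb(review: dict) -> dict:
    """Keep only the fields MB is said to need: review text and rating."""
    return {"text": review.get("text"), "rating": review.get("rating")}


def cacheable_payload(reviews: list) -> Optional[list]:
    """Return trimmed reviews, or None when there is nothing worth caching."""
    if not reviews:
        # Assumption: entities without reviews are skipped, so no cache write.
        return None
    return [trim_for_mb(r) for r in reviews]


sample = [{"text": "good", "rating": 5, "entity": {"name": "x"}}]
print(cacheable_payload(sample))
print(cacheable_payload([]))
```

Since reviews are much rarer than entities, skipping the write for review-less entities is what would cut the number of cache writes; trimming the payload only reduces the size of each write.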
    
        10:55 AM 
     
                    yvanzo
                +1
     
      2021-02-01 03230, 2021 
    
        10:55 AM 
     
                    ruaok
                oh, that is interesting. maybe make a separate endpoint for that?
     
      2021-02-01 03253, 2021 
    
        10:56 AM 
     
                    alastairp
                hello. reading backlog
     
      2021-02-01 03258, 2021 
    
        10:56 AM 
     
                    alastairp
                need help with CB release?
     
      2021-02-01 03217, 2021 
    
        10:57 AM 
     
                    ruaok
                hopefully not. 🤞