in #metabrainz

0:04 AM
ballenby joined the channel
0:56 AM
supersandro2000 has quit
0:56 AM
supersandro2000 joined the channel
1:31 AM
ballenby has quit
1:57 AM
d4rkie has quit
1:58 AM
Nyanko-sensei joined the channel
3:00 AM
MajorLurker joined the channel
3:04 AM
MajorLurker has quit
3:26 AM
dseomn has quit
3:41 AM
dseomn joined the channel
5:20 AM
sumedh joined the channel
5:49 AM
zas

LB team, please check resources sometimes: lemmy is at 170 load15, Gaga disk is full
5:51 AM
shivam-kapila

Morning!
5:56 AM
Lol my similarity to myself is 0.007% 🤣
5:58 AM
Oops _lucifer beat me. His score is 0.001
6:06 AM
_lucifer

😌
7:01 AM
MajorLurker joined the channel
7:05 AM
MajorLurker has quit
8:07 AM
zas

yvanzo: the SIR queue has been growing for a while
8:15 AM
ballenby joined the channel
8:26 AM
sumedh has quit
8:26 AM
sumedh joined the channel
8:30 AM
navap

I just followed the musicbrainz-docker dev setup steps on Ubuntu 18.04 and ended up with the following error when running `sudo docker-compose up -d`, any ideas? ERROR: Invalid interpolation format for "volumes" option in service "musicbrainz": "${MUSICBRAINZ_SERVER_LOCAL_ROOT:?Missing path of musicbrainz-server working copy}:/musicbrainz-server"
8:33 AM
MajorLurker joined the channel
8:49 AM
MajorLurker has quit
8:50 AM
v6lur joined the channel
9:03 AM
zas

yvanzo: sir broke at ~3:55 UTC with the following error:
9:03 AM
https://www.irccloud.com/pastebin/lsJKlH9v/
9:04 AM
I just restarted the container, it seems ok now (queue decreasing)
9:05 AM
Gazooo has quit
9:08 AM
iliekcomputers: I restarted listenbrainz-web-prod on lemmy -> load drop
9:10 AM
Gazooo joined the channel
9:14 AM
btw, it would be great if LB team had a look at this serious issue at some point.
9:18 AM
Mr_Monkey

navap: Looks like you have to set the `MUSICBRAINZ_SERVER_LOCAL_ROOT` environment variable before starting the server
9:19 AM
(Looking at https://github.com/metabrainz/musicbrainz-docke... and https://github.com/metabrainz/musicbrainz-docke...
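The `${NAME:?message}` form in the quoted error follows shell-style parameter expansion: substitute the value, or abort with the message when the variable is unset or empty. A minimal Python sketch of that check, assuming a hypothetical helper `require_env` (an illustration only, not part of docker-compose or musicbrainz-docker):

```python
import os


def require_env(name, message, env=None):
    """Mimic ${NAME:?message}: return the value, or fail if unset or empty.

    `require_env` is a hypothetical helper written for this example.
    `env` defaults to the real process environment.
    """
    env = os.environ if env is None else env
    value = env.get(name, "")
    if not value:
        # Both "unset" and "set to the empty string" count as missing,
        # which is what the ":" in ":?" means.
        raise RuntimeError(f"{name}: {message}")
    return value


def volume_spec(env=None):
    """Build the volume string from the error message above."""
    root = require_env(
        "MUSICBRAINZ_SERVER_LOCAL_ROOT",
        "Missing path of musicbrainz-server working copy",
        env,
    )
    return f"{root}:/musicbrainz-server"
```

Calling `volume_spec({"MUSICBRAINZ_SERVER_LOCAL_ROOT": "/srv/mb"})` yields `/srv/mb:/musicbrainz-server`; the path `/srv/mb` is made up for the example.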
9:35 AM
v6lur has quit
9:38 AM
zas

excessive load on lemmy is back: https://stats.metabrainz.org/d/000000048/hetzne...
9:40 AM
input from the internet reaches 75mbps on lemmy
9:40 AM
ruaok: ^^
9:44 AM
ruaok

Dunno what it is. iliekcomputers you around?
9:47 AM
Sorry, I'm not near a computer.
9:47 AM
iliekcomputers

i'm looking
9:48 AM
restarting the cron container once
9:48 AM
i have no idea where the input from the internet is from
10:22 AM
i'll need some help parsing where the load is coming from. it's not from the cron container, i've stopped it. most of htop is a bunch of uwsgi processes, so i assume we're getting a lot of heavy requests. however I have no data on whether we're getting an abnormal number of requests or not, and I'm not sure where to get that data.
11:25 AM
sumedh has quit
11:27 AM
sumedh joined the channel
12:09 PM
sumedh has quit
12:23 PM
Lotheric_ joined the channel
12:24 PM
ballenby has quit
12:24 PM
rdswift_ joined the channel
12:25 PM
nawcom_ joined the channel
12:28 PM
assink_ joined the channel
12:29 PM
bitmap_ joined the channel
12:32 PM
Lotheric has quit
12:32 PM
nawcom has quit
12:32 PM
kgz has quit
12:32 PM
rdswift has quit
12:32 PM
bitmap has quit
12:32 PM
sampsyo has quit
12:32 PM
assink has quit
12:32 PM
rdswift_ is now known as rdswift
12:33 PM
bitmap_ is now known as bitmap
12:40 PM
kgz joined the channel
12:43 PM
sumedh joined the channel
12:43 PM
sampsyo joined the channel
13:01 PM
Lotheric_ is now known as Lotheric
14:01 PM
Sophist-UK joined the channel
14:19 PM
ruaok returns
14:20 PM
shivam-kapila waves
14:21 PM
ruaok

iliekcomputers: https://stats.metabrainz.org/d/000000050/hetzne...
14:21 PM
that is where you can see the network inbound/outbound
14:24 PM
https://stats.metabrainz.org/d/000000050/hetzne...
14:24 PM
that is where we can see the peak.
14:25 PM
if you zoom out you can see a number of peaks like this.
14:26 PM
and anytime you see a peak that has a flat top, it is unlikely to be inbound traffic.
14:26 PM
but something that is bound by a NIC or a process.
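The flat-top heuristic above can be sketched as a check on a series of throughput samples: a spike from real traffic tends to have a pointed peak, while a capped resource holds near its maximum for many consecutive samples. The function names, the 5% tolerance and the minimum run length are assumptions chosen for illustration, not values taken from the stats dashboards.

```python
def longest_plateau(samples, tolerance=0.05):
    """Length of the longest run of consecutive samples that stay within
    `tolerance` (relative) of the series maximum.

    Assumes non-negative samples, e.g. throughput in Mbps.
    """
    if not samples:
        return 0
    floor = max(samples) * (1 - tolerance)
    best = run = 0
    for value in samples:
        run = run + 1 if value >= floor else 0
        best = max(best, run)
    return best


def looks_capped(samples, tolerance=0.05, min_run=5):
    """True if the peak is flat-topped, hinting at a saturated limit."""
    return longest_plateau(samples, tolerance) >= min_run
```

A series such as `[10, 50, 75, 74, 75, 75, 74, 75, 20]` is flagged as capped, while `[10, 40, 75, 30, 12, 9]` is not.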
15:07 PM
Lotheric_ joined the channel
15:08 PM
Lotheric has quit
15:08 PM
Lotheric__ joined the channel
15:10 PM
Lotheric__ is now known as Lotheric
15:12 PM
Lotheric_ has quit
16:19 PM
reosarevok

zas, ruaok: 502s
16:19 PM
In prod
16:19 PM
Is that the same lemmy issue?
16:23 PM
zas

hmmm
16:23 PM
nope, something else
16:24 PM
floyd is under heavy load
16:25 PM
bitmap: ^^
16:25 PM
yvanzo: ^^
16:25 PM
sumedh has quit
16:25 PM
_lucifer

today i found a LinkedIn profile where someone had put "added 15 artists to MusicBrainz" as experience
16:25 PM
sumedh joined the channel
16:27 PM
ruaok

reosarevok: 502 in prod and you bug *us*?? why aren't you investigating?
16:28 PM
reosarevok

Because that suggests sysadmin to me, not junior dev
16:29 PM
Lotheric has a lot of experience suddenly
16:29 PM
Hope it's not the whole "people querying VA releases" thing
16:30 PM
(that code should be released in the next release, but if it seems to be causing issues we could put it out sooner)
16:31 PM
ruaok

reosarevok: you really need to start learning more about our production setup
16:31 PM
we all need to take part in it and it's not fair to just push things off to zas.
16:31 PM
so, let's start now.
16:32 PM
1. high disk writes. 40-70MB/s
16:32 PM
2. low disk reads
16:32 PM
3. 60% CPU use
16:33 PM
reosarevok: go google how to find currently running queries in postgres
16:33 PM
reosarevok

I know absolutely 0 things about hardware, so all that tells me very little. I studied web programming and the last time I set up a server it was on a Pentium 2 or so
16:34 PM
Ok, that tells me more :p
16:34 PM
ruaok

this has ZERO to do with hardware.
16:34 PM
this is EVERYTHING to do with the software that YOU help write.
16:35 PM
it is doing something wrong. zas didn't write it. perhaps you didn't either, but you should learn more about what it does.
16:35 PM
reosarevok

Oh, absolutely, my point is that I know nothing about what influences disk writes and reads and CPU use, I mostly know how to make code that makes stuff show up on a website
16:35 PM
zas

it started at 16:11 UTC, massive writes, almost 100% CPU, load ~50, ram ok but usage increased
16:35 PM
reosarevok

I'm not saying I shouldn't learn more
16:35 PM
ruaok

time to start learning.
16:35 PM
reosarevok

Just that this doesn't tell me anything at first :)
16:35 PM
Let's see
16:35 PM
ruaok

load is more normal now
16:35 PM
zas

yes, it decreases
16:37 PM
reosarevok

Guessing that means whatever happened is no longer in pg_stat_activity
16:38 PM
But I see at least one query for VA
16:38 PM
Two
16:38 PM
So yeah
16:38 PM
Prooobably should hotfix that
16:38 PM
Because next release is in almost 10 days
16:38 PM
ruaok

there are 4 queries that have been running for 21 days.
16:39 PM
2 for 21 days. some for longer
16:39 PM
one of them is an explain. wtf.
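For reference, currently running queries can be listed from PostgreSQL's `pg_stat_activity` view, as mentioned above. The sketch below pairs one such query with a small helper for picking out long-running entries. The helper name `stale_queries` and the dict-shaped rows are assumptions for illustration; the code does not connect to any database.

```python
from datetime import timedelta

# One row per non-idle backend, longest-running first.
ACTIVE_QUERIES_SQL = """
SELECT pid, state, query_start, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND query_start IS NOT NULL
ORDER BY runtime DESC
"""


def stale_queries(rows, now, max_age=timedelta(hours=1)):
    """Return rows whose query started more than `max_age` ago, oldest first.

    `rows` is assumed to be a list of dicts with a `query_start` datetime,
    such as a dict-style database cursor might return.
    """
    old = [
        row for row in rows
        if row.get("query_start") is not None
        and now - row["query_start"] > max_age
    ]
    return sorted(old, key=lambda row: row["query_start"])
```

Fed the result of `ACTIVE_QUERIES_SQL`, `stale_queries` would surface entries like the multi-day queries noted above while skipping short-lived ones.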
16:39 PM
reosarevok

That wouldn't cause a sudden spike, I assume, while we know the VA queries do
16:39 PM
But still weird
16:42 PM
bitmap, yvanzo: this has two approvals, so I'm going to hotfix it to beta/prod myself, unless one of you is around and has a good reason not to in the time it takes me to do so :p
16:42 PM
https://github.com/metabrainz/musicbrainz-serve...
16:42 PM
(my comment there can be implemented later)
16:44 PM
ruaok

this actually makes sense to me -- sundays are heavy load time and if we have something that is known to be bad, this can cause everything to back up.
16:44 PM
so, yes, please hotfix asap.
16:44 PM
zas: I'm also concerned about the fan temp on floyd. do we need to schedule a fix?
16:44 PM
zas

I wonder too, but I think it's normal
16:45 PM
because that's a huge cpu chip, it tends to produce more heat
16:45 PM
also it doesn't throttle at those temperatures
16:46 PM
ruaok

ok. let's keep our eyes on it.
16:46 PM
zas

temp increases with load, but alert threshold is perhaps a bit low for it
16:47 PM
note: it reaches threshold under 100% cpu (all cores) for a loong time
16:48 PM
I'd say the cpu cooling system isn't the best one, but is working "normally"
16:48 PM
(and btw, it's been on my radar for a while ;)
16:50 PM
ruaok

heheh, ok.
17:07 PM
shivam-kapila

I am not an expert at all this. But is it possible that one of our clients/users has set up a cron job to fetch/update a large data set every Sunday?
17:08 PM
(Please ignore if I talked nonsense)
17:08 PM
Just wondering why such high load occurs every weekend
17:09 PM
reosarevok

bitmap: btw, any idea about this one? I think it's the one ruaok mentioned above, and it's about sitemaps so you'd be the most likely to know: