lucifer: driving all day today and all the hills and shit are making it hard to load figma... There's a mock-up in there with some of the stats I thought we could share, if you don't mind having a look?
2024-01-04 00413, 2024
aerozol
There's a section that has social media post mockups
You grab em and I'll make them look interesting
2024-01-04 00423, 2024
aerozol
(Lucy is driving btw everyone, otherwise this would be very problematic haha)
2024-01-04 00403, 2024
minimal has quit
2024-01-04 00444, 2024
bitmap
uhh jimmy out of disk space?
2024-01-04 00435, 2024
bitmap
all of MB is down rn because jimmy can't be accessed
2024-01-04 00451, 2024
bitmap
lucifer: is anything running that would be eating up space? ^
2024-01-04 00454, 2024
bitmap
zas: around?
2024-01-04 00459, 2024
lediur joined the channel
2024-01-04 00453, 2024
lediur has quit
2024-01-04 00441, 2024
bitmap
zas: I had to delete /home/zas/temp.file to even allow postgres to start, it was 2.1GB and I really wasn't sure what else I could remove
2024-01-04 00426, 2024
bitmap
postgres is back but we're in very dangerous territory rn...
2024-01-04 00400, 2024
bitmap
I tried pruning unused docker images but there was nothing
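(A minimal sketch of that kind of check, assuming the Docker CLI is available on the host; these are not necessarily the exact commands bitmap ran.)

```python
# Sketch: see what Docker is actually holding on disk before pruning.
# Assumes the Docker CLI is installed and the daemon is reachable.
import subprocess

# "docker system df" summarizes space used by images, containers, and volumes.
print(subprocess.run(["docker", "system", "df"], capture_output=True, text=True).stdout)

# "docker image prune -f" removes dangling images without prompting; it is a
# no-op (as bitmap found) when there is nothing dangling to remove.
print(subprocess.run(["docker", "image", "prune", "-f"], capture_output=True, text=True).stdout)
```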
2024-01-04 00419, 2024
aerozol
I've posted on the socials that we're working on the issue, ping me if there's updates to share bitmap
2024-01-04 00432, 2024
bitmap
aerozol: I was able to restart postgres and musicbrainz seems to be back, at least, but things might be unstable if whatever caused the meltdown starts running again
2024-01-04 00408, 2024
aerozol
bitmap: thanks, will update now
2024-01-04 00451, 2024
bitmap
ty
2024-01-04 00442, 2024
bitmap
the listenbrainz DB rose from 32GB to 119GB which I believe is when it ran out of space
aerozol
Grafana link doesn't work on my phone, but was Discord onto this hours ago
2024-01-04 00431, 2024
aerozol
Oh wait, not hours sorry, I'm still on holiday time sorry. This is about the same time as you posted here
2024-01-04 00456, 2024
bitmap
ah, good, I was worried I missed some alerts
2024-01-04 00410, 2024
bitmap
I was afk but I did see the alerts when I checked my phone
2024-01-04 00426, 2024
bitmap
atj: zas: do we have to increase the size of /srv/postgresql? (or is that not how ZFS works)
2024-01-04 00436, 2024
relaxoMob has quit
2024-01-04 00414, 2024
relaxoMob joined the channel
2024-01-04 00459, 2024
Maxr1998
I only checked Grafana once it was already down and when you guys were already aware of it ^^
2024-01-04 00459, 2024
Maxr1998
Thanks for resolving it so quickly btw!
2024-01-04 00418, 2024
bitmap
I was really worried we'd be down for a long time if I couldn't find anything on jimmy to delete and clear up some space (since PG couldn't even start, so I couldn't clear any tables or anything)
2024-01-04 00431, 2024
bitmap
luckily zas had some random 2GB temp file lying around (hope it wasn't important). maybe we ought to keep more of those in case of emergency lol
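(A minimal sketch of the "sacrificial file" idea, with a hypothetical path and size; random data is used because on a compressed ZFS dataset a file of zeros would compress away and reserve almost nothing.)

```python
# Sketch: pre-allocate a ballast file that can be deleted in a disk-full
# emergency to free space immediately. Path and size are arbitrary examples,
# not anything that exists on jimmy.
import os

BALLAST_PATH = "/var/tmp/ballast.bin"   # hypothetical location
BALLAST_BYTES = 2 * 1024**3             # 2 GiB, roughly the size of zas's temp file
CHUNK_BYTES = 8 * 1024**2               # write in 8 MiB chunks

with open(BALLAST_PATH, "wb") as f:
    written = 0
    while written < BALLAST_BYTES:
        f.write(os.urandom(CHUNK_BYTES))  # incompressible data actually uses space
        written += CHUNK_BYTES
    f.flush()
    os.fsync(f.fileno())
```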
2024-01-04 00445, 2024
derwin joined the channel
2024-01-04 00412, 2024
derwin
is the site still broken, or is the endless spinning when I try to add an artist from the release editor on this gigantic release some other issue?
2024-01-04 00449, 2024
bitmap
derwin: there have been quite a few artists/releases added since, so perhaps some other issue (but I'm not sure about adding artists from the release editor specifically)
2024-01-04 00436, 2024
bitmap
if you see anything relevant in the browser console or network panel I can take a look
2024-01-04 00448, 2024
lucifer
bitmap: hi!
2024-01-04 00451, 2024
lucifer
just woke up
2024-01-04 00405, 2024
relaxoMob has quit
2024-01-04 00411, 2024
bitmap
lucifer: hey
2024-01-04 00412, 2024
lucifer
where do you see the LB db size? i just checked and the largest table is 5G
2024-01-04 00426, 2024
bitmap
from the grafana link above
2024-01-04 00416, 2024
lucifer
oh my bad, i had the query wrong. yes i see a 100G table.
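(A minimal sketch of the size check being discussed, with a placeholder DSN. Lucifer's original query isn't shown in the log, but a 5G vs 100G gap is the kind of difference you see when only the main fork is measured rather than the total including TOAST and indexes.)

```python
# Sketch: list the largest relations in a database, counting TOAST and indexes.
# pg_relation_size() measures only the main fork, which can make a heavily
# TOASTed table look tiny; pg_total_relation_size() includes TOAST + indexes.
import psycopg2

QUERY = """
SELECT n.nspname AS schema,
       c.relname AS table,
       pg_size_pretty(pg_relation_size(c.oid))       AS main_fork,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;
"""

with psycopg2.connect("dbname=listenbrainz") as conn, conn.cursor() as cur:  # placeholder DSN
    cur.execute(QUERY)
    for row in cur.fetchall():
        print(row)
```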
2024-01-04 00454, 2024
bitmap
any idea what's wrong, or is it expected to grow that much?
2024-01-04 00433, 2024
lucifer
nope, i can try to remove some of the autogenerated data.
2024-01-04 00451, 2024
bitmap
/srv/postgresql is apparently only 258G but I assume that can be increased if needed
2024-01-04 00410, 2024
lucifer
yeah that seems very weird
2024-01-04 00411, 2024
bitmap
I'm not sure how that is calculated if musicbrainz_db alone is 259GB (is that taking compression into account?)
2024-01-04 00437, 2024
lucifer
or maybe zfs commands need to be used to obtain the free disk space for it.
2024-01-04 00414, 2024
bitmap
yea zpool list shows 68GB free
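(A minimal sketch of asking ZFS directly, assuming zpool/zfs are on PATH; pool and dataset names on jimmy aren't known here, so nothing is hard-coded. df-style numbers for a single dataset can be misleading because of shared pool space, compression, and quotas.)

```python
# Sketch: query ZFS itself for free space and per-dataset usage.
import subprocess

def run(cmd):
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Free space per pool (the "68GB free" figure comes from a view like this).
print(run(["zpool", "list", "-o", "name,size,allocated,free"]))

# Used/available/compression ratio per dataset.
print(run(["zfs", "list", "-o", "name,used,available,referenced,compressratio"]))
```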
2024-01-04 00400, 2024
bitmap
much of the currently used space is likely WAL files
2024-01-04 00424, 2024
lucifer
ah okay makes sense
2024-01-04 00456, 2024
bitmap
there is 1.5 TB of WAL files, which is a crazy amount of writes
2024-01-04 00443, 2024
bitmap
trying to get those to drop atm
2024-01-04 00458, 2024
lucifer
are writes still happening?
2024-01-04 00420, 2024
bitmap
I don't think so, since the WAL graph isn't a 45 degree line anymore (lol)
2024-01-04 00434, 2024
lucifer
weird so many writes for 5 hours
2024-01-04 00416, 2024
relaxoMob joined the channel
2024-01-04 00448, 2024
bitmap
I got WAL archiving working again so it should start dropping soon
2024-01-04 00432, 2024
bitmap
but there are close to 100,000, which exceeds anything I've seen before by like 10x
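(A minimal sketch of counting WAL segments from inside Postgres, assuming superuser or pg_monitor access and a placeholder DSN; pg_ls_waldir() is available from PostgreSQL 10.)

```python
# Sketch: count WAL segments and their total size.
import psycopg2

QUERY = """
SELECT count(*)                  AS segments,
       pg_size_pretty(sum(size)) AS total_size
FROM pg_ls_waldir()
WHERE name ~ '^[0-9A-F]{24}$';  -- only segment files, not .history/.backup
"""

with psycopg2.connect("dbname=postgres") as conn, conn.cursor() as cur:  # placeholder DSN
    cur.execute(QUERY)
    print(cur.fetchone())
```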
2024-01-04 00445, 2024
lucifer
can we know what the writes were that created those wal files?
2024-01-04 00449, 2024
lucifer
table name maybe?
2024-01-04 00455, 2024
nullhawk joined the channel
2024-01-04 00429, 2024
nullhawk has quit
2024-01-04 00452, 2024
bitmap
pg_stat_all_tables will probably help
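(A minimal sketch of that pg_stat_all_tables check, with a placeholder DSN; it surfaces the relations with the most inserted/updated tuples since the statistics were last reset.)

```python
# Sketch: find the most write-heavy tables according to the statistics collector.
import psycopg2

QUERY = """
SELECT schemaname, relname, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_all_tables
ORDER BY n_tup_ins + n_tup_upd DESC
LIMIT 15;
"""

with psycopg2.connect("dbname=musicbrainz_db") as conn, conn.cursor() as cur:  # placeholder DSN
    cur.execute(QUERY)
    for row in cur.fetchall():
        print(row)
```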
2024-01-04 00452, 2024
lucifer
bitmap: i can run vacuum on the big table and try seeing if that reduces the space. probably should, i checked the rows and there is no row bigger than 1 MB. 24K rows in the table. but that would generate more wal i guess, so should i wait or do it?
2024-01-04 00410, 2024
bitmap
I'd wait a bit until WAL drops
2024-01-04 00425, 2024
lucifer
makes sense
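(A minimal sketch of the deferred vacuum, with a placeholder DSN. Plain VACUUM only marks dead rows reusable, while VACUUM FULL reclaims disk but rewrites the whole table and generates WAL of its own, which is presumably why waiting for the backlog to drain first makes sense.)

```python
# Sketch: run the vacuum once WAL has dropped. VACUUM cannot run inside a
# transaction block, so autocommit must be enabled on the connection.
import psycopg2

conn = psycopg2.connect("dbname=listenbrainz")  # placeholder DSN
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("VACUUM (VERBOSE) statistics.year_in_music;")
conn.close()
```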
2024-01-04 00426, 2024
bitmap
in musicbrainz_db, mapping.canonical_release_tmp has the most n_tup_ins by far
2024-01-04 00437, 2024
bitmap
in listenbrainz, it's pg_toast_160991024 (so dunno, I guess oversized columns?)
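(A minimal sketch of mapping that TOAST relation back to its owning table via pg_class.reltoastrelid, with a placeholder DSN; the OID is the one from bitmap's message.)

```python
# Sketch: resolve a pg_toast_* relation to the table that owns it.
import psycopg2

QUERY = """
SELECT n.nspname AS schema, c.relname AS owning_table
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.reltoastrelid = 'pg_toast.pg_toast_160991024'::regclass;
"""

with psycopg2.connect("dbname=listenbrainz") as conn, conn.cursor() as cur:  # placeholder DSN
    cur.execute(QUERY)
    print(cur.fetchall())
```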
2024-01-04 00431, 2024
lucifer
we store json in one of the columns so probably that
2024-01-04 00402, 2024
lucifer
i had changed all the mapping schema tables to unlogged
2024-01-04 00425, 2024
lucifer
so it shouldn't have created any wal
2024-01-04 00429, 2024
bitmap
there is an INSERT INTO statistics.year_in_music statement in the pg logs which has a crazy json document (was holding pg up for like a minute)
2024-01-04 00409, 2024
lucifer
hmm i see.
2024-01-04 00414, 2024
bitmap
(meant I was holding the "page up" key on my keyboard for a minute, not that the query was holding postgres up. :) realized that phrasing was confusing)
2024-01-04 00412, 2024
lucifer
ah okay
2024-01-04 00409, 2024
derwin
bleh, I guess I will just re-start adding this 50 track release :/
2024-01-04 00448, 2024
bitmap
postgres said the insert took 1428.462 ms so not sure if it was an issue, really
2024-01-04 00419, 2024
bitmap
though I guess there are many of these
2024-01-04 00441, 2024
bitmap
how big is the average "data" column on year_in_music and how many rows are expected?
2024-01-04 00454, 2024
lucifer
i had checked with pg_column_size and the likes, rows are less than 1MB in size which checks out. ~25K rows.
2024-01-04 00434, 2024
lucifer
most people would have less data than 1MB so 20G is my estimate of how large the table should be.
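(A minimal sketch of that back-of-the-envelope estimate, with a placeholder DSN and the table/column names mentioned above: statistics.year_in_music and its data column.)

```python
# Sketch: compare what the row count and average column size suggest the table
# "should" take with what it actually occupies on disk.
import psycopg2

QUERY = """
SELECT count(*)                                          AS rows,
       pg_size_pretty(avg(pg_column_size(data))::bigint) AS avg_data_size,
       pg_size_pretty(sum(pg_column_size(data))::bigint) AS expected_data_total,
       pg_size_pretty(pg_total_relation_size('statistics.year_in_music')) AS actual_on_disk
FROM statistics.year_in_music;
"""

with psycopg2.connect("dbname=listenbrainz") as conn, conn.cursor() as cur:  # placeholder DSN
    cur.execute(QUERY)
    print(cur.fetchone())
```

A large gap between expected and actual would point at dead row versions rather than live data.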
2024-01-04 00424, 2024
lucifer
95 G uhhh, i have a hunch on how that could have happened. the table is jsonb so every update creates a new toast on every data point update.
2024-01-04 00413, 2024
lucifer
there are about 10+ of those, for 8k rows. i guess in the worst case that could somehow balloon into that.
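(A minimal sketch of checking for that kind of bloat, with a placeholder DSN: each UPDATE writes a new row version, and a changed jsonb value writes new TOAST chunks, so repeated updates leave dead rows behind until vacuum runs.)

```python
# Sketch: check live vs dead tuples for the table being discussed.
import psycopg2

QUERY = """
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_all_tables
WHERE schemaname = 'statistics' AND relname = 'year_in_music';
"""

with psycopg2.connect("dbname=listenbrainz") as conn, conn.cursor() as cur:  # placeholder DSN
    cur.execute(QUERY)
    print(cur.fetchone())
```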
2024-01-04 00445, 2024
lucifer
the WAL space i am not sure about, what's the relation between table/row size and the WAL it generates?
2024-01-04 00414, 2024
lucifer
1.5TB seems very excessive, and we have run these queries multiple times last year without any issues.
2024-01-04 00426, 2024
lucifer
so i am still unsure on if this is the actual cause.
2024-01-04 00415, 2024
bitmap
WAL actually started rising at around 18:15 yesterday and it looks like mbid mapper stuff was running at this time, so perhaps a combination of that + YIM? you said the former was moved to unlogged tables, but something is amiss
2024-01-04 00431, 2024
bitmap
since PG is still complaining of too-frequent checkpoints during that time
2024-01-04 00455, 2024
lucifer
hmm i can check the mapping schema to find any logged tables
2024-01-04 00405, 2024
relaxoMob has quit
2024-01-04 00453, 2024
lucifer
just checked, all the expected ones are indeed unlogged.
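(A minimal sketch of that check, with a placeholder DSN: list any ordinary tables in the mapping schema whose relpersistence is not 'u', i.e. not UNLOGGED.)

```python
# Sketch: find mapping-schema tables that would still generate WAL.
import psycopg2

QUERY = """
SELECT c.relname, c.relpersistence  -- 'p' = permanent, 'u' = unlogged, 't' = temp
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'mapping'
  AND c.relkind = 'r'
  AND c.relpersistence <> 'u';
"""

with psycopg2.connect("dbname=musicbrainz_db") as conn, conn.cursor() as cur:  # placeholder DSN
    cur.execute(QUERY)
    print(cur.fetchall())
```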
2024-01-04 00438, 2024
bitmap
hrm. well I don't really see anything else in the logs and the tup stats only point to the toast table otherwise. YIM was also running at 18:00 yesterday it seems
lucifer
umm i see 1200 unprocessed messages in RMQ, possibly 100 or so are for YIM which failed to insert because of some unrelated error. i have stopped the container so that it doesn't try to insert.
2024-01-04 00441, 2024
bitmap
👍
2024-01-04 00406, 2024
lucifer
wal archiving seems to have stopped
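(A minimal sketch of checking archiver health, with a placeholder DSN; pg_stat_archiver shows whether segments are still being archived and whether recent attempts have failed.)

```python
# Sketch: check whether WAL archiving is keeping up or has stalled.
import psycopg2

QUERY = """
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;
"""

with psycopg2.connect("dbname=postgres") as conn, conn.cursor() as cur:  # placeholder DSN
    cur.execute(QUERY)
    print(cur.fetchone())
```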
2024-01-04 00414, 2024
bitmap
daily json dumps started on 1/3 at 00:00 utc and they are still running