#metabrainz

      • Etua joined the channel
      • 2021-03-05 06414, 2021

      • Etua has quit
      • 2021-03-05 06428, 2021

      • Sophist_UK joined the channel
      • 2021-03-05 06458, 2021

      • Sophist-UK has quit
      • 2021-03-05 06425, 2021

      • Cyna[m] has quit
      • 2021-03-05 06425, 2021

      • goldenshimmer has quit
      • 2021-03-05 06425, 2021

      • SamThursfield[m] has quit
      • 2021-03-05 06419, 2021

      • lorenzuru has quit
      • 2021-03-05 06419, 2021

      • kepstin has quit
      • 2021-03-05 06426, 2021

      • joshuaboniface has quit
      • 2021-03-05 06433, 2021

      • d4rkie has quit
      • 2021-03-05 06414, 2021

      • Nyanko-sensei joined the channel
      • 2021-03-05 06404, 2021

      • Cyna[m] joined the channel
      • 2021-03-05 06410, 2021

      • SamThursfield[m] joined the channel
      • 2021-03-05 06403, 2021

      • goldenshimmer joined the channel
      • 2021-03-05 06456, 2021

      • lorenzuru joined the channel
      • 2021-03-05 06419, 2021

      • kepstin joined the channel
      • 2021-03-05 06422, 2021

      • MajorLurker joined the channel
      • 2021-03-05 06407, 2021

      • joshuaboniface joined the channel
      • 2021-03-05 06450, 2021

      • MajorLurker has quit
      • 2021-03-05 06439, 2021

      • Rohan_Pillai joined the channel
      • 2021-03-05 06403, 2021

      • Rohan_Pillai has quit
      • 2021-03-05 06440, 2021

      • Rohan_Pillai joined the channel
      • 2021-03-05 06458, 2021

      • Rohan_Pillai has quit
      • 2021-03-05 06409, 2021

      • Rohan_Pillai joined the channel
      • 2021-03-05 06411, 2021

      • sumedh joined the channel
      • 2021-03-05 06453, 2021

      • Rohan_Pillai has quit
      • 2021-03-05 06428, 2021

      • sumedh has quit
      • 2021-03-05 06443, 2021

      • Rohan_Pillai joined the channel
      • 2021-03-05 06412, 2021

      • d4rkie joined the channel
      • 2021-03-05 06437, 2021

      • Nyanko-sensei has quit
      • 2021-03-05 06420, 2021

      • MajorLurker joined the channel
      • 2021-03-05 06454, 2021

      • MajorLurker has quit
      • 2021-03-05 06421, 2021

      • Rohan_Pillai has quit
      • 2021-03-05 06457, 2021

      • _lucifer
        ruaok: import failed. no diskspace left on device.
      • 2021-03-05 06438, 2021

      • sumedh joined the channel
      • 2021-03-05 06452, 2021

      • sampsyo has quit
      • 2021-03-05 06440, 2021

      • sampsyo joined the channel
      • 2021-03-05 06412, 2021

      • sumedh has quit
      • 2021-03-05 06446, 2021

      • Rohan_Pillai joined the channel
      • 2021-03-05 06404, 2021

      • ruaok
        Mooooiin!
      • 2021-03-05 06424, 2021

      • ruaok
        _lucifer: any idea how to clean up?
      • 2021-03-05 06436, 2021

      • zas
        bitmap: postgres-williams on paco needs more diskspace, it should go back to williams imho (and a few containers on williams should probably run on paco instead)
      • 2021-03-05 06405, 2021

      • zas
        bitmap: I truncated pg log file on floyd, we still need to restart docker to control log file size there
      • 2021-03-05 06430, 2021
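
For context, Docker's default json-file driver does not rotate container logs unless size limits are set, which is why a daemon restart is needed to cap the log size. A minimal sketch, assuming the stock json-file driver; the size values are illustrative, not the production configuration:

      # /etc/docker/daemon.json -- cap each container log at 100 MB, keep 3 rotated files
      {
        "log-driver": "json-file",
        "log-opts": {
          "max-size": "100m",
          "max-file": "3"
        }
      }

      # the daemon must be restarted for the new options to take effect,
      # and only containers created afterwards pick them up
      sudo systemctl restart docker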

      • Rohan_Pillai has quit
      • 2021-03-05 06449, 2021

      • zas
        log file had grown to 172GB
      • 2021-03-05 06411, 2021

      • _lucifer
        ruaok: what's the size of the dump? we could clear the incomplete dumps and other things from hdfs.
      • 2021-03-05 06441, 2021

      • _lucifer
        that drive has 216G available at most, and docker is using it for images and other containers as well
      • 2021-03-05 06446, 2021

      • _lucifer
        just a docker prune can yield ~20G. clearing the temp files and incomplete dump should yield another ~125G
      • 2021-03-05 06455, 2021
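
For reference, the "docker prune" step mentioned above is the standard Docker cleanup; a minimal sketch, assuming the stock Docker CLI:

      # show what is using the disk (images, containers, volumes, build cache)
      docker system df

      # remove stopped containers, unused networks, dangling images and build cache
      docker system prune

      # add -a to also drop images not referenced by any container (more aggressive)
      docker system prune -a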

      • _lucifer
        how much disk space do other nodes in the cluster have?
      • 2021-03-05 06447, 2021

      • ruaok
        they should all have the same specs.
      • 2021-03-05 06451, 2021

      • ruaok
        let's clean up then!
      • 2021-03-05 06454, 2021

      • ruaok
        do you know how?
      • 2021-03-05 06441, 2021

      • _lucifer
        hdfs dfs -rm -r -skipTrash `path` inside the namenode should do that
      • 2021-03-05 06444, 2021
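
A sketch of that cleanup, assuming it is run inside the namenode container (note the `dfs` subcommand); the path is a placeholder for the incomplete dump, not the actual location:

      # see which HDFS directories are taking the space
      hdfs dfs -du -h /

      # delete the incomplete dump immediately, bypassing the HDFS trash
      hdfs dfs -rm -r -skipTrash /path/to/incomplete-dump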

      • _lucifer
        let me try it
      • 2021-03-05 06433, 2021

      • _lucifer
        ruaok, i am trying to delete but there are some issues with namenode. can take some time to diagnose.
      • 2021-03-05 06447, 2021

      • _lucifer
        in the meanwhile, you might want to take a look at this
      • 2021-03-05 06403, 2021

      • _lucifer
      • 2021-03-05 06404, 2021

      • ruaok
        what if we just reformatted our HDFS and started over?
      • 2021-03-05 06417, 2021

      • _lucifer
        it seems all other datanodes are offline
      • 2021-03-05 06426, 2021

      • ruaok
        that is supposed to be a valid use case. we need to reimport all the data.
      • 2021-03-05 06455, 2021

      • ruaok
        it really sounds like the cluster needs a complete reboot. so let's do that.
      • 2021-03-05 06422, 2021

      • _lucifer
        yeah let's try that
      • 2021-03-05 06408, 2021

      • _lucifer
        did something happen on March 2? new datanode containers came up on leader that day and other workers went offline the same day
      • 2021-03-05 06444, 2021

      • ruaok
        not that i know of, but the problem is that these systems aren't monitored, so hard to know.
      • 2021-03-05 06400, 2021

      • _lucifer
        yeah :(
      • 2021-03-05 06401, 2021

      • ruaok
        I'm really warming up to your suggestion of using yarn and not docker to run the cluster.
      • 2021-03-05 06425, 2021

      • ruaok
        once we are able to do that, then let's get 4 new machines, have zas monitor them and restart the cluster.
      • 2021-03-05 06443, 2021

      • _lucifer
        yeah makes sense
      • 2021-03-05 06411, 2021

      • ruaok
        ok, 11G free.
      • 2021-03-05 06455, 2021

      • ruaok
        ok, cluster stopped.
      • 2021-03-05 06442, 2021

      • ruaok
        name node volumes dropped, recreated. now to do that to each datanode.
      • 2021-03-05 06414, 2021

      • d4rkie has quit
      • 2021-03-05 06445, 2021

      • ruaok
        _lucifer: ok, reset complete. can you see if the cluster looks healthy?
      • 2021-03-05 06423, 2021
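
One quick way to check that, assuming shell access to the namenode container (the container name here is hypothetical):

      # capacity summary plus a section per datanode, with live/dead counts
      docker exec -it hadoop-master hdfs dfsadmin -report

The namenode web UI (port 9870 on Hadoop 3.x, 50070 on 2.x) shows the same datanode status.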

      • _lucifer
        on it
      • 2021-03-05 06430, 2021

      • _lucifer
      • 2021-03-05 06439, 2021

      • _lucifer
        yup all datanodes are up
      • 2021-03-05 06454, 2021

      • ruaok
        yay. lets start the dump import anew
      • 2021-03-05 06459, 2021

      • _lucifer
        yup
      • 2021-03-05 06416, 2021

      • ruaok
        ehhh uhm.
      • 2021-03-05 06424, 2021

      • ruaok
        look at the logs of the request consumer.
      • 2021-03-05 06433, 2021

      • ruaok
        seems that its trying to clean up stuff that doesn't exist.
      • 2021-03-05 06450, 2021

      • ruaok
        now trying to erase listens from the medieval times.
      • 2021-03-05 06457, 2021

      • ruaok
        1500s. :)
      • 2021-03-05 06456, 2021

      • ruaok
        lets see if it stops when it gets to baby jesus.
      • 2021-03-05 06418, 2021

      • _lucifer
        lol
      • 2021-03-05 06403, 2021

      • ruaok
        or its trying to execute an old task in the queue, like generate dataframes, but no data present.
      • 2021-03-05 06459, 2021

      • sumedh joined the channel
      • 2021-03-05 06438, 2021

      • ruaok
        nope.
      • 2021-03-05 06451, 2021

      • ruaok
        I cleared the queue and re-entered the import command.
      • 2021-03-05 06412, 2021

      • ruaok
        we need to debug why its doing what its doing. :(
      • 2021-03-05 06454, 2021

      • Rohan_Pillai joined the channel
      • 2021-03-05 06443, 2021

      • ruaok
        ah. its trying to calculate stats.
      • 2021-03-05 06447, 2021

      • ruaok
        maybe the queue purge failed.
      • 2021-03-05 06433, 2021

      • Rohan_Pillai has quit
      • 2021-03-05 06448, 2021

      • ruaok
        yvanzo: ping
      • 2021-03-05 06432, 2021

      • ruaok
        confirmed queue not purged.
      • 2021-03-05 06455, 2021

      • ruaok
        previous magic to do so doesn't work on new install. need assistance from yvanzo
      • 2021-03-05 06406, 2021

      • yvanzo
        ruaok: pong
      • 2021-03-05 06411, 2021

      • ruaok
        hiya!
      • 2021-03-05 06411, 2021

      • MajorLurker joined the channel
      • 2021-03-05 06431, 2021

      • ruaok
        I'm trying to purge a listenbrainz queue, but it doesn't seem to work right.
      • 2021-03-05 06441, 2021

      • ruaok
        where is rabbitmqadmin installed on prince?
      • 2021-03-05 06455, 2021

      • ruaok
        I copied over the file from trille, which may be the first mistake.
      • 2021-03-05 06411, 2021

      • yvanzo
        on trille: docker cp rabbitmq-prince:/usr/local/bin/rabbitmqadmin .
      • 2021-03-05 06416, 2021

      • yvanzo
        oops, on prince ^
      • 2021-03-05 06438, 2021

      • yvanzo
        this is not the same version of rabbitmq(admin)
      • 2021-03-05 06452, 2021
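
With a rabbitmqadmin that matches the broker version copied out of the container, the purge itself would look roughly like this; the queue name, vhost and credentials below are placeholders, not the real ListenBrainz values:

      # copy the admin script that matches the broker version (as suggested above)
      docker cp rabbitmq-prince:/usr/local/bin/rabbitmqadmin .

      # find the queue and how many messages it holds
      ./rabbitmqadmin --vhost=/listenbrainz --username=guest --password=guest \
          list queues name messages

      # purge all messages from the queue
      ./rabbitmqadmin --vhost=/listenbrainz --username=guest --password=guest \
          purge queue name=spark_request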

      • ruaok
        ok, same error persists
      • 2021-03-05 06402, 2021

      • ruaok
        see pm
      • 2021-03-05 06436, 2021

      • MajorLurker has quit
      • 2021-03-05 06432, 2021

      • ruaok
      • 2021-03-05 06443, 2021

      • ruaok
        lolfuss.
      • 2021-03-05 06446, 2021

      • ruaok
        _lucifer: ^^
      • 2021-03-05 06459, 2021

      • _lucifer
        👏
      • 2021-03-05 06454, 2021

      • _lucifer
        ruaok: i am trying to figure out why it went on looking way back in the past for listens. ideally it should go from the start of the range to its end. which job was in the queue when you cleared it?
      • 2021-03-05 06413, 2021

      • ruaok
        it was a stats job that was working.
      • 2021-03-05 06442, 2021

      • ruaok
        I forget the exact one, but we can guess that it was the first one running according to the daily crontab
      • 2021-03-05 06445, 2021

      • _lucifer
        👍
      • 2021-03-05 06419, 2021

      • _lucifer
        the sentry stack trace is particularly unhelpful :(
      • 2021-03-05 06426, 2021

      • ruaok
      • 2021-03-05 06433, 2021

      • ruaok
        what do you think?
      • 2021-03-05 06413, 2021

      • Nyanko-sensei joined the channel
      • 2021-03-05 06406, 2021

      • _lucifer
        ruaok: i am unable to debug the issue using the info present in sentry. is it possible to view the logs of the request consumer before it was restarted? also, is it fine if i change the spark logging level for sentry to debug?
      • 2021-03-05 06433, 2021

      • ruaok
        sorry no, in order to free diskspace, I purged old containers. :(
      • 2021-03-05 06432, 2021

      • _lucifer
        no i mean the logs of the container you started after that but before clearing the queue
      • 2021-03-05 06412, 2021

      • ruaok
        one sec. let me finish this €66,000 task real quick.
      • 2021-03-05 06425, 2021

      • _lucifer
        sure
      • 2021-03-05 06427, 2021

      • reosarevok
        ruaok: dunno if you saw the mail to (I assume) modbot?
      • 2021-03-05 06430, 2021

      • reosarevok
      • 2021-03-05 06442, 2021

      • ruaok
        reosarevok: see about 10 lines above. :)
      • 2021-03-05 06447, 2021

      • reosarevok
        Oh
      • 2021-03-05 06457, 2021

      • reosarevok
        I guess you did :D
      • 2021-03-05 06441, 2021

      • ruaok
        can one view logs of a container that is now stopped? anyone know?
      • 2021-03-05 06414, 2021

      • mckean_ joined the channel
      • 2021-03-05 06448, 2021

      • mckean has quit
      • 2021-03-05 06400, 2021

      • atj
        yes
      • 2021-03-05 06428, 2021

      • ruaok
        correction, its been deleted
      • 2021-03-05 06441, 2021

      • ruaok
        it no longer appears in docker ps --all
      • 2021-03-05 06444, 2021

      • atj
        ah, in that case no, depending on the logging configuration
      • 2021-03-05 06402, 2021

      • atj
        it sounds like you use the file driver, in which case the logs got deleted with the container
      • 2021-03-05 06410, 2021
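
To answer the question above: a stopped container's logs are still readable as long as the container itself has not been removed. A minimal sketch, assuming the default json-file driver; the container name is a placeholder:

      # logs survive a stop/exit and can still be read...
      docker logs --tail 500 listenbrainz_request_consumer

      # ...but removing the container (docker rm, or a prune) deletes the
      # json-file log along with it, after which nothing is recoverable
      docker rm listenbrainz_request_consumer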

      • ruaok
        yeah.
      • 2021-03-05 06455, 2021

      • ruaok
        _lucifer: I suppose we can reproduce it sometime. once things calm down, we can reset the cluster again and try to generate stats.
      • 2021-03-05 06428, 2021

      • _lucifer
        yeah, sure. let's change the logging level to debug before that.
      • 2021-03-05 06430, 2021

      • ZaphodBeeblebrox is now known as CatQuest
      • 2021-03-05 06457, 2021

      • _lucifer
        that should show some more useful data in sentry.
      • 2021-03-05 06405, 2021

      • ruaok
        what it did appear to be doing was to look for data and when it found none, the exit condition was never reached.
      • 2021-03-05 06406, 2021

      • shivam-kapila
        > now trying to erase listens from the medieval times.
      • 2021-03-05 06406, 2021

      • shivam-kapila
        Yiikes.
      • 2021-03-05 06410, 2021

      • _lucifer
        ruaok: figured it out. `get_latest_listens_ts` is the culprit, and every job (recs or stats) calls it.
      • 2021-03-05 06455, 2021

      • _lucifer
        we use it to find the `to_date` of the time range