#metabrainz

      • Etua joined the channel
      • 2021-03-05 06414, 2021

      • Etua has quit
      • 2021-03-05 06428, 2021

      • Sophist_UK joined the channel
      • 2021-03-05 06458, 2021

      • Sophist-UK has quit
      • 2021-03-05 06425, 2021

      • Cyna[m] has quit
      • 2021-03-05 06425, 2021

      • goldenshimmer has quit
      • 2021-03-05 06425, 2021

      • SamThursfield[m] has quit
      • 2021-03-05 06419, 2021

      • lorenzuru has quit
      • 2021-03-05 06419, 2021

      • kepstin has quit
      • 2021-03-05 06426, 2021

      • joshuaboniface has quit
      • 2021-03-05 06433, 2021

      • d4rkie has quit
      • 2021-03-05 06414, 2021

      • Nyanko-sensei joined the channel
      • 2021-03-05 06404, 2021

      • Cyna[m] joined the channel
      • 2021-03-05 06410, 2021

      • SamThursfield[m] joined the channel
      • 2021-03-05 06403, 2021

      • goldenshimmer joined the channel
      • 2021-03-05 06456, 2021

      • lorenzuru joined the channel
      • 2021-03-05 06419, 2021

      • kepstin joined the channel
      • 2021-03-05 06422, 2021

      • MajorLurker joined the channel
      • 2021-03-05 06407, 2021

      • joshuaboniface joined the channel
      • 2021-03-05 06450, 2021

      • MajorLurker has quit
      • 2021-03-05 06439, 2021

      • Rohan_Pillai joined the channel
      • 2021-03-05 06403, 2021

      • Rohan_Pillai has quit
      • 2021-03-05 06440, 2021

      • Rohan_Pillai joined the channel
      • 2021-03-05 06458, 2021

      • Rohan_Pillai has quit
      • 2021-03-05 06409, 2021

      • Rohan_Pillai joined the channel
      • 2021-03-05 06411, 2021

      • sumedh joined the channel
      • 2021-03-05 06453, 2021

      • Rohan_Pillai has quit
      • 2021-03-05 06428, 2021

      • sumedh has quit
      • 2021-03-05 06443, 2021

      • Rohan_Pillai joined the channel
      • 2021-03-05 06412, 2021

      • d4rkie joined the channel
      • 2021-03-05 06437, 2021

      • Nyanko-sensei has quit
      • 2021-03-05 06420, 2021

      • MajorLurker joined the channel
      • 2021-03-05 06454, 2021

      • MajorLurker has quit
      • 2021-03-05 06421, 2021

      • Rohan_Pillai has quit
      • 2021-03-05 06457, 2021

      • _lucifer
        ruaok: import failed. no diskspace left on device.
      • 2021-03-05 06438, 2021

      • sumedh joined the channel
      • 2021-03-05 06452, 2021

      • sampsyo has quit
      • 2021-03-05 06440, 2021

      • sampsyo joined the channel
      • 2021-03-05 06412, 2021

      • sumedh has quit
      • 2021-03-05 06446, 2021

      • Rohan_Pillai joined the channel
      • 2021-03-05 06404, 2021

      • ruaok
        Mooooiin!
      • 2021-03-05 06424, 2021

      • ruaok
        _lucifer: any idea how to clean up?
      • 2021-03-05 06436, 2021

      • zas
        bitmap: postgres-williams on paco needs more diskspace, it should go back to williams imho (and a few containers on williams should probably run on paco instead)
      • 2021-03-05 06405, 2021

      • zas
        bitmap: I truncated pg log file on floyd, we still need to restart docker to control log file size there
      • 2021-03-05 06430, 2021
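
For context, Docker's default json-file driver does not rotate container logs unless size limits are set, which is why a daemon restart is needed to cap the log size. A minimal sketch, assuming the stock json-file driver; the size values are illustrative, not the production configuration:

      # /etc/docker/daemon.json -- cap each container log at 100 MB, keep 3 rotated files
      {
        "log-driver": "json-file",
        "log-opts": {
          "max-size": "100m",
          "max-file": "3"
        }
      }

      # the daemon must be restarted for the new options to take effect,
      # and only containers created afterwards pick them up
      sudo systemctl restart docker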

      • Rohan_Pillai has quit
      • 2021-03-05 06449, 2021

      • zas
        log file had grown to 172GB
      • 2021-03-05 06411, 2021

      • _lucifer
        ruaok: what's the size of the dump? we could clear the incomplete dumps and other things from hdfs.
      • 2021-03-05 06441, 2021

      • _lucifer
        that drive has 216G available at most, and docker is using it for images and other containers as well
      • 2021-03-05 06446, 2021

      • _lucifer
        just a docker prune can yield ~20G. clearing the temp files and incomplete dump should yield another ~125G
      • 2021-03-05 06455, 2021
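
For reference, the "docker prune" step mentioned above is the standard Docker cleanup; a minimal sketch, assuming the stock Docker CLI:

      # show what is using the disk (images, containers, volumes, build cache)
      docker system df

      # remove stopped containers, unused networks, dangling images and build cache
      docker system prune

      # add -a to also drop images not referenced by any container (more aggressive)
      docker system prune -a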

      • _lucifer
        how much disk space do other nodes in the cluster have?
      • 2021-03-05 06447, 2021

      • ruaok
        they should all have the same specs.
      • 2021-03-05 06451, 2021

      • ruaok
        let's clean up then!
      • 2021-03-05 06454, 2021

      • ruaok
        do you know how?
      • 2021-03-05 06441, 2021

      • _lucifer
        hdfs dfs -rm -r -skipTrash `path` inside the namenode should do that
      • 2021-03-05 06444, 2021
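
A sketch of that cleanup, assuming it is run inside the namenode container (note the `dfs` subcommand); the path is a placeholder for the incomplete dump, not the actual location:

      # see which HDFS directories are taking the space
      hdfs dfs -du -h /

      # delete the incomplete dump immediately, bypassing the HDFS trash
      hdfs dfs -rm -r -skipTrash /path/to/incomplete-dump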

      • _lucifer
        let me try it
      • 2021-03-05 06433, 2021

      • _lucifer
        ruaok, i am trying to delete but there are some issues with namenode. can take some time to diagnose.
      • 2021-03-05 06447, 2021

      • _lucifer
        in the meanwhile, you might want to take a look at this
      • 2021-03-05 06403, 2021

      • _lucifer
      • 2021-03-05 06404, 2021

      • ruaok
        what if we just reformatted our HDFS and started over?
      • 2021-03-05 06417, 2021

      • _lucifer
        it seems all other datanodes are offline
      • 2021-03-05 06426, 2021

      • ruaok
        that is supposed to be a valid use case. we need to reimport all the data.
      • 2021-03-05 06455, 2021

      • ruaok
        it really sounds like the cluster needs a complete reboot. so let's do that.
      • 2021-03-05 06422, 2021

      • _lucifer
        yeah let's try that
      • 2021-03-05 06408, 2021

      • _lucifer
        did something happen on March 2? new datanode containers came up on leader that day and other workers went offline the same day
      • 2021-03-05 06444, 2021

      • ruaok
        not that i know of, but the problem is that these systems aren't monitored, so hard to know.
      • 2021-03-05 06400, 2021

      • _lucifer
        yeah :(
      • 2021-03-05 06401, 2021

      • ruaok
        I'm really warming up to your suggestion of using yarn and not docker to run the cluster.
      • 2021-03-05 06425, 2021

      • ruaok
        once we are able to do that, then let's get 4 new machines, have zas monitor them and restart the cluster.
      • 2021-03-05 06443, 2021

      • _lucifer
        yeah makes sense
      • 2021-03-05 06411, 2021

      • ruaok
        ok, 11G free.
      • 2021-03-05 06455, 2021

      • ruaok
        ok, cluster stopped.
      • 2021-03-05 06442, 2021

      • ruaok
        name node volumes dropped, recreated. now to do that to each datanode.
      • 2021-03-05 06414, 2021

      • d4rkie has quit
      • 2021-03-05 06445, 2021

      • ruaok
        _lucifer: ok, reset complete. can you see if the cluster looks healthy?
      • 2021-03-05 06423, 2021
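
One quick way to check that, assuming shell access to the namenode container (the container name here is hypothetical):

      # capacity summary plus a section per datanode, with live/dead counts
      docker exec -it hadoop-master hdfs dfsadmin -report

The namenode web UI (port 9870 on Hadoop 3.x, 50070 on 2.x) shows the same datanode status.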

      • _lucifer
        on it
      • 2021-03-05 06430, 2021

      • _lucifer
      • 2021-03-05 06439, 2021

      • _lucifer
        yup all datanodes are up
      • 2021-03-05 06454, 2021

      • ruaok
        yay. lets start the dump import anew
      • 2021-03-05 06459, 2021

      • _lucifer
        yup
      • 2021-03-05 06416, 2021

      • ruaok
        ehhh uhm.
      • 2021-03-05 06424, 2021

      • ruaok
        look at the logs of the request consumer.
      • 2021-03-05 06433, 2021

      • ruaok
        seems that its trying to clean up stuff that doesn't exist.
      • 2021-03-05 06450, 2021

      • ruaok
        now trying to erase listens from the medieval times.
      • 2021-03-05 06457, 2021

      • ruaok
        1500s. :)
      • 2021-03-05 06456, 2021

      • ruaok
        lets see if it stops when it gets to baby jesus.
      • 2021-03-05 06418, 2021

      • _lucifer
        lol
      • 2021-03-05 06403, 2021

      • ruaok
        or its trying to execute an old task in the queue, like generate dataframes, but no data present.
      • 2021-03-05 06459, 2021

      • sumedh joined the channel
      • 2021-03-05 06438, 2021

      • ruaok
        nope.
      • 2021-03-05 06451, 2021

      • ruaok
        I cleared the queue and re-entered the import command.
      • 2021-03-05 06412, 2021

      • ruaok
        we need to debug why its doing what its doing. :(
      • 2021-03-05 06454, 2021

      • Rohan_Pillai joined the channel
      • 2021-03-05 06443, 2021

      • ruaok
        ah. its trying to calculate stats.
      • 2021-03-05 06447, 2021

      • ruaok
        maybe the queue purge failed.
      • 2021-03-05 06433, 2021

      • Rohan_Pillai has quit
      • 2021-03-05 06448, 2021

      • ruaok
        yvanzo: ping
      • 2021-03-05 06432, 2021

      • ruaok
        confirmed queue not purged.
      • 2021-03-05 06455, 2021

      • ruaok
        previous magic to do so doesn't work on new install. need assistance from yvanzo
      • 2021-03-05 06406, 2021

      • yvanzo
        ruaok: pong
      • 2021-03-05 06411, 2021

      • ruaok
        hiya!
      • 2021-03-05 06411, 2021

      • MajorLurker joined the channel
      • 2021-03-05 06431, 2021

      • ruaok
        I'm trying to purge a listenbrainz queue, but it doesn't seem to work right.
      • 2021-03-05 06441, 2021

      • ruaok
        where is rabbitmqadmin installed on prince?
      • 2021-03-05 06455, 2021

      • ruaok
        I copied over the file from trille, which may be the first mistake.
      • 2021-03-05 06411, 2021

      • yvanzo
        on trille: docker cp rabbitmq-prince:/usr/local/bin/rabbitmqadmin .
      • 2021-03-05 06416, 2021

      • yvanzo
        oops, on prince ^
      • 2021-03-05 06438, 2021

      • yvanzo
        this is not the same version of rabbitmq(admin)
      • 2021-03-05 06452, 2021
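
With a rabbitmqadmin that matches the broker version copied out of the container, the purge itself would look roughly like this; the queue name, vhost and credentials below are placeholders, not the real ListenBrainz values:

      # copy the admin script that matches the broker version (as suggested above)
      docker cp rabbitmq-prince:/usr/local/bin/rabbitmqadmin .

      # find the queue and how many messages it holds
      ./rabbitmqadmin --vhost=/listenbrainz --username=guest --password=guest \
          list queues name messages

      # purge all messages from the queue
      ./rabbitmqadmin --vhost=/listenbrainz --username=guest --password=guest \
          purge queue name=spark_request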

      • ruaok
        ok, same error persists
      • 2021-03-05 06402, 2021

      • ruaok
        see pm
      • 2021-03-05 06436, 2021

      • MajorLurker has quit
      • 2021-03-05 06432, 2021

      • ruaok
      • 2021-03-05 06443, 2021

      • ruaok
        lolfuss.
      • 2021-03-05 06446, 2021

      • ruaok
        _lucifer: ^^
      • 2021-03-05 06459, 2021

      • _lucifer
        👏
      • 2021-03-05 06454, 2021

      • _lucifer
        ruaok: i am trying to figure out why it went on looking way back in the past for listens. ideally it should go from the start of the range to its end. which job was in the queue when you cleared it?
      • 2021-03-05 06413, 2021

      • ruaok
        it was a stats job that was working.
      • 2021-03-05 06442, 2021

      • ruaok
        I forget the exact one, but we can guess that it was the first one running according to the daily crontab
      • 2021-03-05 06445, 2021

      • _lucifer
        👍
      • 2021-03-05 06419, 2021

      • _lucifer
        the sentry stack trace is particularly unhelpful :(
      • 2021-03-05 06426, 2021

      • ruaok
      • 2021-03-05 06433, 2021

      • ruaok
        what do you think?
      • 2021-03-05 06413, 2021

      • Nyanko-sensei joined the channel
      • 2021-03-05 06406, 2021

      • _lucifer
        ruaok: i am unable to debug the issue using the info present in sentry. is it possible to view the logs of the request consumer before it was restarted? also, is it fine if i change the spark logging level for sentry to debug?
      • 2021-03-05 06433, 2021

      • ruaok
        sorry no, in order to free diskspace, I purged old containers. :(
      • 2021-03-05 06432, 2021

      • _lucifer
        no i mean the logs of the container you started after that but before clearing the queue
      • 2021-03-05 06412, 2021

      • ruaok
        one sec. let me finish this €66,000 task real quick.
      • 2021-03-05 06425, 2021

      • _lucifer
        sure
      • 2021-03-05 06427, 2021

      • reosarevok
        ruaok: dunno if you saw the mail to (I assume) modbot?
      • 2021-03-05 06430, 2021

      • reosarevok
      • 2021-03-05 06442, 2021

      • ruaok
        reosarevok: see about 10 lines above. :)
      • 2021-03-05 06447, 2021

      • reosarevok
        Oh
      • 2021-03-05 06457, 2021

      • reosarevok
        I guess you did :D
      • 2021-03-05 06441, 2021

      • ruaok
        can one view logs of a container that is now stopped? anyone know?
      • 2021-03-05 06414, 2021

      • mckean_ joined the channel
      • 2021-03-05 06448, 2021

      • mckean has quit
      • 2021-03-05 06400, 2021

      • atj
        yes
      • 2021-03-05 06428, 2021

      • ruaok
        correction, its been deleted
      • 2021-03-05 06441, 2021

      • ruaok
        it no longer appears in docker ps --all
      • 2021-03-05 06444, 2021

      • atj
        ah, in that case no, depending on the logging configuration
      • 2021-03-05 06402, 2021

      • atj
        it sounds like you use the file driver, in which case the logs got deleted with the container
      • 2021-03-05 06410, 2021
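
To answer the question above: a stopped container's logs are still readable as long as the container itself has not been removed. A minimal sketch, assuming the default json-file driver; the container name is a placeholder:

      # logs survive a stop/exit and can still be read...
      docker logs --tail 500 listenbrainz_request_consumer

      # ...but removing the container (docker rm, or a prune) deletes the
      # json-file log along with it, after which nothing is recoverable
      docker rm listenbrainz_request_consumer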

      • ruaok
        yeah.
      • 2021-03-05 06455, 2021

      • ruaok
        _lucifer: I suppose we can reproduce it sometime. once things calm down, we can reset the cluster again and try to generate stats.
      • 2021-03-05 06428, 2021

      • _lucifer
        yeah, sure. let's change the logging level to debug before that.
      • 2021-03-05 06430, 2021

      • ZaphodBeeblebrox is now known as CatQuest
      • 2021-03-05 06457, 2021

      • _lucifer
        that should show some more useful data in sentry.
      • 2021-03-05 06405, 2021

      • ruaok
        what it did appear to be doing was to look for data and when it found none, the exit condition was never reached.
      • 2021-03-05 06406, 2021

      • shivam-kapila
        > now trying to erase listens from the medieval times.
      • 2021-03-05 06406, 2021

      • shivam-kapila
        Yiikes.
      • 2021-03-05 06410, 2021

      • _lucifer
        ruaok: figured it out. `get_latest_listens_ts` is the culprit, and every job (recs or stats) calls it.
      • 2021-03-05 06455, 2021

      • _lucifer
        we use it to find the `to_date` of the time range