#metabrainz

/

      • d4rkie has quit
      • 2021-02-12 04317, 2021

      • Nyanko-sensei joined the channel
      • 2021-02-12 04351, 2021

      • davic has quit
      • 2021-02-12 04303, 2021

      • sumedh joined the channel
      • 2021-02-12 04336, 2021

      • sumedh has quit
      • 2021-02-12 04321, 2021

      • sumedh joined the channel
      • 2021-02-12 04332, 2021

      • adhawkins_ joined the channel
      • 2021-02-12 04330, 2021

      • adhawkins has quit
      • 2021-02-12 04342, 2021

      • adhawkins_ is now known as adhawkins
      • 2021-02-12 04324, 2021

      • alastairp
        careful about blindly applying machine learning: https://twitter.com/GreatDismal/status/1360129438…
      • 2021-02-12 04308, 2021

      • v6lur joined the channel
      • 2021-02-12 04356, 2021

      • sumedh has quit
      • 2021-02-12 04346, 2021

      • Mr_Monkey
        <sarcasm>Ahhh, after hearing for about for 10 years that the blockchain technology will revolutionize the world, it's finally happening! https://v.cent.co </sarcasm>
      • 2021-02-12 04330, 2021

      • ruaok
      • 2021-02-12 04312, 2021

      • ruaok
        thankfully the blockchain nonsense is starting to die down. I can't wait to see what real applications will come out of it beyond planet destroying crypto currencies.
      • 2021-02-12 04331, 2021

      • ruaok
        because public ledgers are really cool.
      • 2021-02-12 04302, 2021

      • Gazooo794944007 has quit
      • 2021-02-12 04345, 2021

      • Gazooo794944007 joined the channel
      • 2021-02-12 04345, 2021

      • BrainzGit
        [critiquebrainz] amCap1712 opened pull request #344 (master…namespace): Invalidate individual cache keys instead of the entire namespace https://github.com/metabrainz/critiquebrainz/pull…
      • 2021-02-12 04343, 2021

      • ruaok
        alastairp: when you get a chance, lets chat about https://github.com/metabrainz/listenbrainz-server…
      • 2021-02-12 04322, 2021

      • alastairp
        ruaok: I have time between now and 12
      • 2021-02-12 04349, 2021

      • sumedh joined the channel
      • 2021-02-12 04349, 2021

      • alastairp
        do you have logs of dumps that don't clean up after themselves?
      • 2021-02-12 04328, 2021

      • alastairp
        I guess this PR is also introducing logging to a volume, so this should give us more visibility into what's happening
      • 2021-02-12 04310, 2021

      • alastairp
        from what I understand from what you're saying, there appears to be _some_ code path that is failing but not cleaning up the temp directory?
      • 2021-02-12 04346, 2021

      • ruaok
        no. yes.
      • 2021-02-12 04314, 2021

      • ruaok
        I have logs, but they show that a previous run left a dirty dump around. which a subsequent run rsynced to ftp.
      • 2021-02-12 04345, 2021

      • alastairp
        where can I see those logs?
      • 2021-02-12 04345, 2021

      • ruaok
        problem is, I can't tell if this was because we had two cron jobs running or some other problem.
      • 2021-02-12 04357, 2021

      • ruaok
        lemmy:~->docker exec -it listenbrainz-cron-prod less /logs/dumps.log
      • 2021-02-12 04339, 2021

      • ruaok
        given that the cleanup of create_dumps.sh was actually ok, it may be more of a case of two containers running dumps.
      • 2021-02-12 04308, 2021

      • ruaok
        that then begs the question that you were getting at earlier: how do we prevent cron from runing more than once?
      • 2021-02-12 04328, 2021

      • ruaok
        its not as easy as putting lock files into a volume. containers can run on different hosts.
      • 2021-02-12 04331, 2021

      • alastairp
        but these are jobs that run twice a month?
      • 2021-02-12 04338, 2021

      • BrainzGit
        [brainzutils-python] amCap1712 opened pull request #52 (master…namespace-version): Remove cache namespace versioning and invalidation feature https://github.com/metabrainz/brainzutils-python/…
      • 2021-02-12 04303, 2021

      • ruaok
        no, these jobs are run for full dumps twice a month. followed by an import of the dump 24 hours later.
      • 2021-02-12 04324, 2021

      • ruaok
        same cron tab also does daily incremental dumps, with a followed call to import the daily dumps.
      • 2021-02-12 04336, 2021

      • alastairp
        right
      • 2021-02-12 04346, 2021

      • ruaok
        see the options to flock in this case. the dumps commands never block. the import commands do.
      • 2021-02-12 04355, 2021

      • ruaok
        because dumps means we have two running.
      • 2021-02-12 04356, 2021

      • alastairp
        can you point to an item in the log file which shows that bad items are being copied?
      • 2021-02-12 04308, 2021

      • ruaok
        import not running is likely because dumps are taking longer than they should.
      • 2021-02-12 04320, 2021

      • ruaok
        alastairp: as I said, that is not in the logs.
      • 2021-02-12 04347, 2021

      • ruaok
        I guess adding a -v flag to rsync would help here, if we don't hav eit.
      • 2021-02-12 04320, 2021

      • ruaok
        but the real thing I want to talk about is to prevent cron from starting when it shouldn't.
      • 2021-02-12 04321, 2021

      • alastairp
        import not running sounds like an additional feature that we should look at, possibly by triggering import directly after dumps? (or are they on different machines too?)
      • 2021-02-12 04328, 2021

      • alastairp
        on random containers?
      • 2021-02-12 04342, 2021

      • ruaok
        ideally on random containers.
      • 2021-02-12 04359, 2021

      • ruaok
        I *think* we should only ever have cron running in a container named "listenbrainz-cron-prod".
      • 2021-02-12 04305, 2021

      • alastairp
      • 2021-02-12 04311, 2021

      • ruaok
        can we tie the starting of cron to the container name?
      • 2021-02-12 04315, 2021

      • alastairp
        I'm trying to get to this PR again today, I hope we can release it tomorrow
      • 2021-02-12 04318, 2021

      • alastairp
        uh, next week
      • 2021-02-12 04351, 2021

      • alastairp
        let me just look at the current way the startup scripts work, one moment
      • 2021-02-12 04322, 2021

      • ruaok
        rsync has -v, so we should be able to trace the event next time.
      • 2021-02-12 04352, 2021

      • alastairp
        ok, let me just put some thoughts down
      • 2021-02-12 04322, 2021

      • alastairp
        https://github.com/metabrainz/listenbrainz-server… starts cron when the cron container is started
      • 2021-02-12 04346, 2021

      • alastairp
        however, in docker-server-configs we only `start_listenbrainz_cron 'prod' 'v-2021-02-09.2'`
      • 2021-02-12 04303, 2021

      • alastairp
        there's no start_lb_cron for beta
      • 2021-02-12 04357, 2021

      • alastairp
        there is however https://github.com/metabrainz/listenbrainz-server… for test, I'm not sure why this is configured differently. I hope that configuration in test is different enough that it's not trying to run cron jobs
      • 2021-02-12 04308, 2021

      • ruaok
        what I don't like is that anyone can rm the down file to cause havoc. :(
      • 2021-02-12 04337, 2021

      • ruaok
      • 2021-02-12 04347, 2021

      • ruaok
        and this is what I am trying to prevent from happening again.
      • 2021-02-12 04348, 2021

      • alastairp
        one move forward that we're doing is to put all of the rm down files in a single location (rc.local), so it should be easier at a glance to see what is starting up where
      • 2021-02-12 04331, 2021

      • alastairp
        I'm tempted to add that if cron running in test is causing real dumps to be clobbered, then that's a clear configuration issue that we should also improve
      • 2021-02-12 04341, 2021

      • ruaok
        "I'm tempted to add that if cron running in test is causing real dumps to be clobbered, then that's a clear configuration issue that we should also improve" my point exactly with my comment "anyone can rm down".
      • 2021-02-12 04301, 2021

      • alastairp
        with PR 1237 there will be only one file that has rm down. if it exists in any other location then it's a clear bug.
      • 2021-02-12 04330, 2021

      • alastairp
        but I was trying to say that dump config in test should be crippled enough so that it doesn't rsync to the main dumps dir even if cron is configured and runs
      • 2021-02-12 04316, 2021

      • ruaok
        how do you propose to do that?
      • 2021-02-12 04353, 2021

      • alastairp
        with the new consul setup we can render admin/config.sh depending on which environment we're running on
      • 2021-02-12 04315, 2021

      • ruaok
        and we could add a DO_NOT_RSYNC_ANYTHING_MOFO setting there and check it in create_dumps or so?
      • 2021-02-12 04320, 2021

      • alastairp
        oh, it's already rendered with consul, that's great
      • 2021-02-12 04325, 2021

      • alastairp
        yeah exactly
      • 2021-02-12 04334, 2021

      • ruaok
        ok, that seems like a good solution.
      • 2021-02-12 04339, 2021

      • alastairp
      • 2021-02-12 04345, 2021

      • alastairp
        doesn't have an environment!
      • 2021-02-12 04326, 2021

      • ruaok
        I guess I'll need to add that.
      • 2021-02-12 04351, 2021

      • ruaok
        I have the current fixes in the prod cron for observation, so it shouldn't happen again on my watch.
      • 2021-02-12 04324, 2021

      • alastairp
        to solve the issue of test running cron jobs, I think fixing config.sh is definitely the first place that changes should be made
      • 2021-02-12 04330, 2021

      • ruaok
        my plan now is to take into account this convo and basically rebase it on your consul branch, to make sure my changes don't trip you up.
      • 2021-02-12 04351, 2021

      • alastairp
        I'm just reading through the dump script now to see if anything else stands out
      • 2021-02-12 04352, 2021

      • ruaok
        agreed, I'll add prod and set the flags accordingly.
      • 2021-02-12 04328, 2021

      • alastairp
        we're still missing a little bit of cron configuration in my consul branch, but I hope to get that done today
      • 2021-02-12 04356, 2021

      • ruaok
        ok, with me pushing my stuff to the consul branch and closing this PR?
      • 2021-02-12 04327, 2021

      • alastairp
        the main change there is going to be `flock` in the cronfile?
      • 2021-02-12 04340, 2021

      • alastairp
        and we'll add the deploy env to the config rendering
      • 2021-02-12 04342, 2021

      • alastairp
        will you keep the calls to `finish`?
      • 2021-02-12 04308, 2021

      • ruaok
        yes. yes. no.
      • 2021-02-12 04313, 2021

      • alastairp
        👍
      • 2021-02-12 04320, 2021

      • ruaok
        k, thanks.
      • 2021-02-12 04335, 2021

      • alastairp
        from your pr: "Then how is it possible that this script leaves a dirty dump around when it fails? ", where is this dump left?
      • 2021-02-12 04345, 2021

      • ruaok
        the holding dir that gets rsync'ed to FTP.
      • 2021-02-12 04306, 2021

      • alastairp
        just thinking out loud. what's the behaviour of rm on a directory if a process has an open filehandle on a file in it?
      • 2021-02-12 04357, 2021

      • v6lur has quit
      • 2021-02-12 04334, 2021

      • ruaok
        depends on how the file was opened. but in our case we don't do anything special (like exclusive locks and such) so I would expect the command to fail?
      • 2021-02-12 04352, 2021

      • alastairp
        I'm just tracing through variables in the dump scripts
      • 2021-02-12 04350, 2021

      • alastairp
        config.sh TEMP_DIR comes from consul create-dumps-config.json/%s/temp_dir, create-dumps TMP_DIR comes from `mktemp` with TEMP_DIR as a base
      • 2021-02-12 04315, 2021

      • ruaok
        nope. I just checked it. I opened a file using open in the python repl and kept it open.
      • 2021-02-12 04325, 2021

      • ruaok
        then I rm -rf'ed the dir.
      • 2021-02-12 04326, 2021

      • ruaok
        dir gone.
      • 2021-02-12 04300, 2021

      • ruaok
        the file descriptor kicks around until the file is closed.
      • 2021-02-12 04309, 2021

      • ruaok
        so, this process should not leave anything laying about.
      • 2021-02-12 04326, 2021

      • alastairp
        manage.py dump writes to TMPDIR. create-dumps looks for DUMP_ID.txt inside $TMPDIR and calls that $DUMP_ID_FILE. $DUMP_DIR is the dir that DUMP_ID.txt is in
      • 2021-02-12 04350, 2021

      • alastairp
        then, create-dumps copies the dumps to $BACKUP_DIR (from consul) using `cp`, and it also copies dumps to $FTP_CURRENT_DUMP_DIR, again using `cp`
      • 2021-02-12 04323, 2021

      • alastairp
        it deletes old dumps from FTP_DIR, BACKUP_DIR, and TEMP_DIR using python manage.py
      • 2021-02-12 04306, 2021

      • alastairp
        then it calls rsync-dump-files, which copies from RSYNC_FULLEXPORT_DIR, which is configured in config.sh as FTP_DIR
      • 2021-02-12 04334, 2021

      • alastairp
        ruaok: question: "the holding dir that gets rsync'd to FTP". Do you mean create-dumps-config.json/temp_dir, or create-dumps-config.json/ftp_dir
      • 2021-02-12 04357, 2021

      • alastairp
        because the cleanup deletes the dump from temp_dir, after it copies it to ftp_dir
      • 2021-02-12 04306, 2021

      • alastairp
        so files hanging around in ftp_dir is by design
      • 2021-02-12 04331, 2021

      • ruaok
        I don't know -- it wasn't clear from the logs i had.
      • 2021-02-12 04319, 2021

      • alastairp
        ok, great. that'd be a good thing to check when we get more visibility into the logs
      • 2021-02-12 04323, 2021

      • ruaok
        the thing is -- everything looks watertight when looking at it in the context of one single process running. zas and I tuple checked everything.
      • 2021-02-12 04330, 2021

      • ruaok
        triple
      • 2021-02-12 04312, 2021

      • ruaok
        given that this code has now been reviewed three times, it is looking more likely that we're looking at odd race conditions between the two cron tabs that were running.
      • 2021-02-12 04343, 2021

      • alastairp
        are empty dump files still in storage? or did you rm them?
      • 2021-02-12 04353, 2021

      • ruaok
        which means that given the proposed changed in addition to the consul changes, with active logging, it shouldn't happen again, but if it does, then we can see what happened via the logs.
      • 2021-02-12 04303, 2021

      • ruaok
        I cleaned them out.
      • 2021-02-12 04311, 2021

      • alastairp
        I agree. let's coordinate for a release next week with these changes then
      • 2021-02-12 04316, 2021

      • atj
        Possibly unhelpful suggestion. Is it a shell script doing the rsync'ing?
      • 2021-02-12 04327, 2021

      • ruaok
        alastairp: ok, sounds good.
      • 2021-02-12 04331, 2021

      • alastairp
        atj: it is
      • 2021-02-12 04331, 2021

      • ruaok
        atj: yes.
      • 2021-02-12 04349, 2021

      • atj
        You could add "set -x" to get a full trace of what is happening.
      • 2021-02-12 04328, 2021

      • alastairp
        yeah, that's probably a good idea too. I openend a ticket to recommend that we run shellcheck on all of our scripts
      • 2021-02-12 04329, 2021

      • atj
        I often find it useful for debugging shell scripts.
      • 2021-02-12 04343, 2021

      • alastairp
        is set -x a recommendation of shellcheck, or just set -e?
      • 2021-02-12 04351, 2021

      • atj
        set -x is massively verbose
      • 2021-02-12 04356, 2021

      • atj
        Only use it for debugging
      • 2021-02-12 04314, 2021

      • alastairp
        it's a good option to keep in mind if this keeps happening
      • 2021-02-12 04315, 2021

      • atj
        set -e just causes the shell to exit if a command returns a non-zero exit code that isn't explicitly handled
      • 2021-02-12 04330, 2021

      • ruaok
        probably overkill for now. but as alastairp says, for the next round, yes.
      • 2021-02-12 04302, 2021

      • atj
        alastairp: I'd advice "set -u" too as good practice, but it can be confusing
      • 2021-02-12 04307, 2021

      • atj
        *advise
      • 2021-02-12 04316, 2021

      • alastairp
        yeah, looks good too
      • 2021-02-12 04327, 2021

      • atj
        However, use of unset variables in shell scripts can lead to bad consequences (trust me I know) :)
      • 2021-02-12 04341, 2021

      • alastairp
        rm -rf $THEDIR/
      • 2021-02-12 04350, 2021

      • atj
        Yes, indeed!
      • 2021-02-12 04356, 2021

      • atj shudders
      • 2021-02-12 04312, 2021

      • alastairp
        atj: you're always full of good ideas, thanks for jumping in from time to time!
      • 2021-02-12 04342, 2021

      • atj
        alastairp: thanks, I'm hesitant to jump in often as I have so little knowledge of your setup
      • 2021-02-12 04318, 2021

      • atj
        glad it's helpful
      • 2021-02-12 04329, 2021

      • ruaok
        indeed, thanks atj!
      • 2021-02-12 04305, 2021

      • atj
        I'd like to get a bit more involved at some point if possible, as it seems like I could be vaguely useful :)
      • 2021-02-12 04325, 2021

      • ruaok
        you already are!
      • 2021-02-12 04335, 2021

      • ruaok
        but which project interests you?
      • 2021-02-12 04347, 2021

      • atj
        Probably MB interests me the most but also LB.
      • 2021-02-12 04320, 2021

      • atj
        I'm quite interested in the general devops/sysadmin aspect too.
      • 2021-02-12 04304, 2021

      • ruaok
        I'm sure zas could use some help. :)
      • 2021-02-12 04307, 2021

      • atj
        I do a mix of sysadmin/devops and coding at work. My Perl is OK and but Python is quite rusty.
      • 2021-02-12 04314, 2021

      • ruaok
        he's our sysadmin.