<sarcasm>Ahhh, after hearing for about for 10 years that the blockchain technology will revolutionize the world, it's finally happening! https://v.cent.co </sarcasm>
thankfully the blockchain nonsense is starting to die down. I can't wait to see what real applications will come out of it beyond planet destroying crypto currencies.
no, these jobs are run for full dumps twice a month. followed by an import of the dump 24 hours later.
2021-02-12 04324, 2021
ruaok
same cron tab also does daily incremental dumps, with a followed call to import the daily dumps.
2021-02-12 04336, 2021
alastairp
right
2021-02-12 04346, 2021
ruaok
see the options to flock in this case. the dumps commands never block. the import commands do.
2021-02-12 04355, 2021
ruaok
because dumps means we have two running.
2021-02-12 04356, 2021
alastairp
can you point to an item in the log file which shows that bad items are being copied?
2021-02-12 04308, 2021
ruaok
import not running is likely because dumps are taking longer than they should.
2021-02-12 04320, 2021
ruaok
alastairp: as I said, that is not in the logs.
2021-02-12 04347, 2021
ruaok
I guess adding a -v flag to rsync would help here, if we don't hav eit.
2021-02-12 04320, 2021
ruaok
but the real thing I want to talk about is to prevent cron from starting when it shouldn't.
2021-02-12 04321, 2021
alastairp
import not running sounds like an additional feature that we should look at, possibly by triggering import directly after dumps? (or are they on different machines too?)
2021-02-12 04328, 2021
alastairp
on random containers?
2021-02-12 04342, 2021
ruaok
ideally on random containers.
2021-02-12 04359, 2021
ruaok
I *think* we should only ever have cron running in a container named "listenbrainz-cron-prod".
however, in docker-server-configs we only `start_listenbrainz_cron 'prod' 'v-2021-02-09.2'`
2021-02-12 04303, 2021
alastairp
there's no start_lb_cron for beta
2021-02-12 04357, 2021
alastairp
there is however https://github.com/metabrainz/listenbrainz-server… for test, I'm not sure why this is configured differently. I hope that configuration in test is different enough that it's not trying to run cron jobs
2021-02-12 04308, 2021
ruaok
what I don't like is that anyone can rm the down file to cause havoc. :(
and this is what I am trying to prevent from happening again.
2021-02-12 04348, 2021
alastairp
one move forward that we're doing is to put all of the rm down files in a single location (rc.local), so it should be easier at a glance to see what is starting up where
2021-02-12 04331, 2021
alastairp
I'm tempted to add that if cron running in test is causing real dumps to be clobbered, then that's a clear configuration issue that we should also improve
2021-02-12 04341, 2021
ruaok
"I'm tempted to add that if cron running in test is causing real dumps to be clobbered, then that's a clear configuration issue that we should also improve" my point exactly with my comment "anyone can rm down".
2021-02-12 04301, 2021
alastairp
with PR 1237 there will be only one file that has rm down. if it exists in any other location then it's a clear bug.
2021-02-12 04330, 2021
alastairp
but I was trying to say that dump config in test should be crippled enough so that it doesn't rsync to the main dumps dir even if cron is configured and runs
2021-02-12 04316, 2021
ruaok
how do you propose to do that?
2021-02-12 04353, 2021
alastairp
with the new consul setup we can render admin/config.sh depending on which environment we're running on
2021-02-12 04315, 2021
ruaok
and we could add a DO_NOT_RSYNC_ANYTHING_MOFO setting there and check it in create_dumps or so?
2021-02-12 04320, 2021
alastairp
oh, it's already rendered with consul, that's great
I have the current fixes in the prod cron for observation, so it shouldn't happen again on my watch.
2021-02-12 04324, 2021
alastairp
to solve the issue of test running cron jobs, I think fixing config.sh is definitely the first place that changes should be made
2021-02-12 04330, 2021
ruaok
my plan now is to take into account this convo and basically rebase it on your consul branch, to make sure my changes don't trip you up.
2021-02-12 04351, 2021
alastairp
I'm just reading through the dump script now to see if anything else stands out
2021-02-12 04352, 2021
ruaok
agreed, I'll add prod and set the flags accordingly.
2021-02-12 04328, 2021
alastairp
we're still missing a little bit of cron configuration in my consul branch, but I hope to get that done today
2021-02-12 04356, 2021
ruaok
ok, with me pushing my stuff to the consul branch and closing this PR?
2021-02-12 04327, 2021
alastairp
the main change there is going to be `flock` in the cronfile?
2021-02-12 04340, 2021
alastairp
and we'll add the deploy env to the config rendering
2021-02-12 04342, 2021
alastairp
will you keep the calls to `finish`?
2021-02-12 04308, 2021
ruaok
yes. yes. no.
2021-02-12 04313, 2021
alastairp
👍
2021-02-12 04320, 2021
ruaok
k, thanks.
2021-02-12 04335, 2021
alastairp
from your pr: "Then how is it possible that this script leaves a dirty dump around when it fails? ", where is this dump left?
2021-02-12 04345, 2021
ruaok
the holding dir that gets rsync'ed to FTP.
2021-02-12 04306, 2021
alastairp
just thinking out loud. what's the behaviour of rm on a directory if a process has an open filehandle on a file in it?
2021-02-12 04357, 2021
v6lur has quit
2021-02-12 04334, 2021
ruaok
depends on how the file was opened. but in our case we don't do anything special (like exclusive locks and such) so I would expect the command to fail?
2021-02-12 04352, 2021
alastairp
I'm just tracing through variables in the dump scripts
2021-02-12 04350, 2021
alastairp
config.sh TEMP_DIR comes from consul create-dumps-config.json/%s/temp_dir, create-dumps TMP_DIR comes from `mktemp` with TEMP_DIR as a base
2021-02-12 04315, 2021
ruaok
nope. I just checked it. I opened a file using open in the python repl and kept it open.
2021-02-12 04325, 2021
ruaok
then I rm -rf'ed the dir.
2021-02-12 04326, 2021
ruaok
dir gone.
2021-02-12 04300, 2021
ruaok
the file descriptor kicks around until the file is closed.
2021-02-12 04309, 2021
ruaok
so, this process should not leave anything laying about.
2021-02-12 04326, 2021
alastairp
manage.py dump writes to TMPDIR. create-dumps looks for DUMP_ID.txt inside $TMPDIR and calls that $DUMP_ID_FILE. $DUMP_DIR is the dir that DUMP_ID.txt is in
2021-02-12 04350, 2021
alastairp
then, create-dumps copies the dumps to $BACKUP_DIR (from consul) using `cp`, and it also copies dumps to $FTP_CURRENT_DUMP_DIR, again using `cp`
2021-02-12 04323, 2021
alastairp
it deletes old dumps from FTP_DIR, BACKUP_DIR, and TEMP_DIR using python manage.py
2021-02-12 04306, 2021
alastairp
then it calls rsync-dump-files, which copies from RSYNC_FULLEXPORT_DIR, which is configured in config.sh as FTP_DIR
2021-02-12 04334, 2021
alastairp
ruaok: question: "the holding dir that gets rsync'd to FTP". Do you mean create-dumps-config.json/temp_dir, or create-dumps-config.json/ftp_dir
2021-02-12 04357, 2021
alastairp
because the cleanup deletes the dump from temp_dir, after it copies it to ftp_dir
2021-02-12 04306, 2021
alastairp
so files hanging around in ftp_dir is by design
2021-02-12 04331, 2021
ruaok
I don't know -- it wasn't clear from the logs i had.
2021-02-12 04319, 2021
alastairp
ok, great. that'd be a good thing to check when we get more visibility into the logs
2021-02-12 04323, 2021
ruaok
the thing is -- everything looks watertight when looking at it in the context of one single process running. zas and I tuple checked everything.
2021-02-12 04330, 2021
ruaok
triple
2021-02-12 04312, 2021
ruaok
given that this code has now been reviewed three times, it is looking more likely that we're looking at odd race conditions between the two cron tabs that were running.
2021-02-12 04343, 2021
alastairp
are empty dump files still in storage? or did you rm them?
2021-02-12 04353, 2021
ruaok
which means that given the proposed changed in addition to the consul changes, with active logging, it shouldn't happen again, but if it does, then we can see what happened via the logs.
2021-02-12 04303, 2021
ruaok
I cleaned them out.
2021-02-12 04311, 2021
alastairp
I agree. let's coordinate for a release next week with these changes then
2021-02-12 04316, 2021
atj
Possibly unhelpful suggestion. Is it a shell script doing the rsync'ing?
2021-02-12 04327, 2021
ruaok
alastairp: ok, sounds good.
2021-02-12 04331, 2021
alastairp
atj: it is
2021-02-12 04331, 2021
ruaok
atj: yes.
2021-02-12 04349, 2021
atj
You could add "set -x" to get a full trace of what is happening.
2021-02-12 04328, 2021
alastairp
yeah, that's probably a good idea too. I openend a ticket to recommend that we run shellcheck on all of our scripts
2021-02-12 04329, 2021
atj
I often find it useful for debugging shell scripts.
2021-02-12 04343, 2021
alastairp
is set -x a recommendation of shellcheck, or just set -e?
2021-02-12 04351, 2021
atj
set -x is massively verbose
2021-02-12 04356, 2021
atj
Only use it for debugging
2021-02-12 04314, 2021
alastairp
it's a good option to keep in mind if this keeps happening
2021-02-12 04315, 2021
atj
set -e just causes the shell to exit if a command returns a non-zero exit code that isn't explicitly handled
2021-02-12 04330, 2021
ruaok
probably overkill for now. but as alastairp says, for the next round, yes.
2021-02-12 04302, 2021
atj
alastairp: I'd advice "set -u" too as good practice, but it can be confusing
2021-02-12 04307, 2021
atj
*advise
2021-02-12 04316, 2021
alastairp
yeah, looks good too
2021-02-12 04327, 2021
atj
However, use of unset variables in shell scripts can lead to bad consequences (trust me I know) :)
2021-02-12 04341, 2021
alastairp
rm -rf $THEDIR/
2021-02-12 04350, 2021
atj
Yes, indeed!
2021-02-12 04356, 2021
atj shudders
2021-02-12 04312, 2021
alastairp
atj: you're always full of good ideas, thanks for jumping in from time to time!
2021-02-12 04342, 2021
atj
alastairp: thanks, I'm hesitant to jump in often as I have so little knowledge of your setup
2021-02-12 04318, 2021
atj
glad it's helpful
2021-02-12 04329, 2021
ruaok
indeed, thanks atj!
2021-02-12 04305, 2021
atj
I'd like to get a bit more involved at some point if possible, as it seems like I could be vaguely useful :)
2021-02-12 04325, 2021
ruaok
you already are!
2021-02-12 04335, 2021
ruaok
but which project interests you?
2021-02-12 04347, 2021
atj
Probably MB interests me the most but also LB.
2021-02-12 04320, 2021
atj
I'm quite interested in the general devops/sysadmin aspect too.
2021-02-12 04304, 2021
ruaok
I'm sure zas could use some help. :)
2021-02-12 04307, 2021
atj
I do a mix of sysadmin/devops and coding at work. My Perl is OK and but Python is quite rusty.