I was thinking of working on BU-6. Do you have any suggestions/pointers for it? For now, I am extracting the common parts of the code to see how much overlap there is.
_lucifer: I'm not sure that ticket is really high priority enough for the effort it'll need.
2021-02-08 03928, 2021
iliekcomputers
all three of our projects have data dumps already, not sure refactoring that code into BU is the best use of your time for now.
2021-02-08 03953, 2021
reosarevok
yvanzo, bitmap: not sure what to do with that one. We *do* calculate different durations based on whether tracks or discID is loaded, for the same release
2021-02-08 03906, 2021
reosarevok
But which one is "right"? :D
2021-02-08 03916, 2021
reosarevok
At first I thought dropping the pregap would be the same
2021-02-08 03926, 2021
reosarevok
eh. s/same/best/
2021-02-08 03917, 2021
reosarevok
But then, what to do when a disc has an actual pregap track? We don't know that, because we don't have the tracks loaded
2021-02-08 03943, 2021
reosarevok
Should we just "remove two seconds per disc if the pregap is longer than two seconds"?
2021-02-08 03946, 2021
_lucifer
iliekcomputers: yeah, makes sense
2021-02-08 03900, 2021
reosarevok
(alternatively, we could have a length view like we have for first_recording_date, that'd probably also help with MBS-11268)
_lucifer: hmm, for me the key functionality would be 1) be able to easily define the dump structure for "normal" tables (e.g. user lists), 2) be able to dump and restore them, 3) and have tests for the dump code
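(A rough illustration of points 1 and 2 above, not existing BU code: the table names, columns, and DSN handling are invented, and a real implementation would also need schema/order handling, tests, and error handling.)

```python
# Hypothetical sketch only: one way the "dump structure" for ordinary tables
# could be declared, with dump/restore built on PostgreSQL COPY via psycopg2.
import psycopg2

DUMP_TABLES = {
    # table name -> columns to include in the dump (invented examples)
    "user_list": ("id", "user_id", "name", "created"),
    "user_list_item": ("id", "list_id", "entity_mbid", "added"),
}

def dump_tables(dsn, out_dir):
    """Write one TSV file per configured table using COPY TO."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for table, columns in DUMP_TABLES.items():
            with open(f"{out_dir}/{table}.tsv", "w") as f:
                cur.copy_expert(f"COPY {table} ({', '.join(columns)}) TO STDOUT", f)

def restore_tables(dsn, in_dir):
    """Load the TSV files back with COPY FROM, in the configured order."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for table, columns in DUMP_TABLES.items():
            with open(f"{in_dir}/{table}.tsv") as f:
                cur.copy_expert(f"COPY {table} ({', '.join(columns)}) FROM STDIN", f)
```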
2021-02-08 03926, 2021
alastairp
I think that things like lb listens or ab data are out of scope for this, but everything else probably fits within the scope of it
2021-02-08 03910, 2021
Nyanko-sensei has quit
2021-02-08 03944, 2021
Nyanko-sensei joined the channel
2021-02-08 03938, 2021
BrainzGit
[bookbrainz-site] MonkeyDo closed pull request #552 (master…snyk-upgrade-f744969d2d654e17931b5b4d9772de43): [Snyk] Upgrade swagger-jsdoc from 4.0.0 to 4.3.2 https://github.com/bookbrainz/bookbrainz-site/pul…
2021-02-08 03940, 2021
_lucifer
alastairp: agreed. also, i tested the BU PR for version change and it works fine for AB, CB, LB (once the dataset hoster PR is merged and released)
2021-02-08 03951, 2021
alastairp
_lucifer: yeah, I saw that thanks
2021-02-08 03911, 2021
alastairp
did you see the comment that I left in that PR, after I discussed with iliekcomputers
2021-02-08 03932, 2021
alastairp
we decided to pin specific versions of all of the dependencies in the downstream project
[bookbrainz-site] MonkeyDo merged pull request #503 (master…snyk-upgrade-ddfadf045c35fe0947282d8dccfdd5c4): [Snyk] Upgrade morgan from 1.9.1 to 1.10.0 https://github.com/bookbrainz/bookbrainz-site/pul…
the LB dumps, stats generation and rec recommendation stuff has some bugs, but the current observability emails provide very little insight beyond "cron did something, you should go check" and that gets filtered and ignored in my email.
2021-02-08 03927, 2021
ruaok
I want to have a place where I can see how things ran. did a job complete or was it interrupted. which users had data generated, which ones not.
2021-02-08 03939, 2021
ruaok
email is certainly not the way to do this. telegraf isn't either.
2021-02-08 03949, 2021
ruaok
I am wondering if using sentry would be an abuse of it.
2021-02-08 03955, 2021
alastairp
currently they run through cron? and the only report is the email sent by printing to stdout in cron?
2021-02-08 03906, 2021
ruaok
perhaps a different sentry project? tagged entries? not sure.
2021-02-08 03925, 2021
ruaok
not even stdout...
2021-02-08 03929, 2021
alastairp
kind of sounds like you're asking for unified logging
ah, I see. that's an explicit email. I didn't know that existed
2021-02-08 03930, 2021
alastairp
yes, right
2021-02-08 03945, 2021
ruaok
it's not well advertised since it's... not very useful.
2021-02-08 03945, 2021
alastairp
I'm trying to recall if I know of something similar that doesn't require us to reinvent the wheel
2021-02-08 03909, 2021
ruaok
one thought was to create status tables somewhere and then slap a dataset hoster in front.
2021-02-08 03918, 2021
ruaok
also very odd.
2021-02-08 03937, 2021
alastairp
so we want to know at least: when something last ran, if it ran successfully, any errors or log messages made during the time
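(A minimal sketch of the "status tables" idea floated above — the table and column names are invented, not an existing LB schema; a dataset hoster or small admin view could then sit in front of it.)

```python
# Hypothetical status table and helper that records one row per cron run:
# when it ran, whether it succeeded, and any captured log output.
import psycopg2

CREATE_STATUS_TABLE = """
CREATE TABLE IF NOT EXISTS cron_job_status (
    id          SERIAL PRIMARY KEY,
    job_name    TEXT NOT NULL,
    started_at  TIMESTAMPTZ NOT NULL,
    finished_at TIMESTAMPTZ,
    success     BOOLEAN,
    log_output  TEXT
);
"""

def record_run(dsn, job_name, started_at, finished_at, success, log_output):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(CREATE_STATUS_TABLE)
        cur.execute(
            """INSERT INTO cron_job_status
               (job_name, started_at, finished_at, success, log_output)
               VALUES (%s, %s, %s, %s, %s)""",
            (job_name, started_at, finished_at, success, log_output),
        )
```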
2021-02-08 03903, 2021
ruaok
minimally yes.
2021-02-08 03905, 2021
alastairp
I think we've kind of shown that sentry isn't a great fit for logging here
2021-02-08 03913, 2021
alastairp
given what we started to remove
2021-02-08 03939, 2021
alastairp
when you say that these services have some bugs, what do you mean? Bugs that cause the process to crash? or more subtle ones?
2021-02-08 03945, 2021
ruaok
checking how data was generated per user might be done elsewhere. in particular we need to show "this data was last updated on X" timestamps near where the data is displayed. right now there is little visibility around this.
2021-02-08 03927, 2021
ruaok
sometimes dumps get interrupted and I don't know why. consul updates, perhaps. that causes dumps to fail and empty files to be dumped to the FTP site.
2021-02-08 03947, 2021
ruaok
those are the most critical bugs to fix, but that's hard to do without decent logging to see what is going on.
2021-02-08 03925, 2021
alastairp
yes, right
2021-02-08 03931, 2021
ruaok
then users get deleted in PG, but not in spark so the calculated data flow borks since the dest is gone.
2021-02-08 03950, 2021
alastairp
this ties in a bit with what we discussed last time in person - logging for when services are restarted. but this isn't the whole picture
2021-02-08 03955, 2021
Etua joined the channel
2021-02-08 03902, 2021
ruaok
this is fixed with the next full data dump. this is not a real problem, but just one of the many things odd about this.
2021-02-08 03916, 2021
ruaok
yes.
2021-02-08 03929, 2021
ruaok
what I want doesn't fit well into any of our existing monitoring systems.
2021-02-08 03954, 2021
alastairp
I like the idea of a dashboard for this, but I'm unsure where the limits of that should be
2021-02-08 03932, 2021
ruaok
one possible scenario is to really clean up what gets generated from the containers and have it go to a log file that can actually be served by lb-web so that anyone can follow along.
2021-02-08 03939, 2021
alastairp
an existing python package that replaces cron? do we do it all ourselves? (more maintenance, ugh), do we make it generic for use in other projects?
2021-02-08 03904, 2021
ruaok
all good questions. I don't want to invent more stuff.
2021-02-08 03915, 2021
alastairp
yes, that's why I asked about cron emails - cron will email stdout to MAILTO
2021-02-08 03940, 2021
alastairp
but we could just as easily configure logging to go to stdout + a file, and have a basic interface over a set of directories
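(A minimal sketch of "logging to stdout + a file" in Python; the /logs directory and per-job file naming are assumptions, not existing LB paths.)

```python
# Send job logs both to stdout (so cron/docker capture them) and to a per-job
# file that a simple web view could later serve from a shared directory.
import logging
import sys

def setup_job_logging(job_name, log_dir="/logs"):
    logger = logging.getLogger(job_name)
    logger.setLevel(logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

    stdout_handler = logging.StreamHandler(sys.stdout)
    stdout_handler.setFormatter(fmt)
    logger.addHandler(stdout_handler)

    file_handler = logging.FileHandler(f"{log_dir}/{job_name}.log")
    file_handler.setFormatter(fmt)
    logger.addHandler(file_handler)
    return logger
```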
2021-02-08 03949, 2021
ruaok
some parts of what I want are temporary -- debugging stuff that will go away. and hopefully all this will go away and become more transparent for the user over time. so I am hesitant to spend a lot of time on this.
2021-02-08 03906, 2021
ruaok
alastairp: agreed, I think that might be a good short term solution.
2021-02-08 03946, 2021
alastairp
the specific case of users getting deleted in pg but not in spark sounds like a tangential bug. That is, there's a problem here, but it's not core to our question of how to log stuff
2021-02-08 03929, 2021
alastairp
I made something similar for an MTG project many years ago, which captured logging messages and put them in redis, then had a dashboard to show logs for each job
2021-02-08 03935, 2021
alastairp
I don't think it even works any more
2021-02-08 03935, 2021
ruaok
it's not even a problem per se. a temporary blip -- later we may want to check to ensure that people exist in PG before generating.
2021-02-08 03947, 2021
iliekcomputers
i'm interested in this conversation as well, but in a meeting, i'll read the backlog.
2021-02-08 03952, 2021
ruaok
iliekcomputers: +1
2021-02-08 03908, 2021
ruaok
alastairp: this convo is helping me sort out my thoughts.
2021-02-08 03933, 2021
ruaok
first steps: have cron log to docker, improve log messages and stop emails.
2021-02-08 03905, 2021
ruaok
with emphasis on cron jobs starting/stopping and cleaning up correctly if things go badly.
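(One hedged way the start/stop logging and cleanup could look — the wrapper and its names are hypothetical, not existing LB code.)

```python
# Hypothetical wrapper: log when a cron job starts and stops, and discard
# partial output if it fails, so an interrupted run does not leave empty
# dump files lying around.
import logging
import os
import shutil
import tempfile

logger = logging.getLogger("cron")

def run_job(job_name, job_fn, final_dir):
    """Run job_fn writing into a temp dir; publish the output only on success."""
    workdir = tempfile.mkdtemp(prefix=f"{job_name}-")
    logger.info("starting %s (workdir=%s)", job_name, workdir)
    try:
        job_fn(workdir)
    except Exception:
        logger.exception("%s failed; discarding partial output", job_name)
        shutil.rmtree(workdir, ignore_errors=True)
        raise
    else:
        # assumes final_dir does not exist yet and is on the same filesystem
        os.rename(workdir, final_dir)
        logger.info("%s finished successfully", job_name)
```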
2021-02-08 03918, 2021
ruaok
I think that will shape future thinking on how to do this better.
2021-02-08 03932, 2021
alastairp
that sounds like a good first step
2021-02-08 03958, 2021
ruaok
let's keep thinking about this issue and see what we can come up with.
2021-02-08 03944, 2021
alastairp
cron jobs and web both run on lemmy. this means we can share a volume between them and show logs on an admin page
2021-02-08 03908, 2021
ruaok
that would be a pretty decent start, yes.
2021-02-08 03958, 2021
alastairp
ruaok: we said that we'd talk about caching today too
2021-02-08 03941, 2021
ruaok
that also means we'd need to talk about cache invalidation too. 🤬
2021-02-08 03924, 2021
ruaok
but, I can do that now...
2021-02-08 03924, 2021
alastairp
I had the following topics on my list: 1) remove cache key hashing (_lucifer has done this), 2) look in to seeing if we need to improve redis configs to make writing to disk more efficient, 3) enforce expiry times in BU? and update all downstream apps, 4) make a decision on BU-25
removing key hashing means that we have better keyspace visibility, yes?
2021-02-08 03912, 2021
ruaok
I think enforcing expiry times -- how about setting a fairly short default expiry time? 10 mins or so?
2021-02-08 03912, 2021
alastairp
exactly. so we can just look at the keys and work out what is in there
2021-02-08 03922, 2021
ruaok
+100 to that.
2021-02-08 03934, 2021
alastairp
the only reason it existed was that memcache had a max key name limit. we're nowhere near this limit in redis
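(For illustration only — the key names and "listenbrainz:" namespace below are made up: hashed keys tell you nothing when you inspect redis, while plain namespaced keys can be read and pattern-matched.)

```python
# Why un-hashed keys make the keyspace readable.
import hashlib
import redis

r = redis.Redis()

# With hashing, inspecting redis tells you nothing about what is cached:
hashed = hashlib.sha1(b"listenbrainz:user-stats:rob").hexdigest()
r.set(hashed, "...")

# With plain namespaced keys, the keyspace is readable and pattern-matchable:
r.set("listenbrainz:user-stats:rob", "...")
for key in r.scan_iter(match="listenbrainz:user-stats:*"):
    print(key)
```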
2021-02-08 03910, 2021
alastairp
honestly, I don't think it's a problem to make the expiry time required
2021-02-08 03915, 2021
ruaok
ok, #2 is what exactly?
2021-02-08 03929, 2021
alastairp
it makes people think in more detail about what they're actually caching
2021-02-08 03935, 2021
ruaok
> honestly, I don't think it's a problem to make the expiry time required
2021-02-08 03949, 2021
ruaok
I wonder how much work that will be in the short term. but yes, making people think about it is good.
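(A sketch of what making expiry mandatory could look like, with the 10-minute default floated above; this is a hypothetical wrapper over redis-py, not the BU cache API.)

```python
# Hypothetical cache setter: callers must provide an expiry time, with a
# conservative 10-minute default.
import redis

_redis = redis.Redis()
DEFAULT_EXPIRY = 600  # seconds (10 minutes)

def cache_set(key, value, expirein=DEFAULT_EXPIRY):
    if not expirein or expirein <= 0:
        raise ValueError("an expiry time is required for all cache entries")
    _redis.set(key, value, ex=expirein)
```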
2021-02-08 03908, 2021
alastairp
for #2, I was reading through redis configuration docs a few weeks ago, and there are a few different ways of telling it to write to disk
2021-02-08 03953, 2021
alastairp
it seems that there's a "replay" version, where it just writes all commands sequentially, but there's also a version that looks at all of the commands and simplifies it (e.g. if you add a key then delete it, it no longer needs to be in the replay log)
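(This sounds like redis's RDB snapshots versus the append-only file, with AOF rewriting being the "simplified" form. A redis-py sketch of the relevant settings follows; the same options live in redis.conf, and the values are examples only, not a recommended configuration.)

```python
import redis

r = redis.Redis()

# RDB snapshots: periodic point-in-time dumps to disk.
r.config_set("save", "900 1 300 10")

# AOF: the "replay" log that appends every write command sequentially.
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "everysec")

# The "simplified" version: rewrite the AOF down to the minimal set of
# commands needed to rebuild the current dataset (redis also triggers this
# automatically via auto-aof-rewrite-percentage / auto-aof-rewrite-min-size).
r.bgrewriteaof()
```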
2021-02-08 03909, 2021
ruaok
that actually begs the question, should redis be writing to disk in normal operations?
2021-02-08 03914, 2021
alastairp
it came to mind after we were looking at the amount of disk IO that CB redis was doing