I note that it doesn't output anything anyway - is this the logging update that you want to do?
because we should definitely have a log message "started" and "finished", with date
then we can redirect output to a log file
_lucifer
actually, i checked again it does finish. the issue is that it doesn't log errors correctly.
`self.log` is wrong should be logger or logging
i'll add a start log message as well
*it does log on finishing
alastairp
right. we should make sure we log on start too, then
zas
switching back to kiki
alastairp
and maybe we need a "logging cookbook" too, some basic guide for how to ensure that we always get logging correct, it seems to be causing us all sorts of problems recently
_lucifer
yup that would be useful
alastairp
I think that things that run within a flask context should be mostly fine (using current_app.logger)
just to confirm - this is a command that runs in the context of the webserver (e.g. in a web/cron container) ?
_lucifer
and a few other places in spark reader, i do not see why we need a flask context to connect to rabbitmq.
alastairp
I guess in this case it's so that we can use current_app
_lucifer
yes this command runs from cron, there are a few other that run in spark reader same image different container
alastairp
but this is an interesting example -
why do we need to make a connection to rabbitmq? does create_app connect to rabbitmq, or is this something that's only done in request_manage?
what I'm trying to say is - by doing things within an app context (especially in manage.py), we should have all of our connections to dependent services running, redis, db, timescale, logging, etc
which I think is really useful
_lucifer
we just use current_app to get the config. the connection is done by calling utils.connect_to_rabbitmq
alastairp
so this example in request_manage.py - by using an app context we don't need to configure a logger. it's all done in create_app(). If we decided to remove this, then we'd have 2 different ways of doing logging within the webserver
_lucifer
i see we have 8 services, 4 of those use flask other 4 don't (this is excluding mbid mapping writer, spark cluster stuff)
alastairp
can we make a list of those in a document somewhere?
_lucifer
sure, i'll do that
alastairp
my gut feel is that we should use flask in all of the services which run from the web image
_lucifer
so we definitely have enough services to justify 2 setups. using flask context is easy but it does not look ideal to me.
which means that for example, the call to `utils.connect_to_rabbitmq` in request_manage.send_request_to_spark_cluster isn't actually needed
aggregates updates finished
_lucifer
oh interesting!
your point is valid, as long as its the web image flask is going to be there whether we use it or not.
alastairp
right - so then we have the situation of spark; or the mbid mapping writer
_lucifer
let me check what image the mbid mapping writer uses
alastairp
we decided that in some cases (mbid writer) it doesn't make sense to carry around the whole webserver image
ruaok and I had a discussion about this a few times in the last months. we're happy with this decision at the moment
_lucifer
cool, so we can setup a normal logger there without flask.
alastairp
one thing we discussed was if we could have a standardised setup for logging - perhaps we _do_ want to put logging setup in BU. We could consider using optional dependencies in BU for the webserver dependencies
_lucifer
we had discussed making flask optional in BU using extra_requires. we can do that and then add some common loggin stuff thre
that's interesting. maybe we should take a step back from this and see if it's the correct way to do it
flamingspinach has quit
_lucifer
yeah we probably should
flamingspinach joined the channel
atj
When I play tracks from a VA release, the artist name in LB displays as "Various Artists" but links to the actual artists MB page. Is this a client issue? MBIDs are being submitted with listens.
alastairp
we don't use MBIDs in the LB interface yet.
well, maybe we do - if the link is there
atj
It links to the correct MBID so you must do? :)
alastairp
it's possible that your client is submitting the album artist instead of the track artist
looks like its been a terrible week for actions as well, down 3rd time since friday
alastairp
looks like I picked the wrong week to give up jenkins
_lucifer
but then they decided to build it all in node so they were also asking for it ;)
atj
Maybe their Azure billing account overflowed :)
_lucifer
alastairp: the logger here https://github.com/metabrainz/listenbrainz-serv... is not configured to output time etc. should i add the configuration in the listenbrainz/__init__.py, the same what we added in listenbrainz_spark/__init__.py.
excluding the file handler)
alastairp
I guess that'd be OK, but why are we using a specific logger here instead of current_app.logger?
_lucifer
not sure
alastairp
atj: off the top of you head, is there a difference in the order of `2>&1` and `> file` when doing redirects?
I've always done 2>&1 first, because I read it as "redirect stderr to stdout then redirect stdout to file", but does it work if they're around the other way too?
Daaamn, that was fast. I wasn't here 2 minutes before I was deposited into the 15 minute waiting area.
alastairp
atj: oh interesting, so I've always done it wrong!
thanks
yvanzo
yyoung: congratulations! :)
atj
alastairp: this is why in general I would suggest you use "&>file" if you know you'll always be using bash
alastairp
does that do both?
this is in a crontab, I don't know what vixiecron uses
ruaok
The vaccine center is totally not busy. If you two just randomly showed up, i bet they would vaccinate you, Mr_Monkey & alastairp
atj
alastairp: yes it does redirect both, you can set "SHELL=/bin/bash" in the crontab file to force the shell that cron will use to execute (normally it's "/bin/sh")
"Several environment variables are set up automatically by the cron(8) daemon. SHELL is set to /bin/sh, ..."
Mineo
shellcheck (https://www.shellcheck.net/) will flag the incorrect order of redirection magic (I can never remember the correct one, either :-) )
alastairp
atj: oh yeah, of course. thanks
Mineo: yeah, we use shellcheck more often now, which is really useful
_lucifer: mmm, I was thinking about cron validation - we could have a job that runs when the crontab updates to check that everything is present. but I just realised that we could also consider doing shellcheck too
(as long as actions doesn't keep disappearing :)
_lucifer
with conditional file path checking, i would say let's make shellcheck run in an action if any script is running.
*is modified
alastairp
yep
_lucifer
does shellcheck accept crontab though?
alastairp
we have plenty other shell scripts though, maybe it doesn't help in this specific case
_lucifer
yes, right. we should definitely add shellcheck.
i was thinking in addition to it what is possible for the cron use case.
[listenbrainz-server] alastair merged pull request #1439 (master…spark-test-improvements): Some more improvements to Spark test and development setup https://github.com/metabrainz/listenbrainz-serv...
_lucifer
alastairp: 02 and 0 are same in crontab right?
02 and 2
atj
_lucifer: yes
_lucifer
👍, thanks!
alastairp: another thing `/logs` is `chown`'ed to `lbdumps` user. I was intending to redirect stats log to `/logs/stats.log` but i don't think it would in that case. but now that we have only one crontab do we even need two users?
ruaok
Setting up a docker registry doesn't look hard...
We could set up a trial one and see how much disk it needs.
_lucifer
setting up github's registry should be easier i believe?
ruaok
It's still in beta.
shivam-kapila
> The vaccine center is totally not busy. If you two just randomly showed up, i bet they would vaccinate you, Mr_Monkey & alastairp
No age bar in spain?
ruaok
And I feel that we rely on GH too much as is...
shivam-kapila: yes, there is.
But the people there looked bored. Lol.
shivam-kapila
Haha. I got an appointment but its quite far away :/
There were around 400 slots nearby but filled in say 20sec even with a captcha
_lucifer
our LB images are 1.25G each so that's a starting point to consider regarding diskspace although some layers might get reused.
alastairp
_lucifer: agreed, I think there's no reason to have 2 different users
_lucifer
out of curiosity, why did we have two until now because you mentioned we use global crontabs not user specific?
we indeed rely on Github a lot but we had also discussed some time ago that if we can avoid setting up our own services that would be nice.
ruaok
True that as well.
Maybe we should trial GH registry....
alastairp
_lucifer: ah, good point. so there's a bit of history here
at some point in time, crontabs _were_ installed as specific users
but there was a bit of unneeded complexity, so I removed it. I suspect you're right that this is why we originally had 2 users: 2 files -> 2 users
although, here's an interesting question. we create lbdumps with a known uid so that it can write to the storage box. maybe instead of of creating a `listenbrainz` user and then `lbdumps` with a specific UID, we could just set `listenbrainz` to that specific uid instead. I'll have to verify the behaviour of this
_lucifer
alastairp, oh interesting. makes sense. i'll just change all to lbdumps for now. we can later change it all to `listenbrainz` user if it works fine.
alastairp
sounds good to me
actually - once we upgrade lemmy and remove the storagebox this requirement for a specific uid becomes less important anyway, so we might be able to just remove it