I note that it doesn't output anything anyway - is this the logging update that you want to do?
2021-05-18 13810, 2021
alastairp
because we should definitely have a log message "started" and "finished", with date
2021-05-18 13815, 2021
alastairp
then we can redirect output to a log file
2021-05-18 13834, 2021
_lucifer
actually, i checked again it does finish. the issue is that it doesn't log errors correctly.
2021-05-18 13845, 2021
_lucifer
`self.log` is wrong should be logger or logging
2021-05-18 13859, 2021
_lucifer
i'll add a start log message as well
2021-05-18 13825, 2021
_lucifer
*it does log on finishing
2021-05-18 13837, 2021
alastairp
right. we should make sure we log on start too, then
2021-05-18 13800, 2021
zas
switching back to kiki
2021-05-18 13818, 2021
alastairp
and maybe we need a "logging cookbook" too, some basic guide for how to ensure that we always get logging correct, it seems to be causing us all sorts of problems recently
2021-05-18 13851, 2021
_lucifer
yup that would be useful
2021-05-18 13809, 2021
alastairp
I think that things that run within a flask context should be mostly fine (using current_app.logger)
just to confirm - this is a command that runs in the context of the webserver (e.g. in a web/cron container) ?
2021-05-18 13811, 2021
_lucifer
and a few other places in spark reader, i do not see why we need a flask context to connect to rabbitmq.
2021-05-18 13831, 2021
alastairp
I guess in this case it's so that we can use current_app
2021-05-18 13836, 2021
_lucifer
yes this command runs from cron, there are a few other that run in spark reader same image different container
2021-05-18 13839, 2021
alastairp
but this is an interesting example -
2021-05-18 13805, 2021
alastairp
why do we need to make a connection to rabbitmq? does create_app connect to rabbitmq, or is this something that's only done in request_manage?
2021-05-18 13853, 2021
alastairp
what I'm trying to say is - by doing things within an app context (especially in manage.py), we should have all of our connections to dependent services running, redis, db, timescale, logging, etc
2021-05-18 13859, 2021
alastairp
which I think is really useful
2021-05-18 13859, 2021
_lucifer
we just use current_app to get the config. the connection is done by calling utils.connect_to_rabbitmq
2021-05-18 13818, 2021
alastairp
so this example in request_manage.py - by using an app context we don't need to configure a logger. it's all done in create_app(). If we decided to remove this, then we'd have 2 different ways of doing logging within the webserver
2021-05-18 13820, 2021
_lucifer
i see we have 8 services, 4 of those use flask other 4 don't (this is excluding mbid mapping writer, spark cluster stuff)
2021-05-18 13806, 2021
alastairp
can we make a list of those in a document somewhere?
2021-05-18 13819, 2021
_lucifer
sure, i'll do that
2021-05-18 13815, 2021
alastairp
my gut feel is that we should use flask in all of the services which run from the web image
2021-05-18 13816, 2021
_lucifer
so we definitely have enough services to justify 2 setups. using flask context is easy but it does not look ideal to me.
which means that for example, the call to `utils.connect_to_rabbitmq` in request_manage.send_request_to_spark_cluster isn't actually needed
2021-05-18 13852, 2021
alastairp
aggregates updates finished
2021-05-18 13839, 2021
_lucifer
oh interesting!
2021-05-18 13834, 2021
_lucifer
your point is valid, as long as its the web image flask is going to be there whether we use it or not.
2021-05-18 13802, 2021
alastairp
right - so then we have the situation of spark; or the mbid mapping writer
2021-05-18 13832, 2021
_lucifer
let me check what image the mbid mapping writer uses
2021-05-18 13840, 2021
alastairp
we decided that in some cases (mbid writer) it doesn't make sense to carry around the whole webserver image
2021-05-18 13800, 2021
alastairp
ruaok and I had a discussion about this a few times in the last months. we're happy with this decision at the moment
2021-05-18 13829, 2021
_lucifer
cool, so we can setup a normal logger there without flask.
2021-05-18 13804, 2021
alastairp
one thing we discussed was if we could have a standardised setup for logging - perhaps we _do_ want to put logging setup in BU. We could consider using optional dependencies in BU for the webserver dependencies
2021-05-18 13811, 2021
_lucifer
we had discussed making flask optional in BU using extra_requires. we can do that and then add some common loggin stuff thre
that's interesting. maybe we should take a step back from this and see if it's the correct way to do it
2021-05-18 13816, 2021
flamingspinach has quit
2021-05-18 13832, 2021
_lucifer
yeah we probably should
2021-05-18 13834, 2021
flamingspinach joined the channel
2021-05-18 13818, 2021
atj
When I play tracks from a VA release, the artist name in LB displays as "Various Artists" but links to the actual artists MB page. Is this a client issue? MBIDs are being submitted with listens.
2021-05-18 13853, 2021
alastairp
we don't use MBIDs in the LB interface yet.
2021-05-18 13805, 2021
alastairp
well, maybe we do - if the link is there
2021-05-18 13813, 2021
atj
It links to the correct MBID so you must do? :)
2021-05-18 13833, 2021
alastairp
it's possible that your client is submitting the album artist instead of the track artist
looks like its been a terrible week for actions as well, down 3rd time since friday
2021-05-18 13805, 2021
alastairp
looks like I picked the wrong week to give up jenkins
2021-05-18 13807, 2021
_lucifer
but then they decided to build it all in node so they were also asking for it ;)
2021-05-18 13858, 2021
atj
Maybe their Azure billing account overflowed :)
2021-05-18 13839, 2021
_lucifer
alastairp: the logger here https://github.com/metabrainz/listenbrainz-server… is not configured to output time etc. should i add the configuration in the listenbrainz/__init__.py, the same what we added in listenbrainz_spark/__init__.py.
2021-05-18 13859, 2021
_lucifer
excluding the file handler)
2021-05-18 13832, 2021
alastairp
I guess that'd be OK, but why are we using a specific logger here instead of current_app.logger?
2021-05-18 13856, 2021
_lucifer
not sure
2021-05-18 13800, 2021
alastairp
atj: off the top of you head, is there a difference in the order of `2>&1` and `> file` when doing redirects?
2021-05-18 13834, 2021
alastairp
I've always done 2>&1 first, because I read it as "redirect stderr to stdout then redirect stdout to file", but does it work if they're around the other way too?
Daaamn, that was fast. I wasn't here 2 minutes before I was deposited into the 15 minute waiting area.
2021-05-18 13825, 2021
alastairp
atj: oh interesting, so I've always done it wrong!
2021-05-18 13829, 2021
alastairp
thanks
2021-05-18 13853, 2021
yvanzo
yyoung: congratulations! :)
2021-05-18 13806, 2021
atj
alastairp: this is why in general I would suggest you use "&>file" if you know you'll always be using bash
2021-05-18 13830, 2021
alastairp
does that do both?
2021-05-18 13846, 2021
alastairp
this is in a crontab, I don't know what vixiecron uses
2021-05-18 13820, 2021
ruaok
The vaccine center is totally not busy. If you two just randomly showed up, i bet they would vaccinate you, Mr_Monkey & alastairp
2021-05-18 13816, 2021
atj
alastairp: yes it does redirect both, you can set "SHELL=/bin/bash" in the crontab file to force the shell that cron will use to execute (normally it's "/bin/sh")
"Several environment variables are set up automatically by the cron(8) daemon. SHELL is set to /bin/sh, ..."
2021-05-18 13802, 2021
Mineo
shellcheck (https://www.shellcheck.net/) will flag the incorrect order of redirection magic (I can never remember the correct one, either :-) )
2021-05-18 13820, 2021
alastairp
atj: oh yeah, of course. thanks
2021-05-18 13834, 2021
alastairp
Mineo: yeah, we use shellcheck more often now, which is really useful
2021-05-18 13806, 2021
alastairp
_lucifer: mmm, I was thinking about cron validation - we could have a job that runs when the crontab updates to check that everything is present. but I just realised that we could also consider doing shellcheck too
2021-05-18 13829, 2021
alastairp
(as long as actions doesn't keep disappearing :)
2021-05-18 13836, 2021
_lucifer
with conditional file path checking, i would say let's make shellcheck run in an action if any script is running.
2021-05-18 13845, 2021
_lucifer
*is modified
2021-05-18 13846, 2021
alastairp
yep
2021-05-18 13808, 2021
_lucifer
does shellcheck accept crontab though?
2021-05-18 13847, 2021
alastairp
we have plenty other shell scripts though, maybe it doesn't help in this specific case
2021-05-18 13824, 2021
_lucifer
yes, right. we should definitely add shellcheck.
2021-05-18 13843, 2021
_lucifer
i was thinking in addition to it what is possible for the cron use case.
[listenbrainz-server] alastair merged pull request #1439 (master…spark-test-improvements): Some more improvements to Spark test and development setup https://github.com/metabrainz/listenbrainz-server…
2021-05-18 13859, 2021
_lucifer
alastairp: 02 and 0 are same in crontab right?
2021-05-18 13803, 2021
_lucifer
02 and 2
2021-05-18 13853, 2021
atj
_lucifer: yes
2021-05-18 13809, 2021
_lucifer
👍, thanks!
2021-05-18 13818, 2021
_lucifer
alastairp: another thing `/logs` is `chown`'ed to `lbdumps` user. I was intending to redirect stats log to `/logs/stats.log` but i don't think it would in that case. but now that we have only one crontab do we even need two users?
2021-05-18 13842, 2021
ruaok
Setting up a docker registry doesn't look hard...
2021-05-18 13816, 2021
ruaok
We could set up a trial one and see how much disk it needs.
2021-05-18 13823, 2021
_lucifer
setting up github's registry should be easier i believe?
2021-05-18 13842, 2021
ruaok
It's still in beta.
2021-05-18 13848, 2021
shivam-kapila
> The vaccine center is totally not busy. If you two just randomly showed up, i bet they would vaccinate you, Mr_Monkey & alastairp
2021-05-18 13854, 2021
shivam-kapila
No age bar in spain?
2021-05-18 13810, 2021
ruaok
And I feel that we rely on GH too much as is...
2021-05-18 13833, 2021
ruaok
shivam-kapila: yes, there is.
2021-05-18 13855, 2021
ruaok
But the people there looked bored. Lol.
2021-05-18 13858, 2021
shivam-kapila
Haha. I got an appointment but its quite far away :/
2021-05-18 13830, 2021
shivam-kapila
There were around 400 slots nearby but filled in say 20sec even with a captcha
2021-05-18 13837, 2021
_lucifer
our LB images are 1.25G each so that's a starting point to consider regarding diskspace although some layers might get reused.
2021-05-18 13814, 2021
alastairp
_lucifer: agreed, I think there's no reason to have 2 different users
2021-05-18 13809, 2021
_lucifer
out of curiosity, why did we have two until now because you mentioned we use global crontabs not user specific?
we indeed rely on Github a lot but we had also discussed some time ago that if we can avoid setting up our own services that would be nice.
2021-05-18 13843, 2021
ruaok
True that as well.
2021-05-18 13802, 2021
ruaok
Maybe we should trial GH registry....
2021-05-18 13841, 2021
alastairp
_lucifer: ah, good point. so there's a bit of history here
2021-05-18 13814, 2021
alastairp
at some point in time, crontabs _were_ installed as specific users
2021-05-18 13841, 2021
alastairp
but there was a bit of unneeded complexity, so I removed it. I suspect you're right that this is why we originally had 2 users: 2 files -> 2 users
2021-05-18 13854, 2021
alastairp
although, here's an interesting question. we create lbdumps with a known uid so that it can write to the storage box. maybe instead of of creating a `listenbrainz` user and then `lbdumps` with a specific UID, we could just set `listenbrainz` to that specific uid instead. I'll have to verify the behaviour of this
2021-05-18 13815, 2021
_lucifer
alastairp, oh interesting. makes sense. i'll just change all to lbdumps for now. we can later change it all to `listenbrainz` user if it works fine.
2021-05-18 13825, 2021
alastairp
sounds good to me
2021-05-18 13837, 2021
alastairp
actually - once we upgrade lemmy and remove the storagebox this requirement for a specific uid becomes less important anyway, so we might be able to just remove it