in #metabrainz

9:42 AM
_lucifer

👍
9:42 AM
alastairp

I note that it doesn't output anything anyway - is this the logging update that you want to do?
9:43 AM
because we should definitely have a log message "started" and "finished", with date
9:43 AM
then we can redirect output to a log file
9:43 AM
_lucifer

actually, i checked again it does finish. the issue is that it doesn't log errors correctly.
9:43 AM
`self.log` is wrong, it should be `logger` or `logging`
9:43 AM
i'll add a start log message as well
9:44 AM
*it does log on finishing
9:47 AM
alastairp

right. we should make sure we log on start too, then
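A minimal sketch of the "started"/"finished" logging discussed above, using only the standard library. The logger name `lb.cron` and the helpers `configure_logging` and `run_job` are hypothetical and not taken from the listenbrainz-server code. The timestamp comes from `%(asctime)s`, and errors go through `logger.exception` so the traceback ends up in the output that cron redirects to a file.

```python
import logging
import sys


def configure_logging(stream=sys.stdout):
    """Send timestamped log lines to `stream` so cron can redirect them to a file."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger = logging.getLogger("lb.cron")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]  # replace handlers so repeated calls don't duplicate lines
    logger.propagate = False
    return logger


def run_job(job, logger):
    """Log when the job starts and finishes; log the traceback if it fails."""
    logger.info("started")
    try:
        job()
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback
        logger.exception("failed")
        raise
    logger.info("finished")
```

With this shape, a job that raises never logs "finished", so a missing "finished" line in the log file points at a failed run.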
9:48 AM
zas

switching back to kiki
9:48 AM
alastairp

and maybe we need a "logging cookbook" too, some basic guide for how to ensure that we always get logging correct, it seems to be causing us all sorts of problems recently
9:48 AM
_lucifer

yup that would be useful
9:53 AM
alastairp

I think that things that run within a flask context should be mostly fine (using current_app.logger)
9:57 AM
BrainzGit

[listenbrainz-server] alastair opened pull request #1469 (master…cron-cleanup): Add missing username to dumps crontab https://github.com/metabrainz/listenbrainz-serv...
9:57 AM
_lucifer

yes, right but there are many places where we create the flask context just for logging purposes.
9:57 AM
what should we do for those cases.
9:57 AM
alastairp

can you give an example?
9:58 AM
BrainzGit

[listenbrainz-server] alastair closed pull request #1462 (master…dependabot/pip/pydantic-1.8.2): Bump pydantic from 1.8.1 to 1.8.2 https://github.com/metabrainz/listenbrainz-serv...
9:58 AM
[listenbrainz-server] alastair merged pull request #1465 (master…webpack-no-progress): Don't output any progress while compiling prod js files https://github.com/metabrainz/listenbrainz-serv...
9:59 AM
_lucifer

https://github.com/metabrainz/listenbrainz-serv...
9:59 AM
alastairp

just to confirm - this is a command that runs in the context of the webserver (e.g. in a web/cron container) ?
10:00 AM
_lucifer

and a few other places in spark reader, i do not see why we need a flask context to connect to rabbitmq.
10:00 AM
alastairp

I guess in this case it's so that we can use current_app
10:00 AM
_lucifer

yes this command runs from cron, there are a few others that run in spark reader (same image, different container)
10:00 AM
alastairp

but this is an interesting example -
10:01 AM
why do we need to make a connection to rabbitmq? does create_app connect to rabbitmq, or is this something that's only done in request_manage?
10:01 AM
what I'm trying to say is - by doing things within an app context (especially in manage.py), we should have all of our connections to dependent services running, redis, db, timescale, logging, etc
10:01 AM
which I think is really useful
10:01 AM
_lucifer

we just use current_app to get the config. the connection is done by calling utils.connect_to_rabbitmq
10:03 AM
alastairp

so this example in request_manage.py - by using an app context we don't need to configure a logger. it's all done in create_app(). If we decided to remove this, then we'd have 2 different ways of doing logging within the webserver
10:04 AM
_lucifer

i see we have 8 services, 4 of those use flask other 4 don't (this is excluding mbid mapping writer, spark cluster stuff)
10:05 AM
alastairp

can we make a list of those in a document somewhere?
10:05 AM
_lucifer

sure, i'll do that
10:06 AM
alastairp

my gut feel is that we should use flask in all of the services which run from the web image
10:06 AM
_lucifer

so we definitely have enough services to justify 2 setups. using flask context is easy but it does not look ideal to me.
10:09 AM
alastairp

I just had a dig through the code, it seems that create_app() eventually calls https://github.com/metabrainz/listenbrainz-serv...
10:09 AM
which means that for example, the call to `utils.connect_to_rabbitmq` in request_manage.send_request_to_spark_cluster isn't actually needed
10:15 AM
aggregates updates finished
10:33 AM
_lucifer

oh interesting!
10:34 AM
your point is valid, as long as it's the web image, flask is going to be there whether we use it or not.
10:35 AM
alastairp

right - so then we have the situation of spark; or the mbid mapping writer
10:35 AM
_lucifer

let me check what image the mbid mapping writer uses
10:35 AM
alastairp

we decided that in some cases (mbid writer) it doesn't make sense to carry around the whole webserver image
10:36 AM
ruaok and I had a discussion about this a few times in the last months. we're happy with this decision at the moment
10:36 AM
_lucifer

cool, so we can setup a normal logger there without flask.
10:37 AM
alastairp

one thing we discussed was if we could have a standardised setup for logging - perhaps we _do_ want to put logging setup in BU. We could consider using optional dependencies in BU for the webserver dependencies
10:37 AM
_lucifer

we had discussed making flask optional in BU using extras_require. we can do that and then add some common logging stuff there
10:37 AM
alastairp

snap
10:37 AM
_lucifer

yes exactly
10:37 AM
alastairp

cool, that's a future task, then
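One possible shape for the shared logging setup mentioned above, as a sketch only. `get_service_logger` and `LOG_FORMAT` are hypothetical names, not an existing BU API. It assumes that services running inside the web image pass their Flask app (whose `app.logger` is a standard `logging.Logger`, already configured by `create_app()` per the discussion), while services without Flask get a plain stdlib logger with a timestamped format.

```python
import logging
import sys

# Hypothetical shared format; includes the time, which the plain loggers were missing.
LOG_FORMAT = "%(asctime)s %(name)-20s %(levelname)-8s %(message)s"


def get_service_logger(name, app=None, level=logging.INFO):
    """Return a logger for a service.

    If a Flask app is passed, reuse app.logger; otherwise configure a
    plain stdlib logger so no Flask import is needed.
    """
    if app is not None:
        return app.logger
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger
```

Because the helper never imports Flask itself, it could live in a package where Flask is an optional extra.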
10:37 AM
this ties into my thoughts on LB-879
10:37 AM
BrainzBot

LB-879: unify external service connections in single location https://tickets.metabrainz.org/browse/LB-879
10:38 AM
_lucifer

nice
10:38 AM
in other news, caching now entirely fails in my latest experiments 😢
10:38 AM
alastairp

docker build cache?
10:38 AM
_lucifer

yup
10:39 AM
https://github.com/amCap1712/listenbrainz-serve...
10:39 AM
alastairp

that's interesting. maybe we should take a step back from this and see if it's the correct way to do it
10:39 AM
flamingspinach has quit
10:39 AM
_lucifer

yeah we probably should
10:40 AM
flamingspinach joined the channel
10:43 AM
atj

When I play tracks from a VA release, the artist name in LB displays as "Various Artists" but links to the actual artist's MB page. Is this a client issue? MBIDs are being submitted with listens.
10:43 AM
alastairp

we don't use MBIDs in the LB interface yet.
10:44 AM
well, maybe we do - if the link is there
10:44 AM
atj

It links to the correct MBID so you must do? :)
10:44 AM
alastairp

it's possible that your client is submitting the album artist instead of the track artist
10:45 AM
atj

Yeah I thought that might be the issue.
10:47 AM
This gives some weird results in the UI: https://listenbrainz.org/user/atj/reports?range...
10:47 AM
multiple instances of the same artist name but they link to a different MBID
10:50 AM
Seems like my client isn't detecting compilations properly: https://gitlab.gnome.org/World/lollypop/-/blob/...
10:53 AM
_lucifer

looks like it's been a terrible week for actions as well, down 3rd time since friday
10:54 AM
alastairp

looks like I picked the wrong week to give up jenkins
10:54 AM
_lucifer

but then they decided to build it all in node so they were also asking for it ;)
10:55 AM
atj

Maybe their Azure billing account overflowed :)
10:59 AM
_lucifer

alastairp: the logger here https://github.com/metabrainz/listenbrainz-serv... is not configured to output time etc. should i add the configuration in listenbrainz/__init__.py, the same as what we added in listenbrainz_spark/__init__.py?
10:59 AM
(excluding the file handler)
11:00 AM
alastairp

I guess that'd be OK, but why are we using a specific logger here instead of current_app.logger?
11:00 AM
_lucifer

not sure
11:06 AM
alastairp

atj: off the top of your head, is there a difference in the order of `2>&1` and `> file` when doing redirects?
11:06 AM
I've always done 2>&1 first, because I read it as "redirect stderr to stdout then redirect stdout to file", but does it work if they're around the other way too?
11:11 AM
yvanzo

Freso: yes
11:11 AM
atj

alastairp: yes there is a difference
11:11 AM
it always confuses me as I forgot the order
11:12 AM
https://gist.github.com/atj/3228cea251168a0fa93...
11:13 AM
the correct order is ">file 2>&1"
11:13 AM
hope the gist makes sense
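The difference between the two orders can be checked with a small sketch that runs both through `sh` from Python. The helper name `run_with_redirect` and the test command are made up for illustration, and it assumes a POSIX `sh` on the PATH. The shell applies redirections left to right, so with `2>&1 > file` stderr is pointed at the original stdout before stdout is moved to the file.

```python
import os
import shlex
import subprocess
import tempfile


def run_with_redirect(redirect_template):
    """Run a command that writes one line to stdout and one to stderr.

    Returns (contents of the target file, whatever still reached the caller's stdout).
    """
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "out.log")
        redirect = redirect_template.format(file=shlex.quote(target))
        cmd = "{ echo to-stdout; echo to-stderr >&2; } " + redirect
        proc = subprocess.run(
            ["sh", "-c", cmd],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
        with open(target) as fh:
            return fh.read(), proc.stdout


# "> file 2>&1": both streams end up in the file
in_file, leaked = run_with_redirect("> {file} 2>&1")
print(repr(in_file), repr(leaked))

# "2>&1 > file": stderr goes to the original stdout, only stdout reaches the file
in_file, leaked = run_with_redirect("2>&1 > {file}")
print(repr(in_file), repr(leaked))
```

In a crontab, the "leaked" part in the second case is what cron would still see as output of the job instead of it landing in the log file.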
11:14 AM
ruaok

Daaamn, that was fast. I wasn't here 2 minutes before I was deposited into the 15 minute waiting area.
11:14 AM
alastairp

atj: oh interesting, so I've always done it wrong!
11:14 AM
thanks
11:14 AM
yvanzo

yyoung: congratulations! :)
11:16 AM
atj

alastairp: this is why in general I would suggest you use "&>file" if you know you'll always be using bash
11:18 AM
alastairp

does that do both?
11:18 AM
this is in a crontab, I don't know what vixiecron uses
11:20 AM
ruaok

The vaccine center is totally not busy. If you two just randomly showed up, i bet they would vaccinate you, Mr_Monkey & alastairp
11:23 AM
atj

alastairp: yes it does redirect both, you can set "SHELL=/bin/bash" in the crontab file to force the shell that cron will use to execute (normally it's "/bin/sh")
11:23 AM
https://www.man7.org/linux/man-pages/man5/cront...
11:24 AM
"Several environment variables are set up automatically by the cron(8) daemon. SHELL is set to /bin/sh, ..."
11:29 AM
Mineo

shellcheck (https://www.shellcheck.net/) will flag the incorrect order of redirection magic (I can never remember the correct one, either :-) )
11:30 AM
alastairp

atj: oh yeah, of course. thanks
11:30 AM
Mineo: yeah, we use shellcheck more often now, which is really useful
11:31 AM
_lucifer: mmm, I was thinking about cron validation - we could have a job that runs when the crontab updates to check that everything is present. but I just realised that we could also consider doing shellcheck too
11:31 AM
(as long as actions doesn't keep disappearing :)
11:31 AM
_lucifer

with conditional file path checking, i would say let's make shellcheck run in an action if any script is running.
11:31 AM
*is modified
11:31 AM
alastairp

yep
11:32 AM
_lucifer

does shellcheck accept crontab though?
11:32 AM
alastairp

we have plenty other shell scripts though, maybe it doesn't help in this specific case
11:33 AM
_lucifer

yes, right. we should definitely add shellcheck.
11:33 AM
i was thinking in addition to it what is possible for the cron use case.
11:33 AM
BrainzGit

[listenbrainz-server] alastair merged pull request #1469 (master…cron-cleanup): Add missing username to dumps crontab https://github.com/metabrainz/listenbrainz-serv...
11:36 AM
[listenbrainz-server] alastair merged pull request #1461 (master…dependabot/pip/master/pydantic-1.8.2): [Security] Bump pydantic from 1.8.1 to 1.8.2 https://github.com/metabrainz/listenbrainz-serv...
11:37 AM
[listenbrainz-server] alastair merged pull request #1459 (master…conditional-action-run): LB-895: Conditionally run tests based on paths modified https://github.com/metabrainz/listenbrainz-serv...
11:38 AM
[listenbrainz-server] alastair merged pull request #1439 (master…spark-test-improvements): Some more improvements to Spark test and development setup https://github.com/metabrainz/listenbrainz-serv...
12:02 PM
_lucifer

alastairp: 02 and 0 are same in crontab right?
12:03 PM
02 and 2
12:03 PM
atj

_lucifer: yes
12:04 PM
_lucifer

👍, thanks!
12:10 PM
alastairp: another thing, `/logs` is `chown`'ed to the `lbdumps` user. I was intending to redirect the stats log to `/logs/stats.log` but i don't think it would work in that case. but now that we have only one crontab, do we even need two users?
12:10 PM
ruaok

Setting up a docker registry doesn't look hard...
12:11 PM
We could set up a trial one and see how much disk it needs.
12:11 PM
_lucifer

setting up github's registry should be easier i believe?
12:11 PM
ruaok

It's still in beta.
12:11 PM
shivam-kapila

> The vaccine center is totally not busy. If you two just randomly showed up, i bet they would vaccinate you, Mr_Monkey & alastairp
12:11 PM
No age bar in spain?
12:12 PM
ruaok

And I feel that we rely on GH too much as is...
12:12 PM
shivam-kapila: yes, there is.
12:12 PM
But the people there looked bored. Lol.
12:14 PM
shivam-kapila

Haha. I got an appointment but it's quite far away :/
12:15 PM
There were around 400 slots nearby but filled in say 20sec even with a captcha
12:16 PM
_lucifer

our LB images are 1.25G each so that's a starting point to consider regarding diskspace although some layers might get reused.
12:19 PM
alastairp

_lucifer: agreed, I think there's no reason to have 2 different users
12:20 PM
_lucifer

out of curiosity, why did we have two until now? you mentioned we use global crontabs, not user-specific ones.
12:23 PM
ruaok: https://www.irccloud.com/pastebin/3uQJGiRL/ alastairp had shared this sometime ago. just docker-compose this to bring up a registry.
12:25 PM
we indeed rely on Github a lot but we had also discussed some time ago that if we can avoid setting up our own services that would be nice.
12:25 PM
ruaok

True that as well.
12:26 PM
Maybe we should trial GH registry....
12:38 PM
alastairp

_lucifer: ah, good point. so there's a bit of history here
12:39 PM
at some point in time, crontabs _were_ installed as specific users
12:39 PM
but there was a bit of unneeded complexity, so I removed it. I suspect you're right that this is why we originally had 2 users: 2 files -> 2 users
12:41 PM
although, here's an interesting question. we create lbdumps with a known uid so that it can write to the storage box. maybe instead of creating a `listenbrainz` user and then `lbdumps` with a specific UID, we could just set `listenbrainz` to that specific uid instead. I'll have to verify the behaviour of this
12:45 PM
_lucifer

alastairp, oh interesting. makes sense. i'll just change all to lbdumps for now. we can later change it all to `listenbrainz` user if it works fine.
12:45 PM
alastairp

sounds good to me
12:46 PM
actually - once we upgrade lemmy and remove the storagebox this requirement for a specific uid becomes less important anyway, so we might be able to just remove it
12:56 PM
sumedh has quit
1:13 PM
_lucifer

https://sentry.metabrainz.org/metabrainz/listen...
1:14 PM
uh oh. this seems to be a problem. the disconnect logic is probably not handling some case correctly.
1:19 PM
alastairp

or someone double-clicked and it sent 2 requests
1:23 PM
_lucifer

that could be possible but these are radio buttons that trigger requests `onchange` not `onclick`
1:23 PM
elomatreb[m] has quit
1:26 PM
but i am unable to reproduce either way, carefully crafted case or fast random clicking :/, will look into it again if more such errors come up.