in #metabrainz

15:44 PM
pite joined the channel
15:50 PM
monkey[m]

zas: Hello! Hope you are not too busy battling AI vermin, if you have a moment I have a question regarding a Grafana alert.... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
15:51 PM
d4rk-ph0enix has quit
15:51 PM
d4rk-ph0enix joined the channel
15:55 PM
zas[m]

Why doesn't it provide data sometimes? I mean, if the reason is the service doesn't run we expect an alert, right?
15:55 PM
monkey[m]

Transient issues that resolve themselves
15:56 PM
We have other alerts for non-responding website and API, so I'm not expecting these stats alerts to trigger when connection is temporarily lost.
15:56 PM
After a delay of 10 minutes, then yes it would make sense to trigger.
15:57 PM
zas[m]

There's already a 5m pending period for this alert, we can increase this
15:57 PM
monkey[m]

Ah, yes please!
15:57 PM
Is that a 5m delay on triggering the alert in any case, or just for loss of data (just out of curiosity)
15:57 PM
zas[m]

10m or more?
15:58 PM
jasje[m]

Any contributors who are cooking a GSoC proposal towards ListenBrainz Android project should take a look at ideas page again. Eased some things and added more context.
15:58 PM
zas[m]

Any alert for this alert rule
15:58 PM
We can't dissociate them
15:58 PM
monkey[m]

That's what I figured. Thanks for confirming
15:59 PM
To me this reeads as a 1m delay, not 5. Am I looking at the wrong thing?
15:59 PM
monkey[m] uploaded an image: (34KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/XKEAkQFpjiAZCDNZEtXQOqpW/image.png >
16:00 PM
zas[m]

That's the time the rule is evaluated, but the pending period is the time before it actually alerts (if the state is the same)
16:00 PM
zas[m] uploaded an image: (46KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/wQKnfFxSmudIESOxhainIIZZ/image.png >
16:01 PM
So basically it evaluates the state every minute, but wait for 5 minutes to see if it was transient or not
16:01 PM
It limits false alerts
16:01 PM
We can evaluate the state less often too
16:02 PM
but then the pending period has to be longer
16:02 PM
monkey[m]

I was lookign at the pending period, it seemed to me to be set to 1m when I opened the alert edit page
16:02 PM
zas[m]

If you look carefully Grafana shows alerts as activated as "Pending" when it happens, notifications are sent at the end of the pending period
16:03 PM
s/as//
16:03 PM
monkey[m]

Anyway, thanks for the assist. Let's see how it goes with 5m delay.
16:04 PM
zas[m]

Wait, which alert did you copied from? because I have 5m for stats
16:04 PM
monkey[m]

<zas[m]> "There's already a 5m pending..." <- This is the bit I was confused about. I can only see 1m pending period, nothing that says it was set to 5m
16:04 PM
zas[m]

https://stats.metabrainz.org/alerting/fe4fa6v0g...
16:04 PM
monkey[m]

I was on https://stats.metabrainz.org/alerting/debdj8a19...
16:04 PM
I see, those are thesitewide
16:05 PM
zas[m]

You can change pending period to 5m for it then
16:05 PM
monkey[m]

OK, and it wasn't configured with the same delay. Mystery solved :)
16:05 PM
I'll change that for the sitewide stats alerts
16:05 PM
Then I think it will work fine, the other (non-sitewide) alerts have been behaving better
16:05 PM
zas[m]

The pending period limits the number of notifications if states change too quickly$
16:06 PM
s/$/./
16:06 PM
monkey[m]

Yep, that's what I was looking for.
16:07 PM
Thank you!
16:07 PM
zas[m]

np :)
16:13 PM
reosarevok[m]

Weren't we supposed to meet around now? :)
16:16 PM
zas[m]

yup
16:18 PM
reosarevok[m]

mayhem, bitmap, julian45 @julian45:julian45.net
16:18 PM
mayhem[m] is here despite being on the phone the bank
16:20 PM
So, any updates or new info?
16:21 PM
julian45[m]

none from me that haven't already been discussed out-of-band
16:21 PM
zas[m]

We have to decide what to do, it was suggested to use Anubis, does it look actionable? Are there any objection?
16:21 PM
mayhem[m]

I wrote a proposal with my ideas, but didn't get a lot of feedback, only from julian45 who made a number of good arguments as to why its not a great idea.
16:22 PM
zas[m]

We had discussions about Cloudflare, and potential conflicts with our policies, this should be investigated too (in case we decide to move to such service, cloudflare or similar)
16:22 PM
mayhem[m]

I feel so so about anubis. its seems a bit heavy handed, so I wish we could find out more about what level of effort these people are willing to go through to keep scraping us.
16:22 PM
the recent cloudflare outages make me really not like that option much.
16:23 PM
julian45[m]

anubis looks actionable IMO as a first line/first attempt, esp since docs indicate policy is configurable to allow, e.g., legit scrapers like google while challenging others
16:23 PM
https://anubis.techaro.lol/docs/admin/policies
16:24 PM
reosarevok[m]

I think heavy handed is a good start given the situation
16:24 PM
mayhem[m]

if we implement anubis, can we have it only on pages that contain data that is being scraped?
16:24 PM
reosarevok[m]

(and we could lower the heavy-handiness if we get things under control in other ways)
16:24 PM
julian45[m]

i do worry that it could potentially be annoying for some users who, e.g., disable js by default in their browsers, but those kinds of folks should be willing to carve out exceptions
16:25 PM
mayhem[m]

e.g. style guides require no anubis
16:25 PM
zas[m]

mayhem: about your proposal, I think we should rely on existing tools first if possible (not reinventing the wheel), but I don't totally rule it out, because we might find limitations in third party tools we don't have with our own ones.
16:25 PM
reosarevok[m]

Those users could possibly get in touch with us and get exceptions added for them
16:25 PM
julian45[m]

mayhem[m]: this kind of goes back to the separation of API requests to a subdomain need that was discussed yesterday
16:25 PM
reosarevok[m]

(re: no-js people)
16:26 PM
julian45[m]

reosarevok[m]: if they aren't able to configure their clients to make exceptions for us, sure - chances are they would need js to use our site(s) anyway, no?
16:26 PM
mayhem[m]

julian45[m]: that seems conflated to me. we can partition the URL space for web pages without ever considering the API.
16:26 PM
julian45[m]

ah i see
16:27 PM
zas[m]

They (may) hit ANY page, but the fact is those with MB data have a much higher cost for us (they hit backends and db)
16:27 PM
reosarevok[m]

julian45: fair, we require JS for editing anyway, just not for reading - but the kind of people we'd possibly make exceptions for probably edit so
16:27 PM
julian45[m]

if needed, per the policy doc i linked, i think we can tell anubis to let certain path regexes through but not others
16:27 PM
mayhem[m]

julian45[m]: still, your point is valid.
16:27 PM
but probably not needed right this second.
16:28 PM
reosarevok[m]

Would something like anubis interfere with external seeding?
16:28 PM
(it's ok if it does temporarily given the circumstances, but making sure we don't need to make MBS changes to support it)
16:28 PM
julian45[m]

i doubt it, but then again external seeding is something we allow that many implementers (e.g., GNOME project gitlab) might not
16:29 PM
so i would suggest deploying on test or beta to figure that out before prod
16:29 PM
* i doubt it, but then again external seeding is something we have as part of our use cases that many implementers (e.g., GNOME project gitlab) might not
16:30 PM
reosarevok[m]

zas: is beta also being hit?
16:30 PM
zas[m]

Yes, it was first
16:30 PM
reosarevok[m]

(so, if we put this in front of beta first, would it actually teach us if it will help)
16:31 PM
Ok
16:31 PM
zas[m]

This is how I discovered the problem, beta containers were eating a lot of resources suddenly
16:31 PM
bitmap[m]

reosarevok[m]: it might interfere with release editor seeding since that requires POST data which can't be redirected
16:32 PM
reosarevok[m]

Hopefully those can be let through then since those should never match the kind of hits causing issues
16:33 PM
zas[m]

But seeding requires to be logged in, so we can just skip any check for those
16:33 PM
julian45[m]

zas[m]: it usually forces logout, then login though, right?
16:33 PM
reosarevok[m]

Ok, that's another thing I asked several times but I think never got an answer for: can we separate logged in from not logged in queries?
16:33 PM
For anubis
16:34 PM
And run it only on logged out for now
16:34 PM
mayhem[m]

I propose that we try anubis on mb.org's data pages and see what happens.
16:34 PM
I much prefer this option over cloudflare.
16:34 PM
who objects to this suggestion?
16:34 PM
(sorry was disconnected for a bit, back now)
16:34 PM
zas[m]

bitmap suggested an internal header set by backends for that, so we can get this info on gateways at least
16:34 PM
reosarevok[m]

I'd say on beta.mb.org for now, but I'd agree otherwise
16:34 PM
julian45[m]

reosarevok[m]: unfortunately not sure
16:35 PM
mayhem[m]

mayhem[m]: this may now be out of date. heh.
16:35 PM
julian45[m]

* not sure i.r.t. anubis
16:35 PM
reosarevok[m]

That'd also allow us to play with any changes we need to make things better on the MBS side
16:35 PM
Before we put them in prod
16:35 PM
bitmap[m]

julian45[m]: yeah, IIRC the login cookies are not available to the request MB receives
16:36 PM
but if it's only restricted to data pages then not an issue
16:36 PM
reosarevok[m]

Yeah, I guess if we can ignore /edit pages it seems good
16:36 PM
Well, /edit /add /create etc
16:37 PM
But we could easily come up with a list
16:37 PM
julian45[m]

reosarevok[m]: which seems doable but i would like someone to double check the doc page i linked to make sure i'm interpreting correctly
16:37 PM
reosarevok[m]: great, because policy for anubis is configured by json file anyway per docs
16:38 PM
reosarevok[m]

Well, https://anubis.techaro.lol/docs/admin/policies/ the docs have
16:38 PM
{... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
16:38 PM
I don't see why it would not work for other things than robots.txt :)
16:38 PM
julian45[m]

exactly
16:39 PM
just wanted to be sure i wasn't the only one reaching that conclusion from docs
16:41 PM
reosarevok[m]

Ok, this looks like it should work - so sysadmin team works on setting up anubis, mbs team figures out what paths to allow (could include all of /ws/2 as well for now AFAICT), we reconvene and let it loose on test first, then beta if nothing is horribly broken?
16:42 PM
bitmap[m]

"Anubis uses a multi-threaded proof of work check to ensure that users browsers are up to date and support modern standards." not sure what they mean by the last past (since we support older browsers too)
16:42 PM
zas[m]

I wonder if Anubis is able to handle our traffic too. There's no number.
16:42 PM
"Anubis has very minimal system requirements. I suspect that 128Mi of ram may be sufficient for a large number of concurrent clients. Anubis may be a poor fit for apps that use WebSockets and maintain open connections, but I don't have enough real-world experience to know one way or another."
16:43 PM
reosarevok[m]

If we need to limit support for some older browsers temporarily while we find better options, that's a sacrifice that seems sensible to me
16:43 PM
bitmap[m]

s/past/part/
16:43 PM
reosarevok[m]

zas[m]: Only one way to find out?
16:43 PM
zas[m]

Well, we can conduct a test on test.mb (...)
16:43 PM
And evaluate pros & cons after that
16:44 PM
reosarevok[m]

It seems worth a try compared with what you are having to spend time doing now
16:44 PM
Worst case scenario, we know we need to keep doing the same and look into cloudflare or our own version
16:45 PM
zas[m]

Also I wonder how it scales, need to check that
16:46 PM
lucifer[m]

also, https://github.com/TecharoHQ/anubis/issues/34
16:47 PM
zas[m]

Also it might be tricky to insert in our proxies chain ...
16:49 PM
So, let's try to deploy it for test.mb at least
16:49 PM
About ws / website separation, do we agree to move to api.mb a bit faster?
16:50 PM
1) ensure it works 2) update docs & notify users 3) redirects if possible
16:51 PM
How long do we need to deprecate mb.o/ws/ ? Years.
16:51 PM
mayhem[m]

zas[m]: 10!
16:51 PM
reosarevok[m]

As long as we don't entirely break the non api.mb version, it seems fine - even if it's slowed down
16:52 PM
mayhem[m]

and people still complained that we yanked the service "without notice"
16:54 PM
zas[m]

@bitmap Is moving to api.mb a problem for MB server?
16:54 PM
lucifer[m]

i think throttling redirects from ws to api would annoy at least active users and make them migrate.
16:54 PM
bitmap[m]

zas[m]: no, it doesn't care about which domain it's being served from as long as it's configured properly
16:55 PM
zas[m]

ok, perfect.
16:55 PM
So let's check if it's properly configured this week (I'll set it up)
16:55 PM
bitmap[m]

I'm not sure we can redirect mb.org/ws/ requests without breaking everything but slowly throttling it more might be a good incentive to switch
16:57 PM
zas[m]

I guess there's no problem with GET/HEAD requests right?
16:58 PM
Also for beta & test, api.test.mb and api.beta.mb ? or?
16:58 PM
bitmap[m]

I expect most clients will follow redirects properly, but can't be certain... only for data submission would it definitely be a problem
16:59 PM
zas[m]: not sure, does LB use a specific layout already?
16:59 PM
mayhem[m]

yes
16:59 PM
https://test-api.listenbrainz.org/1/status/serv...
16:59 PM
lucifer[m]

api., beta-api., test-api.
16:59 PM
mayhem[m]

that.
17:00 PM
zas[m]

ok
17:01 PM
let's stick to that then
17:01 PM
I'll configure everything tomorrow