Why doesn't it provide data sometimes? I mean, if the reason is that the service isn't running, we expect an alert, right?
monkey[m]
Transient issues that resolve themselves
We have other alerts for non-responding website and API, so I'm not expecting these stats alerts to trigger when connection is temporarily lost.
After a delay of 10 minutes, then yes it would make sense to trigger.
zas[m]
There's already a 5m pending period for this alert, we can increase this
monkey[m]
Ah, yes please!
Is that a 5m delay on triggering the alert in any case, or just for loss of data? (Just out of curiosity)
zas[m]
10m or more?
jasje[m]
Any contributors who are cooking a GSoC proposal towards the ListenBrainz Android project should take a look at the ideas page again. We've eased some things and added more context.
zas[m]
Any alert, for this alert rule
We can't dissociate them
monkey[m]
That's what I figured. Thanks for confirming
To me this reads as a 1m delay, not 5. Am I looking at the wrong thing?
monkey[m] uploaded an image: (34KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/XKEAkQFpjiAZCDNZEtXQOqpW/image.png >
zas[m]
That's the time the rule is evaluated, but the pending period is the time before it actually alerts (if the state is the same)
zas[m] uploaded an image: (46KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/wQKnfFxSmudIESOxhainIIZZ/image.png >
So basically it evaluates the state every minute, but waits for 5 minutes to see whether it was transient or not
It limits false alerts
We can evaluate the state less often too
but then the pending period has to be longer
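To make the two knobs concrete, here is a rough sketch in Grafana's alert-rule file-provisioning format. The group name, rule uid, and title are invented, and the query/`data` sections are omitted; this is an illustration of the evaluation interval vs. pending period, not our actual config:

```yaml
# Hypothetical sketch of a Grafana alert-rule provisioning file.
apiVersion: 1
groups:
  - orgId: 1
    name: stats-alerts        # invented group name
    folder: alerts
    interval: 1m              # evaluation interval: the rule is checked every minute
    rules:
      - uid: sitewide-stats-missing   # invented uid
        title: Sitewide stats not updating
        condition: C          # (query/data sections omitted for brevity)
        for: 5m               # pending period: the alert sits in "Pending" for 5m;
                              # notifications only go out if it is still firing then
        noDataState: Alerting
        execErrState: Error
```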
monkey[m]
I was looking at the pending period; it seemed to me to be set to 1m when I opened the alert edit page
zas[m]
If you look carefully, Grafana shows alerts activated as "Pending" when it happens; notifications are sent at the end of the pending period
monkey[m]
Anyway, thanks for the assist. Let's see how it goes with 5m delay.
zas[m]
Wait, which alert did you copy from? Because I have 5m for stats
monkey[m]
<zas[m]> "There's already a 5m pending..." <- This is the bit I was confused about. I can only see 1m pending period, nothing that says it was set to 5m
OK, and it wasn't configured with the same delay. Mystery solved :)
I'll change that for the sitewide stats alerts
Then I think it will work fine, the other (non-sitewide) alerts have been behaving better
zas[m]
The pending period limits the number of notifications if states change too quickly.
monkey[m]
Yep, that's what I was looking for.
Thank you!
zas[m]
np :)
reosarevok[m]
Weren't we supposed to meet around now? :)
zas[m]
yup
reosarevok[m]
mayhem, bitmap, julian45 @julian45:julian45.net
mayhem[m] is here despite being on the phone with the bank
So, any updates or new info?
julian45[m]
none from me that haven't already been discussed out-of-band
zas[m]
We have to decide what to do. It was suggested to use Anubis; does it look actionable? Are there any objections?
mayhem[m]
I wrote a proposal with my ideas, but didn't get a lot of feedback, only from julian45, who made a number of good arguments as to why it's not a great idea.
zas[m]
We had discussions about Cloudflare, and potential conflicts with our policies, this should be investigated too (in case we decide to move to such service, cloudflare or similar)
mayhem[m]
I feel so-so about Anubis. It seems a bit heavy-handed, so I wish we could find out more about what level of effort these people are willing to go through to keep scraping us.
the recent cloudflare outages make me really not like that option much.
julian45[m]
anubis looks actionable IMO as a first line/first attempt, esp since docs indicate policy is configurable to allow, e.g., legit scrapers like google while challenging others
I think heavy handed is a good start given the situation
mayhem[m]
if we implement anubis, can we have it only on pages that contain data that is being scraped?
reosarevok[m]
(and we could dial back the heavy-handedness if we get things under control in other ways)
julian45[m]
i do worry that it could potentially be annoying for some users who, e.g., disable js by default in their browsers, but those kinds of folks should be willing to carve out exceptions
mayhem[m]
e.g. style guides require no anubis
zas[m]
mayhem: about your proposal, I think we should rely on existing tools first if possible (not reinventing the wheel), but I don't totally rule it out, because we might find limitations in third-party tools that we wouldn't have with our own.
reosarevok[m]
Those users could possibly get in touch with us and get exceptions added for them
julian45[m]
mayhem[m]: this kind of goes back to the need to separate API requests onto a subdomain, which was discussed yesterday
reosarevok[m]
(re: no-js people)
julian45[m]
reosarevok[m]: if they aren't able to configure their clients to make exceptions for us, sure - chances are they would need js to use our site(s) anyway, no?
mayhem[m]
julian45[m]: that seems conflated to me. we can partition the URL space for web pages without ever considering the API.
julian45[m]
ah i see
zas[m]
They (may) hit ANY page, but the fact is that pages with MB data have a much higher cost for us (they hit backends and the db)
reosarevok[m]
julian45: fair, we require JS for editing anyway, just not for reading - but the kind of people we'd possibly make exceptions for probably edit so
julian45[m]
if needed, per the policy doc i linked, i think we can tell anubis to let certain path regexes through but not others
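For reference, a path-based policy along those lines might look roughly like this. This is a sketch from memory of the Anubis bot-policy docs: the field names (`path_regex`, `user_agent_regex`, `action`) should be double-checked against the linked doc, and the MB paths and entry names are invented:

```json
{
  "bots": [
    {
      "name": "editing-paths",
      "path_regex": "^/(edit|release/add|ws/2)",
      "action": "ALLOW"
    },
    {
      "name": "googlebot",
      "user_agent_regex": "Googlebot",
      "action": "ALLOW"
    },
    {
      "name": "generic-browser",
      "user_agent_regex": "Mozilla",
      "action": "CHALLENGE"
    }
  ]
}
```

The idea being that entries are matched in order, so legit crawlers and exempted paths pass through while everything else gets the proof-of-work challenge.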
mayhem[m]
julian45[m]: still, your point is valid.
but probably not needed right this second.
reosarevok[m]
Would something like anubis interfere with external seeding?
(it's ok if it does temporarily given the circumstances, but making sure we don't need to make MBS changes to support it)
julian45[m]
i doubt it, but then again external seeding is something we have as part of our use cases that many implementers (e.g., GNOME project gitlab) might not
so i would suggest deploying on test or beta to figure that out before prod
reosarevok[m]
zas: is beta also being hit?
zas[m]
Yes, it was first
reosarevok[m]
(so, if we put this in front of beta first, would it actually tell us whether it will help)
Ok
zas[m]
This is how I discovered the problem, beta containers were eating a lot of resources suddenly
bitmap[m]
reosarevok[m]: it might interfere with release editor seeding since that requires POST data which can't be redirected
reosarevok[m]
Hopefully those can be let through then since those should never match the kind of hits causing issues
zas[m]
But seeding requires being logged in, so we can just skip any check for those
julian45[m]
zas[m]: it usually forces logout, then login though, right?
reosarevok[m]
Ok, that's another thing I asked several times but I think never got an answer for: can we separate logged in from not logged in queries?
For anubis
And run it only on logged out for now
mayhem[m]
I propose that we try anubis on mb.org's data pages and see what happens.
I much prefer this option over cloudflare.
who objects to this suggestion?
(sorry was disconnected for a bit, back now)
zas[m]
bitmap suggested an internal header set by backends for that, so we can get this info on gateways at least
reosarevok[m]
I'd say on beta.mb.org for now, but I'd agree otherwise
julian45[m]
reosarevok[m]: not sure i.r.t. anubis
mayhem[m]
mayhem[m]: this may now be out of date. heh.
reosarevok[m]
That'd also allow us to play with any changes we need to make things better on the MBS side
Before we put them in prod
bitmap[m]
julian45[m]: yeah, IIRC the login cookies are not available to the request MB receives
but if it's only restricted to data pages then not an issue
reosarevok[m]
Yeah, I guess if we can ignore /edit pages it seems good
Well, /edit /add /create etc
But we could easily come up with a list
julian45[m]
reosarevok[m]: which seems doable but i would like someone to double check the doc page i linked to make sure i'm interpreting correctly
reosarevok[m]: great, because policy for anubis is configured by json file anyway per docs
I don't see why it would not work for other things than robots.txt :)
julian45[m]
exactly
just wanted to be sure i wasn't the only one reaching that conclusion from docs
reosarevok[m]
Ok, this looks like it should work - so sysadmin team works on setting up anubis, mbs team figures out what paths to allow (could include all of /ws/2 as well for now AFAICT), we reconvene and let it loose on test first, then beta if nothing is horribly broken?
bitmap[m]
"Anubis uses a multi-threaded proof of work check to ensure that users browsers are up to date and support modern standards." not sure what they mean by the last part (since we support older browsers too)
zas[m]
I wonder if Anubis is able to handle our traffic too. There are no numbers.
"Anubis has very minimal system requirements. I suspect that 128Mi of ram may be sufficient for a large number of concurrent clients. Anubis may be a poor fit for apps that use WebSockets and maintain open connections, but I don't have enough real-world experience to know one way or another."
reosarevok[m]
If we need to limit support for some older browsers temporarily while we find better options, that's a sacrifice that seems sensible to me
reosarevok[m]
zas[m]: Only one way to find out?
zas[m]
Well, we can conduct a test on test.mb (...)
And evaluate pros & cons after that
reosarevok[m]
It seems worth a try compared with what you are having to spend time doing now
Worst case scenario, we know we need to keep doing the same and look into cloudflare or our own version