#metabrainz

      • pite joined the channel
      • monkey[m]
        zas: Hello! Hope you are not too busy battling AI vermin, if you have a moment I have a question regarding a Grafana alert.... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
      • d4rk-ph0enix has quit
      • d4rk-ph0enix joined the channel
      • zas[m]
        Why doesn't it provide data sometimes? I mean, if the reason is the service doesn't run we expect an alert, right?
      • monkey[m]
        Transient issues that resolve themselves
      • We have other alerts for non-responding website and API, so I'm not expecting these stats alerts to trigger when connection is temporarily lost.
      • After a delay of 10 minutes, then yes it would make sense to trigger.
      • zas[m]
        There's already a 5m pending period for this alert, we can increase this
      • monkey[m]
        Ah, yes please!
      • Is that a 5m delay on triggering the alert in any case, or just for loss of data (just out of curiosity)
      • zas[m]
        10m or more?
      • jasje[m]
        Any contributors who are cooking a GSoC proposal towards ListenBrainz Android project should take a look at ideas page again. Eased some things and added more context.
      • zas[m]
        Any alert for this alert rule
      • We can't dissociate them
      • monkey[m]
        That's what I figured. Thanks for confirming
      • To me this reads as a 1m delay, not 5. Am I looking at the wrong thing?
      • monkey[m] uploaded an image: (34KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/XKEAkQFpjiAZCDNZEtXQOqpW/image.png >
      • zas[m]
        That's the time the rule is evaluated, but the pending period is the time before it actually alerts (if the state is the same)
      • zas[m] uploaded an image: (46KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/wQKnfFxSmudIESOxhainIIZZ/image.png >
      • So basically it evaluates the state every minute, but waits for 5 minutes to see if it was transient or not
      • It limits false alerts
      • We can evaluate the state less often too
      • but then the pending period has to be longer
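[Editor's sketch] The interplay zas describes between the evaluation interval and the pending period can be illustrated with a small simulation. This is a simplified model, not Grafana's actual code: the function name `alert_states` and the state labels are assumptions chosen for illustration, and real Grafana handling of "no data" results is not modelled here.

```python
def alert_states(breaches, eval_interval_s=60, pending_period_s=300):
    """Return the alert state after each rule evaluation.

    breaches: one boolean per evaluation, True when the condition is met.
    The rule only reaches "Firing" (and would notify) once the condition
    has held continuously for at least pending_period_s.
    """
    states = []
    breach_started = None  # time of the first evaluation in the current breach
    for i, breached in enumerate(breaches):
        now = i * eval_interval_s
        if not breached:
            # Condition cleared: reset, so a transient blip never notifies.
            breach_started = None
            states.append("Normal")
            continue
        if breach_started is None:
            breach_started = now
        if now - breach_started >= pending_period_s:
            states.append("Firing")
        else:
            states.append("Pending")
    return states


# A two-minute blip stays "Pending" and never fires.
print(alert_states([False, True, True, False, False]))
# A sustained problem fires once the 5m pending period has elapsed.
print(alert_states([True] * 7))
```

With a 1m evaluation interval and a 5m pending period, a problem has to persist across several evaluations before anyone is notified, which is why evaluating less often would call for a longer pending period.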
      • monkey[m]
        I was looking at the pending period, it seemed to me to be set to 1m when I opened the alert edit page
      • zas[m]
        If you look carefully, Grafana shows activated alerts as "Pending" when that happens; notifications are sent at the end of the pending period
      • monkey[m]
        Anyway, thanks for the assist. Let's see how it goes with 5m delay.
      • zas[m]
        Wait, which alert did you copy from? Because I have 5m for stats
      • monkey[m]
        <zas[m]> "There's already a 5m pending..." <- This is the bit I was confused about. I can only see 1m pending period, nothing that says it was set to 5m
      • zas[m]
      • monkey[m]
      • I see, those are the sitewide
      • zas[m]
        You can change pending period to 5m for it then
      • monkey[m]
        OK, and it wasn't configured with the same delay. Mystery solved :)
      • I'll change that for the sitewide stats alerts
      • Then I think it will work fine, the other (non-sitewide) alerts have been behaving better
      • zas[m]
        The pending period limits the number of notifications if states change too quickly.
      • monkey[m]
        Yep, that's what I was looking for.
      • Thank you!
      • zas[m]
        np :)
      • reosarevok[m]
        Weren't we supposed to meet around now? :)
      • zas[m]
        yup
      • reosarevok[m]
        mayhem, bitmap, julian45 @julian45:julian45.net
      • mayhem[m] is here despite being on the phone with the bank
      • So, any updates or new info?
      • julian45[m]
        none from me that haven't already been discussed out-of-band
      • zas[m]
        We have to decide what to do. It was suggested to use Anubis, does it look actionable? Are there any objections?
      • mayhem[m]
        I wrote a proposal with my ideas, but didn't get a lot of feedback, only from julian45 who made a number of good arguments as to why it's not a great idea.
      • zas[m]
        We had discussions about Cloudflare, and potential conflicts with our policies, this should be investigated too (in case we decide to move to such service, cloudflare or similar)
      • mayhem[m]
        I feel so-so about anubis. it seems a bit heavy handed, so I wish we could find out more about what level of effort these people are willing to go through to keep scraping us.
      • the recent cloudflare outages make me really not like that option much.
      • julian45[m]
        anubis looks actionable IMO as a first line/first attempt, esp since docs indicate policy is configurable to allow, e.g., legit scrapers like google while challenging others
      • reosarevok[m]
        I think heavy handed is a good start given the situation
      • mayhem[m]
        if we implement anubis, can we have it only on pages that contain data that is being scraped?
      • reosarevok[m]
      • (and we could lower the heavy-handedness if we get things under control in other ways)
      • julian45[m]
        i do worry that it could potentially be annoying for some users who, e.g., disable js by default in their browsers, but those kinds of folks should be willing to carve out exceptions
      • mayhem[m]
        e.g. style guides require no anubis
      • zas[m]
        mayhem: about your proposal, I think we should rely on existing tools first if possible (not reinventing the wheel), but I don't totally rule it out, because we might find limitations in third party tools we don't have with our own ones.
      • reosarevok[m]
        Those users could possibly get in touch with us and get exceptions added for them
      • julian45[m]
        mayhem[m]: this kind of goes back to the separation of API requests to a subdomain need that was discussed yesterday
      • reosarevok[m]
        (re: no-js people)
      • julian45[m]
        reosarevok[m]: if they aren't able to configure their clients to make exceptions for us, sure - chances are they would need js to use our site(s) anyway, no?
      • mayhem[m]
        julian45[m]: that seems conflated to me. we can partition the URL space for web pages without ever considering the API.
      • julian45[m]
        ah i see
      • zas[m]
        They (may) hit ANY page, but the fact is those with MB data have a much higher cost for us (they hit backends and db)
      • reosarevok[m]
        julian45: fair, we require JS for editing anyway, just not for reading - but the kind of people we'd possibly make exceptions for probably edit so
      • julian45[m]
        if needed, per the policy doc i linked, i think we can tell anubis to let certain path regexes through but not others
      • mayhem[m]
        julian45[m]: still, your point is valid.
      • but probably not needed right this second.
      • reosarevok[m]
        Would something like anubis interfere with external seeding?
      • (it's ok if it does temporarily given the circumstances, but making sure we don't need to make MBS changes to support it)
      • julian45[m]
        i doubt it, but then again external seeding is something we have as part of our use cases that many implementers (e.g., GNOME project gitlab) might not
      • so i would suggest deploying on test or beta to figure that out before prod
      • reosarevok[m]
        zas: is beta also being hit?
      • zas[m]
        Yes, it was first
      • reosarevok[m]
        (so, if we put this in front of beta first, would it actually teach us if it will help)
      • Ok
      • zas[m]
        This is how I discovered the problem, beta containers were eating a lot of resources suddenly
      • bitmap[m]
        reosarevok[m]: it might interfere with release editor seeding since that requires POST data which can't be redirected
      • reosarevok[m]
        Hopefully those can be let through then since those should never match the kind of hits causing issues
      • zas[m]
        But seeding requires being logged in, so we can just skip any check for those
      • julian45[m]
        zas[m]: it usually forces logout, then login though, right?
      • reosarevok[m]
        Ok, that's another thing I asked several times but I think never got an answer for: can we separate logged in from not logged in queries?
      • For anubis
      • And run it only on logged out for now
      • mayhem[m]
        I propose that we try anubis on mb.org's data pages and see what happens.
      • I much prefer this option over cloudflare.
      • who objects to this suggestion?
      • (sorry was disconnected for a bit, back now)
      • zas[m]
        bitmap suggested an internal header set by backends for that, so we can get this info on gateways at least
      • reosarevok[m]
        I'd say on beta.mb.org for now, but I'd agree otherwise
      • julian45[m]
        reosarevok[m]: unfortunately not sure
      • mayhem[m]
        mayhem[m]: this may now be out of date. heh.
      • julian45[m]
        * not sure i.r.t. anubis
      • reosarevok[m]
        That'd also allow us to play with any changes we need to make things better on the MBS side
      • Before we put them in prod
      • bitmap[m]
        julian45[m]: yeah, IIRC the login cookies are not available to the request MB receives
      • but if it's only restricted to data pages then not an issue
      • reosarevok[m]
        Yeah, I guess if we can ignore /edit pages it seems good
      • Well, /edit /add /create etc
      • But we could easily come up with a list
      • julian45[m]
        reosarevok[m]: which seems doable but i would like someone to double check the doc page i linked to make sure i'm interpreting correctly
      • reosarevok[m]: great, because policy for anubis is configured by json file anyway per docs
      • reosarevok[m]
      • I don't see why it would not work for other things than robots.txt :)
      • julian45[m]
        exactly
      • just wanted to be sure i wasn't the only one reaching that conclusion from docs
      • reosarevok[m]
        Ok, this looks like it should work - so sysadmin team works on setting up anubis, mbs team figures out what paths to allow (could include all of /ws/2 as well for now AFAICT), we reconvene and let it loose on test first, then beta if nothing is horribly broken?
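[Editor's sketch] The "list of paths to allow" idea from the discussion above can be pictured as a simple path filter. This is not Anubis's policy file format, only an illustration of the decision logic in Python; the names `ALLOW_PATTERNS` and `needs_challenge` are hypothetical, and the patterns are a guess based on the paths mentioned in the chat (/edit, /add, /create, /ws/2).

```python
import re

# Hypothetical allow-list: requests matching any pattern skip the challenge.
ALLOW_PATTERNS = [
    re.compile(r"^/ws/2(/|$)"),               # web service (API) requests
    re.compile(r"/(edit|add|create)(/|$)"),   # editing and seeding endpoints
]


def needs_challenge(path):
    """Return True if a request for this path should be challenged."""
    return not any(pattern.search(path) for pattern in ALLOW_PATTERNS)


print(needs_challenge("/artist/123"))        # data page: challenged
print(needs_challenge("/artist/123/edit"))   # edit page: let through
print(needs_challenge("/release/add"))       # release editor seeding: let through
print(needs_challenge("/ws/2/artist/123"))   # API: let through
```

The real list would come from the MBS team, and the actual rules would need to be expressed in whatever format the Anubis policy documentation specifies.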
      • bitmap[m]
        "Anubis uses a multi-threaded proof of work check to ensure that users browsers are up to date and support modern standards." not sure what they mean by the last past (since we support older browsers too)
      • zas[m]
        I wonder if Anubis is able to handle our traffic too. There's no number.
      • "Anubis has very minimal system requirements. I suspect that 128Mi of ram may be sufficient for a large number of concurrent clients. Anubis may be a poor fit for apps that use WebSockets and maintain open connections, but I don't have enough real-world experience to know one way or another."
      • reosarevok[m]
        If we need to limit support for some older browsers temporarily while we find better options, that's a sacrifice that seems sensible to me
      • bitmap[m]
        s/past/part/
      • reosarevok[m]
        zas[m]: Only one way to find out?
      • zas[m]
        Well, we can conduct a test on test.mb (...)
      • And evaluate pros & cons after that
      • reosarevok[m]
        It seems worth a try compared with what you are having to spend time doing now
      • Worst case scenario, we know we need to keep doing the same and look into cloudflare or our own version
      • zas[m]
        Also I wonder how it scales, need to check that
      • zas[m]
        Also it might be tricky to insert in our proxies chain ...
      • So, let's try to deploy it for test.mb at least
      • About ws / website separation, do we agree to move to api.mb a bit faster?
      • 1) ensure it works 2) update docs & notify users 3) redirects if possible
      • How long do we need to deprecate mb.o/ws/ ? Years.
      • mayhem[m]
        zas[m]: 10!
      • reosarevok[m]
        As long as we don't entirely break the non api.mb version, it seems fine - even if it's slowed down
      • mayhem[m]
        and people still complained that we yanked the service "without notice"
      • zas[m]
        @bitmap Is moving to api.mb a problem for MB server?
      • lucifer[m]
        i think throttling redirects from ws to api would annoy at least active users and make them migrate.
      • bitmap[m]
        zas[m]: no, it doesn't care about which domain it's being served from as long as it's configured properly
      • zas[m]
        ok, perfect.
      • So let's check if it's properly configured this week (I'll set it up)
      • bitmap[m]
        I'm not sure we can redirect mb.org/ws/ requests without breaking everything but slowly throttling it more might be a good incentive to switch
      • zas[m]
        I guess there's no problem with GET/HEAD requests right?
      • Also for beta & test, api.test.mb and api.beta.mb ? or?
      • bitmap[m]
        I expect most clients will follow redirects properly, but can't be certain... only for data submission would it definitely be a problem
      • zas[m]: not sure, does LB use a specific layout already?
      • mayhem[m]
        yes
      • lucifer[m]
        api., beta-api., test-api.
      • mayhem[m]
        that.
      • zas[m]
        ok
      • let's stick to that then
      • I'll configure everything tomorrow
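[Editor's sketch] The redirect idea discussed above (GET/HEAD requests to /ws/ can probably be redirected to the api. hosts, data submissions cannot) could look roughly like this. Everything here is an assumption for illustration: the function name `redirect_target`, the full hostnames (expanded from the "api., beta-api., test-api." layout mentioned in the chat), and the premise that paths stay unchanged on the API hosts. It does not describe the actual gateway configuration.

```python
# Hypothetical mapping from website hosts to their API hosts.
API_HOSTS = {
    "musicbrainz.org": "api.musicbrainz.org",
    "beta.musicbrainz.org": "beta-api.musicbrainz.org",
    "test.musicbrainz.org": "test-api.musicbrainz.org",
}

# Only methods without a request body are considered safe to redirect.
REDIRECTABLE_METHODS = {"GET", "HEAD"}


def redirect_target(method, host, path):
    """Return the URL to redirect to, or None to serve the request in place."""
    api_host = API_HOSTS.get(host)
    if api_host is None or not path.startswith("/ws/"):
        return None
    if method.upper() not in REDIRECTABLE_METHODS:
        # Data submission (e.g. POST) may break if redirected; leave it alone.
        return None
    return f"https://{api_host}{path}"


print(redirect_target("GET", "musicbrainz.org", "/ws/2/artist/123"))
print(redirect_target("POST", "musicbrainz.org", "/ws/2/release"))
```

Throttling the old /ws/ endpoints, as suggested in the chat, would be a separate mechanism layered on top of a decision like this.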