monkey: around? You're going to love the latest username + LB clash!
2025-10-10 28323, 2025
rustynova[m]1 joined the channel
2025-10-10 28324, 2025
rustynova[m]1
Same here with alistral. The only reason I fetch all the recordings of your listens for recording stats is that LB may return two different IDs for the same recording.
2025-10-10 28324, 2025
rustynova[m]1
I feel like a simple API where you can send multiple MBIDs and then get back their associated MBIDs would be easier
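(A rough sketch of what such a batch lookup could look like, purely for illustration -- the endpoint URL, payload shape, and response format below are invented, not an existing or promised LB API:)

```python
import requests

# Hypothetical batch endpoint: POST a list of recording MBIDs and get back the
# MBID each one currently resolves to. The URL, payload and response shape are
# illustrative only -- no such endpoint exists in the ListenBrainz API today.
HYPOTHETICAL_URL = "https://api.listenbrainz.org/1/mbid-mapping/canonical"

def resolve_canonical_mbids(mbids):
    """Map possibly-redirected recording MBIDs to the MBIDs LB would report."""
    response = requests.post(HYPOTHETICAL_URL, json={"recording_mbids": mbids}, timeout=30)
    response.raise_for_status()
    # Assumed response shape: {"<submitted mbid>": "<canonical mbid>", ...}
    return response.json()
```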
reosarevok[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/rjZLxQmAgWTMzWaFlDTzKsUA
2025-10-10 28327, 2025
yvanzo[m]
Yes, thank you!
2025-10-10 28307, 2025
yvanzo[m]
zas: Or is it still OpenResty? ^
2025-10-10 28342, 2025
zas[m] joined the channel
2025-10-10 28342, 2025
zas[m]
That's still openresty
2025-10-10 28328, 2025
reosarevok[m]
Ok, what do I say then 😅 Do I recommend OpenResty or HAProxy? Or either? :D
2025-10-10 28327, 2025
d4rk has quit
2025-10-10 28339, 2025
yvanzo[m]
Either. HAProxy and/or OpenResty.
2025-10-10 28342, 2025
zas[m]
Well, I wouldn't recommend OpenResty, because it relies on Lua scripts that are very specific to us (we also filter out traffic there, and give different rate limits depending on the client). We also use nginx's rate limiting features (the native ones, not Lua-based), and HAProxy's (at TCP level in our case). So I guess you can keep something like "Our recommendation would be to use the rate limiting features of your reverse proxy".
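(The native nginx rate limiting zas mentions is typically done with ngx_http_limit_req_module; a minimal, illustrative example follows -- the values are placeholders, not MetaBrainz's production limits.)

```nginx
# Illustrative use of nginx's built-in rate limiting, not the actual MetaBrainz config.
limit_req_zone $binary_remote_addr zone=api_per_ip:10m rate=1r/s;

server {
    listen 80;

    location /ws/ {
        # Allow short bursts, then return 503 to clients exceeding the rate.
        limit_req zone=api_per_ip burst=5 nodelay;
        proxy_pass http://backend;
    }
}
```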
2025-10-10 28330, 2025
yvanzo[m]
* ~~Either. HAProxy and/or OpenResty.~~
2025-10-10 28343, 2025
reosarevok[m]
Still "which we do in production for [our own rate limiting]", right? :)
2025-10-10 28303, 2025
reosarevok[m]
(then we can also kinda document our rate limiting in a sneaky way and maybe we'll get less questions :p )
Well, not sneaky as such, but you suggested linking to the doc :)
2025-10-10 28333, 2025
reosarevok[m]
Anyway
2025-10-10 28330, 2025
yvanzo[m]
Hi mayhem, would you prefer this color scheme? https://gist.github.com/yvanzo/d029969daba12effa9…... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/qXpwgjQcbpvWHraLIfaZIfmr>)
one graph that i miss a lot from the older graphs about the schema is the core view: artist/artist credit/release group/release/medium/track/recording. I need that a lot at times.
2025-10-10 28313, 2025
mayhem[m]
but even that has become a huge graph.
2025-10-10 28341, 2025
julian45[m]
[zas](https://matrix.to/#/@zas666:matrix.org): what do you think about trying a fresh nagios VM so that we can try running a much more recent nagios version? (IIRC from the summit, we're running 3.5.1 and nagios core 4.0 was released ca. 2013…)
2025-10-10 28341, 2025
julian45[m]
or, it might be worth giving another monitoring system, like zabbix or icinga2, a spin
2025-10-10 28302, 2025
zas[m]
What about the nrpe part? I mean we have it installed on each server, and changing it is the main problem.
2025-10-10 28308, 2025
julian45[m]
s/zas/@zas666:matrix.org/, s/,//, s/,//
2025-10-10 28314, 2025
julian45[m]
what release of nrpe is currently on servers?
2025-10-10 28319, 2025
julian45[m]
(this might also be a good chance to capture whatever we can in an agentless manner… for example, $prevdayjob did a lot of monitoring using snmp, with pings and ssh-based sensors filling in the rest)
2025-10-10 28341, 2025
julian45[m]
tcp wrappers mentioned/documented in the nrpe github repo… i don't think i've seen those maintained in a supported linux distro in years, lol (e.g., red hat fully deprecated/removed that functionality with the release of rhel 8 in 2019)
We also monitor systems using the telegraf/influxdb/grafana stack, but having nagios for simple checks is a good thing, so I would say replace it with anything more modern and SIMPLE to maintain. Currently, updates to the nagios configs (the VM) still go through the old nagios-chef (https://github.com/metabrainz/nagios-chef), and the config is in https://github.com/metabrainz/nagios
julian45: feel free to set up whatever you feel is good to replace (very) old nagios, I guess it can't be worse (if it works the same or better ofc)
2025-10-10 28325, 2025
julian45[m]
sounds good. ideally it will be hard to do worse than nagios (though then again, i've seen the stuff microsoft puts out for monitoring these days, lol)
2025-10-10 28352, 2025
mayhem[m]
<yvanzo[m]> "mayhem: It is included into..." <- great, I need to set a bookmark to that. thanks!
mayhem, lucifer LB has been having some stability issues all day, and I'm seeing two things in the logs:... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/LfwiqpZqBdQOFrbqCJueijtV>)
2025-10-10 28333, 2025
Maxr1998 has quit
2025-10-10 28350, 2025
mayhem[m]
oh joy.
2025-10-10 28357, 2025
mayhem[m]
lucifer: you about to help?
2025-10-10 28301, 2025
Maxr1998 joined the channel
2025-10-10 28329, 2025
lucifer[m] joined the channel
2025-10-10 28329, 2025
lucifer[m]
Yes, I restarted the production container twice because of that, and I'm also looking at Postgres load issues currently.
2025-10-10 28347, 2025
mayhem[m]
I wonder if we should move typesense to a temp server for the time being.
2025-10-10 28353, 2025
lucifer[m]
I saw a lot of requests to retrieve listens for a lot of users, so I'm unsure if it's a bot.
2025-10-10 28316, 2025
lucifer[m]
mayhem[m]: Could help but Postgres is consuming 1250% cpu atm.
2025-10-10 28325, 2025
monkey[m]1
The same users again and again seems odd to me, but maybe there's another explanation
2025-10-10 28307, 2025
lucifer[m]
>the same short list of users' listens being fetched constantly at very high speed, mainly the team + associates
2025-10-10 28311, 2025
lucifer[m]
huh interesting.
2025-10-10 28312, 2025
mayhem[m]
if the same request is done over and over again, do we have enough caching?
2025-10-10 28320, 2025
mayhem[m]
or can we pause the user?
2025-10-10 28345, 2025
monkey[m]1
The users being fetched are mainly us (the team)...
2025-10-10 28320, 2025
lucifer[m]
okay yeah i see that in logs too now
2025-10-10 28320, 2025
monkey[m]1
So it's either related to a feature we have active (RSS or something, maybe?) or it's because there are links to our usernames on the internet, IMO
2025-10-10 28326, 2025
lucifer[m]
there are lots of different ipv6 addresses making the api calls
how about i block all requests having lovable.dev in the user agent?
2025-10-10 28341, 2025
lucifer[m]
mayhem: ?
2025-10-10 28332, 2025
mayhem[m]
🙄
2025-10-10 28332, 2025
lucifer[m]
zas: is that something we should do at gateway level temporarily or should i do that at LB level?
2025-10-10 28326, 2025
mayhem[m]
gateway is better.
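(At the gateway (OpenResty, so nginx configuration), a User-Agent block could be as simple as the sketch below; the pattern and status code are illustrative, not the rule that was actually deployed.)

```nginx
# Illustrative only -- not the actual MetaBrainz gateway rule.
# Placed inside the relevant server/location block: reject any request whose
# User-Agent contains "lovable.dev".
if ($http_user_agent ~* "lovable\.dev") {
    return 403;
}
```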
2025-10-10 28318, 2025
zas[m]
I can block the IP totally
2025-10-10 28330, 2025
zas[m]
The one above is correct right?
2025-10-10 28359, 2025
lucifer[m]
zas: there are a lot of ipv6s making the requests.
2025-10-10 28308, 2025
zas[m]
ah, ofc...
2025-10-10 28312, 2025
lucifer[m]
which is also why it's not getting rate limited.
2025-10-10 28305, 2025
lucifer[m]
mayhem: we don't cache the get-listens endpoint, so that we always serve the latest listens, but yes, we should cache it with proper invalidation whenever the timescale writer inserts any listens.
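(A minimal sketch of that cache-plus-invalidation idea, assuming a Redis cache in front of the listens store; the key scheme, TTL, and function names here are invented for illustration.)

```python
import json
import redis

cache = redis.Redis(host="localhost", port=6379)
CACHE_TTL = 300  # seconds; placeholder value


def _listens_cache_key(user_name: str) -> str:
    return f"listens:{user_name}"


def get_listens_cached(user_name: str, fetch_from_timescale):
    """Serve listens from cache, falling back to the database on a miss."""
    key = _listens_cache_key(user_name)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    listens = fetch_from_timescale(user_name)
    cache.setex(key, CACHE_TTL, json.dumps(listens))
    return listens


def on_listens_inserted(user_name: str):
    """Called by the writer after inserting listens: drop the stale cache entry."""
    cache.delete(_listens_cache_key(user_name))
```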
2025-10-10 28327, 2025
mayhem[m]
that should help with stupid clients
2025-10-10 28341, 2025
zas[m]
I count 183 different IPv6 addresses using this UA string over the last 2 days, I can block them all
2025-10-10 28357, 2025
zas[m]
But that's not many hits, only 116484 since yesterday
2025-10-10 28319, 2025
zas[m]
so if those 60k hits a day are enough to cause those issues, I'd say the problem is server-side
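(For scale: 116,484 hits over roughly two days works out to about 58k per day, i.e. under one request per second on average, hence the "60k hits a day" figure.)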
2025-10-10 28352, 2025
mayhem[m]
zas: we're agreeing with you. :)
2025-10-10 28307, 2025
monkey[m]1
I think if they are hitting the troi-related endpoints, it consumes a lot of resources
2025-10-10 28334, 2025
mayhem[m]
perhaps troi needs to be only for logged in users?
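(A sketch of what gating an expensive endpoint behind login could look like, assuming a Flask-style app; the decorator, auth check, and route below are illustrative, not ListenBrainz's actual code.)

```python
from functools import wraps

from flask import Flask, abort, g

app = Flask(__name__)


def login_required(view):
    """Reject anonymous requests before any expensive work is done."""
    @wraps(view)
    def wrapper(*args, **kwargs):
        # Assumes the auth layer has populated g.current_user; adjust to taste.
        if getattr(g, "current_user", None) is None:
            abort(401)
        return view(*args, **kwargs)
    return wrapper


@app.route("/1/explore/troi-playlist")  # illustrative path, not the real route
@login_required
def troi_playlist():
    # ... expensive playlist generation would go here ...
    return {"status": "ok"}
```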