[@mayhem:chatbrainz.org](https://matrix.to/#/@mayhem:chatbrainz.org) Last week it had run out of resources, but today I checked and there is some other weird error: Spark is not able to start at all. I will try to reset the cluster.
ApeKattQuest joined the channel
LupinIII has quit
HSOWA has quit
HSOWA joined the channel
pite has quit
Kladky joined the channel
reosarevok[m]
zas: MBS-11723 is still an issue, has the VM usage climbed again?
aerozol: does my comment in MBS-13809 make any sense to you or do you think it would rather be confusing? (ideally we'd separate the two but that's not trivial at all)
<mayhem[m]> "interesting: https://github.com..."; <- Interesting indeed. I'll note that they use the spotify API for searching on Spotify, but prefer to load the apple music web page and parse it for information. Says something about the apple music API, doesn't it?
mayhem[m]
as always, yes.
zas: ping
yvanzo[m]
bitmap, reosarevok: I reported the issue about duplicate messages upstream.
reosarevok[m]
Thanks
yvanzo[m]
reosarevok: Are you available to check the CSP issue again?
reosarevok[m]
Remind me what that was? But I'm available after lunch
LB-1673: Create a service alerts telegram group with the LB team members
zas[m]
Hmmm, to me it would be much easier to have a "metrics" endpoint (additionally), providing data directly in a usable format. If we want to use telegraf to collect them, see https://github.com/influxdata/telegraf/blob/mas...
In an ideal world, if we're not keeping stats over time but only sending notifications, what should the data format be?
zas[m]
Yes; but alerts are based on those stats anyway, basically each metric has its own timestamp (and format can be defined in telegraf, usually unix timestamp), see the example -> https://docs.influxdata.com/telegraf/v1/configu...
I'll only need ints for this case. what do you think of the proposed format?
@zas ^^
zas: ^^
zas[m]
last_updated is a timestamp too? What's the purpose? It would probably be more convenient to have a duration since last update instead if the goal is to alert when this is getting too old.
mayhem[m]
last updated is for when the service last updated the data. fetched is when the data was fetched, since it could be cached.
do you want to have seconds elapsed instead?
zas[m]
Since we have a timestamp in fetched, the field can be seconds_since_last_update or the like. It will drop to 0 if just updated and increase over time, and we can have a threshold for the alert ("alert if not updated for N seconds"). It's not very convenient to work with timestamps in fields since it requires extra calculations (and sometimes that makes things overcomplicated).
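The duration-based field described above can be sketched in Python as follows. This is only an illustration under assumptions: `last_updated` and `fetched` are taken to be unix timestamps in seconds, the helper names, the sample values, and the 600-second threshold are hypothetical, and the real endpoint format was not settled in this conversation.

```python
import time


def seconds_since_last_update(last_updated, fetched=None):
    """Whole seconds between the service's last data update and the fetch time.

    Both arguments are assumed to be unix timestamps in seconds. If `fetched`
    is omitted, the current time is used. Negative results (clock skew) are
    clamped to 0.
    """
    if fetched is None:
        fetched = int(time.time())
    return max(0, int(fetched) - int(last_updated))


def should_alert(elapsed_seconds, threshold_seconds):
    """True when the data has not been updated for more than the threshold."""
    return elapsed_seconds > threshold_seconds


# Hypothetical sample values, for illustration only.
sample = {"last_updated": 1700000000, "fetched": 1700000900}
elapsed = seconds_since_last_update(sample["last_updated"], sample["fetched"])
print(elapsed)                                       # 900
print(should_alert(elapsed, threshold_seconds=600))  # True
```

Reporting the elapsed seconds as a plain integer keeps the alert rule a simple threshold comparison, with no timestamp arithmetic needed on the monitoring side.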
We'll need the Chat ID for the matching Telegram channel, that's all (I think). And then configure alerts accordingly.
mayhem[m]
zas: how do I indicate an error? Let's assume that there are 4 metrics to fetch and I was unable to fetch a given metric, let's say it timed out. What do I return for that metric? Or don't I return it?