in #metabrainz

0:05 AM
andrew_su1 joined the channel
0:13 AM
spynx is now known as spynxic
0:49 AM
andrew_su1 has quit
0:50 AM
andrew_su1 joined the channel
0:56 AM
kilos_6 joined the channel
0:58 AM
ijc_ joined the channel
0:58 AM
bitmap_ joined the channel
0:59 AM
Arsen_ joined the channel
1:04 AM
kilos_ has quit
1:04 AM
kilos_6 is now known as kilos_
1:04 AM
bitmap has quit
1:04 AM
ijc has quit
1:04 AM
Arsen has quit
1:04 AM
bitmap_ is now known as bitmap
1:48 AM
andrew_su1 has quit
1:49 AM
andrew_su1 joined the channel
2:04 AM
andrew_su1 has quit
2:05 AM
andrew_su1 joined the channel
2:13 AM
bitmap has quit
2:17 AM
bitmap joined the channel
2:19 AM
s1b1_ joined the channel
2:19 AM
s1b1 has quit
2:30 AM
andrew_su1 has quit
2:57 AM
lusciouslover has quit
2:57 AM
lusciouslover joined the channel
4:08 AM
vardhan__ joined the channel
4:10 AM
opticdelusion has quit
4:11 AM
opticdelusion joined the channel
4:36 AM
pite joined the channel
6:25 AM
pprkut has quit
6:26 AM
pprkut joined the channel
6:43 AM
Kladky joined the channel
8:35 AM
andrew_su1 joined the channel
9:10 AM
BrainzGit

[listenbrainz-server] granth23 opened pull request #3146 (master…LB-1722): LB1722: Link to unlinked listens on the home page https://github.com/metabrainz/listenbrainz-serv...
9:42 AM
andrew_su1 has quit
10:34 AM
[metabrainz.org] reosarevok opened pull request #500 (master…MEB-167): MEB-167: Clarify name in donations means real name https://github.com/metabrainz/metabrainz.org/pu...
10:36 AM
q3lont joined the channel
10:37 AM
[metabrainz.org] reosarevok opened pull request #501 (master…fix-dockerfile-warnings): Fix Dockerfile warnings https://github.com/metabrainz/metabrainz.org/pu...
10:48 AM
lusciouslover has quit
10:48 AM
lusciouslover joined the channel
11:05 AM
[metabrainz.org] mayhem merged pull request #500 (master…MEB-167): MEB-167: Clarify name in donations means real name https://github.com/metabrainz/metabrainz.org/pu...
11:05 AM
[metabrainz.org] mayhem merged pull request #501 (master…fix-dockerfile-warnings): Fix Dockerfile warnings https://github.com/metabrainz/metabrainz.org/pu...
11:06 AM
[metabrainz.org] release v-2025-01-24.0 has been published by mayhem: https://github.com/metabrainz/metabrainz.org/re...
11:06 AM
lusciouslover has quit
11:07 AM
lusciouslover joined the channel
11:42 AM
mayhem[m]

ansh, monkey, lucifer: The mapping cron jobs are all working correctly now: https://sentry.metabrainz.org/organizations/met...
11:42 AM
I've changed the owner/notifications to be #listenbrainz now -- so if we see an alert from Sentry something is actually wrong and needs to be investigated.
11:43 AM
monkey[m]

Roger.
11:43 AM
Nice work, this is going to be very helpful
11:44 AM
mayhem[m]

agreed. we should know about failures before our users do. #abouttime
11:46 AM
monkey[m]

Dang, sentry is a drama queen
11:46 AM
monkey[m] uploaded an image: (4KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/xIjTgADllrWthsmSHmlybhgJ/image.png >
11:50 AM
mayhem: Should we try to reach out to users like this, considering their metadata is not exactly clean?
11:50 AM
https://listenbrainz.org/user/keeganmacharia/
11:50 AM
See if we can help them or something?
11:50 AM
mayhem[m]

yeah, sounds like a very considerate thing to do.
11:51 AM
monkey[m] uploaded an image: (31KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/mIQEthXbZrelJZfGOJBjsDyh/image.png >
11:51 AM
monkey[m]

Also creates some unexpected errors:
12:02 PM
mayhem[m]

I think sentry is trolling me
12:04 PM
monkey[m]

Hehehe, I was gonna say, "are you testing us?"
12:10 PM
Maybe just needs a bit more wiggle room?
12:10 PM
mayhem[m]

ran fine now -- I suspect it was missed because of me mucking with the container.
12:10 PM
we'll see if it runs in 4 hours.
12:11 PM
(the last run didn't leave a log file, suggesting that it didn't even attempt to run)
12:13 PM
monkey: this has been fixed now, yes? https://tickets.metabrainz.org/browse/LB-1590
12:13 PM
BrainzBot

LB-1590: Global recording listen counts not updating
12:14 PM
monkey[m]

Unclear to me. I would say it is probably fixed, but if we close the ticket it would be good to add a comment inviting users to reopen it if they still see the problem
12:15 PM
From recently looking at it, global counts were looking good
12:22 PM
BrainzGit

[listenbrainz-server] MonkeyDo merged pull request #3146 (master…LB-1722): LB1722: Link to unlinked listens on the home page https://github.com/metabrainz/listenbrainz-serv...
12:25 PM
lucifer[m]

mayhem: the base class IncrementalStats that was added in the sitewide stats PR has documentation for all the abstract methods. It's not in the current PR, and that's why it seems like a lot of the documentation is missing. I have added a comment to all the concrete classes pointing to that class for documentation.
12:29 PM
wileyfoxyx[m] has quit
12:58 PM
BrainzGit

[listenbrainz-server] amCap1712 merged pull request #3115 (master…incremental-stats-user): Incremental user stats https://github.com/metabrainz/listenbrainz-serv...
13:05 PM
lucifer[m]

mayhem: another day in Punxsutawney, LB#3116
13:05 PM
BrainzBot

Incremental stats entity: https://github.com/metabrainz/listenbrainz-serv...
13:22 PM
mayhem[m]

lucifer: can you please chime in on this PR? https://github.com/metabrainz/listenbrainz-serv...
13:24 PM
lucifer[m]

mayhem: i am unsure what the question is but every sitewide stat is calculated separately in spark and stored in the same couchdb database.
13:24 PM
i think one job for each stat-range combination might be overkill though.
13:24 PM
mayhem[m]

is there a chance that, for instance, artist stats would succeed and be present, but release stats would not?
13:25 PM
lucifer[m]

maybe just one job that checks it all.
13:25 PM
yes.
13:25 PM
and also artists week can pass but artists year can fail.
13:25 PM
mayhem[m]

yeah, I did one job where I checked artist and monkey thinks we should check the others.
13:25 PM
lucifer[m]

applies to all user stats as well
13:26 PM
mayhem[m]

it's clearly a slippery slope... where do you stop checking?
13:26 PM
lucifer[m]

i think you can club it by range or entity.
13:26 PM
mayhem[m]

club?
13:26 PM
monkey[m]

By range might make sense.
13:27 PM
Buckets
13:27 PM
lucifer[m]

group the checks. a check passes if all ranges of a given entity are up to date.
13:27 PM
mayhem[m]

ok, so check all artists ranges?
13:27 PM
lucifer[m]

yes.
13:27 PM
mayhem[m]

ok, can do.
13:27 PM
monkey[m]

So, check weekly artist and release stats age, if one is missing return none (alert), otherwise use the oldest date between the two
13:27 PM
Same for other ranges.
13:28 PM
Does that sound reasonable?
13:28 PM
mayhem[m]

"weekly artist and release stats age," that is not what I understood.
13:28 PM
check artist all time, artist week, artist last week.
13:28 PM
and so on
13:29 PM
monkey[m]

Depends how we group it I suppose. In the interest of not creating too many alerts, we could group by range (i.e. artists+releases weekly, artists+releases monthly, etc.)
13:30 PM
That's how I understood "club it by range or entity"
13:31 PM
or we could group by time range, i.e. check artist stats for week+month+year
13:31 PM
But the issue here is that our alerting system is based on an age metric. Are those stats all calculated at the exact same time and interval?
13:31 PM
mayhem[m]

as lucifer said, that is overkill.
13:32 PM
both he and I agreed to pick one entity and check all the ranges.
13:32 PM
and if we find that that fails us, we can always improve it.
13:32 PM
monkey[m]

> a check passes if all ranges of a given entity are up to date
13:32 PM
I might be misunderstanding, but I don't know if that will work
13:32 PM
Because we don't return a boolean OK/not OK
13:32 PM
lucifer[m]

mayhem: sorry for the confusion, i think you should check all ranges and entities but report only one alert if any of them is failing.
13:33 PM
mayhem[m]

oh, now I see what you were saying.
13:33 PM
I wonder how long that test will take.
13:33 PM
lucifer[m]

should take less than a minute or two, i think.
13:34 PM
mayhem[m]

yea, but this is being done in response to a web call.
13:34 PM
if the call times out it could give a false positive.
13:34 PM
lucifer[m]

does grafana make the call or custom python code?
13:34 PM
mayhem[m]

for this reason we might need to break it into smaller groups
13:34 PM
I believe grafana.
13:35 PM
I would be inclined to make a check for artists, all ranges. releases, all ranges.
13:35 PM
and so on. that should finish in time.
13:35 PM
lucifer[m]

sure that sounds fine too.
13:35 PM
mayhem[m]

k
13:36 PM
lucifer[m]

alternatively we can write the time at which stats are ingested into couchdb to, say, redis and let grafana handle it.
13:37 PM
mayhem[m]

that's more work. let me see how this plays out.
13:37 PM
lucifer[m]

i see prometheus is pull based so we would store the latest timestamp in redis and then the endpoint returns the age from redis.
13:38 PM
sure, we can change it later if needed.
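
A minimal sketch of the alternative lucifer describes above: record the time at which stats are ingested, and let a pull-based endpoint report the age. Everything here is an assumption for illustration. The function names and key names are hypothetical, and a plain dict stands in for Redis so the example runs without a server.

```python
import time
from typing import Dict, Optional

# Stand-in for Redis (assumption: a real deployment would use a Redis client).
_store: Dict[str, float] = {}


def record_ingest(stat_key: str, now: Optional[float] = None) -> None:
    """Call when stats are written to CouchDB; remembers when that happened."""
    _store[stat_key] = time.time() if now is None else now


def stats_age_seconds(stat_key: str, now: Optional[float] = None) -> Optional[float]:
    """Value a metrics endpoint could return: age in seconds, or None if never ingested."""
    ingested = _store.get(stat_key)
    if ingested is None:
        return None
    current = time.time() if now is None else now
    return current - ingested
```

With this shape the endpoint only reads one timestamp per key, so the check does not depend on how long a CouchDB lookup takes.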
13:46 PM
mayhem[m]

hhmm. next problem.
13:46 PM
right now the python code does not make a determination if something is out of date or not.
13:47 PM
the grafana alert does that. so I can't realistically check all of them and then report back if at least one failed. that decision is not mine to make. How should we handle that?
13:47 PM
short of reporting them all, I would have to duplicate "is this current" logic, which is not great.
13:59 PM
monkey[m]

That is why I suggested returning the age of whichever is the oldest state for that entity. So for example if weekly artist stats ran fine but monthly did not, return the age of the (previous) monthly stats.
13:59 PM
That means when an alert triggers we don't exactly know which range failed, but we know some artist stats failed.
13:59 PM
mayhem[m]

ah, good simple solution! will do that.
14:00 PM
monkey[m]

And of course if even only one of the stats for that entity is not found in DB, return none, which would alert
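
A minimal sketch of the rule monkey describes above: for one entity, report the age of its stalest range, and report nothing at all if any range is missing so that the alert fires. The function name `oldest_stats_age` and the `last_updated` mapping are hypothetical and are not taken from listenbrainz-server.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional, Sequence


def oldest_stats_age(
    last_updated: Mapping[str, Optional[datetime]],
    ranges: Sequence[str],
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Return the age in seconds of the stalest range, or None if any range is missing."""
    now = now or datetime.now(timezone.utc)
    ages = []
    for stats_range in ranges:
        updated = last_updated.get(stats_range)
        if updated is None:
            # Missing stats for this range: return None so the alert triggers.
            return None
        ages.append((now - updated).total_seconds())
    # No ranges requested is treated like missing data.
    return max(ages) if ages else None


# Example: weekly artist stats are fresh, monthly artist stats are three days old.
now = datetime(2025, 1, 24, 12, 0, tzinfo=timezone.utc)
artist_stats = {"week": now - timedelta(hours=2), "month": now - timedelta(days=3)}
age = oldest_stats_age(artist_stats, ["week", "month"], now=now)
missing = oldest_stats_age(artist_stats, ["week", "month", "year"], now=now)
```

Because only a single age is returned, the "is this current" decision stays with the alerting side, at the cost of not knowing which range went stale.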
14:11 PM
Arsen_ is now known as Arsen
14:36 PM
minimal joined the channel
14:54 PM
Jade[m]1 has quit
15:00 PM
zas[m]

bitmap: we keep getting OOM kills on selda/yamaoka (MB containers). I was thinking about what you said (that now that we get more requests on the same server, downtime has more impact on users). What about running multiple containers for the same service on the same machine and reverting to the settings we had before? We could have 2 or 3 instances of MB containers on the same machine. It will change deployments/upgrades, but over time we'll get even more powerful machines, so such a move makes sense anyway. It would also make it easier to control the resources allocated to each container and limit the impact of misbehavior.
15:00 PM
Well, just something I was thinking about.
15:25 PM
mario[m] joined the channel
15:25 PM
mario[m]

Hi everyone, I'm new here so not sure if this is the correct room for a request like this, but I recently moved my Jellyfin install to a new provider, and now it looks like my IP can't reach musicbrainz.org (maybe blacklisted?) when running "beet import". Can I request for the IP (43.153.138.95) to be whitelisted?
15:25 PM
Thanks - and if this is not the right channel, apologies!
15:25 PM
mayhem[m]

zas: one for you ^^^^
15:26 PM
mario: zas normally handles these requests, hang tight until he appears
15:26 PM
mario[m]

awesome, thanks! :)
15:32 PM
soundandvision[m has quit
15:34 PM
q3lont has quit
15:46 PM
BrainzGit

[listenbrainz-server] amCap1712 merged pull request #3116 (master…incremental-stats-entity): Incremental stats entity https://github.com/metabrainz/listenbrainz-serv...
15:53 PM
zas[m]

mario: this IP doesn't appear to be blocked on our side. Can you access https://musicbrainz.org from the same node you are running beets on?