rayyan_seliya123: let's start with just 78rpm/cylinder for now. you should update your prototype or rewrite it from scratch to work with the rest of the codebase. https://github.com/metabrainz/listenbrainz-server…
2025-06-03 15424, 2025
lucifer[m]
you won't need to create new models, we don't use sqlalchemy as an orm anyway.
you can see apple and spotify follow the same structure whereas soundcloud has a different one. you should check what data is available in the IA and then either map it to the existing apple/spotify or soundcloud format. if neither is suitable then we can think of a new format.
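A minimal sketch of the track-level mapping discussed above; the field names on both sides are assumptions for illustration, not the actual ListenBrainz or SoundCloud importer schema.

```python
# Hypothetical sketch only: field names are assumptions, not the real
# ListenBrainz/SoundCloud importer schema.

def map_ia_track(ia_item: dict) -> dict:
    """Flatten Internet Archive item metadata into a track-level record."""
    return {
        "track_id": ia_item["identifier"],
        "name": ia_item.get("title", ""),
        "artist": ia_item.get("creator", ""),
        "release_year": ia_item.get("year"),
        "url": f"https://archive.org/details/{ia_item['identifier']}",
    }

example = {
    "identifier": "78_example-item",  # made-up identifier
    "title": "Example Song",
    "creator": "Example Artist",
    "year": "1928",
}
track = map_ia_track(example)
```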
2025-06-03 15438, 2025
Maxr1998_ has quit
2025-06-03 15426, 2025
Maxr1998 joined the channel
2025-06-03 15437, 2025
rayyan_seliya123
<lucifer[m]> "you can see apple and spotify..." <- Thanks for the detailed guidance! I’ve reviewed the existing codebase and understand that for the 78rpm/cylinder collections, the Internet Archive data is mostly track-level, so I’ll map it to the SoundCloud format as you suggested.
2025-06-03 15437, 2025
rayyan_seliya123
For moving forward, would you prefer that I work directly in the main ListenBrainz repo through PRs, or should I start in a separate branch and then merge my work in? I want to follow whatever workflow you think is best for the project.
2025-06-03 15437, 2025
rayyan_seliya123
Let me know what you prefer, and I’ll get started accordingly!
2025-06-03 15418, 2025
lucifer[m]
[@rayyan_seliya123:matrix.org](https://matrix.to/#/@rayyan_seliya123:matrix.org) work with LB repo through PRs.
2025-06-03 15445, 2025
rayyan_seliya123
lucifer[m]: Okk fine 👍
2025-06-03 15454, 2025
_BrainzGit
[listenbrainz-server] 14amCap1712 opened pull request #3292 (03master…similar-users): Use cosine similarity instead of pearson coefficient for similar users https://github.com/metabrainz/listenbrainz-server…
2025-06-03 15455, 2025
lucifer[m]
monkey: the current similarity scores on LB should be using this new algorithm
2025-06-03 15402, 2025
monkey[m]
Ooh, OK
2025-06-03 15406, 2025
lucifer[m]
do they seem sensible to you?
2025-06-03 15418, 2025
mayhem[m] is reading the PR right now
2025-06-03 15420, 2025
lucifer[m]
i have a dump of the score before this change if you want to compare.
2025-06-03 15426, 2025
monkey[m]
Damn, I don't have an older version saved to compare, but let me look
2025-06-03 15446, 2025
holycow23[m]
<lucifer[m]> "i'll fix the errors and let..." <- Hey lucifer, any update on this
2025-06-03 15453, 2025
lucifer[m]
holycow23: not yet
2025-06-03 15400, 2025
holycow23[m]
Okay
2025-06-03 15450, 2025
mayhem[m]
lucifer: it's hard to judge the cosine similarity without having prior data.
2025-06-03 15450, 2025
monkey[m]
Would love to compare to see if it was the case before, but I'm already seeing two users with whom I have 6+ artists in common at 0% compatibility, which feels wrong.
2025-06-03 15450, 2025
monkey[m]
But I've always thought the similarity scores were low
2025-06-03 15454, 2025
lucifer[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/yhckSFyYztsIjnTgUrAmpKDY
I noticed that the closest person to me is now much stronger, while the others are weaker.
2025-06-03 15459, 2025
lucifer[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/txxvtmQvDEICRXZHaGRmpFeD
2025-06-03 15418, 2025
lucifer[m]
the first row is pearson coefficient and the second row is cosine similarity
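For reference, the two measures differ only in mean-centering: the Pearson coefficient is cosine similarity applied to mean-centered vectors, which is why the two rows come out so close. A small self-contained illustration with made-up listen counts:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson(a, b):
    """Pearson coefficient = cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])

# made-up listen counts for two users over four artists
listens_a = [10, 0, 3, 5]
listens_b = [8, 1, 2, 6]
```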
2025-06-03 15445, 2025
monkey[m]
Well, they seem very close
2025-06-03 15459, 2025
mayhem[m]
oh wow. well, I guess I haven't looked at similarity data in a while.
2025-06-03 15401, 2025
lucifer[m]
mayhem: user similarities have not updated in a few days because it always OOM'ed.
2025-06-03 15440, 2025
lucifer[m]
the last week it OOM'ed in a way that brought down the cluster so i changed it.
2025-06-03 15402, 2025
monkey[m]
FWIW i think the similarity calculations need to be reviewed, but when it comes to fixing the OOM, and given the small differences between the numbers I see, I would consider them equivalent.
2025-06-03 15429, 2025
lucifer[m]
we can implement and experiment with pearson coefficient, it's just that we'd have to implement something manually, which is doable.
2025-06-03 15450, 2025
lucifer[m]
i went with column similarities because it exists there and was a smaller fix.
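The built-in lucifer refers to computes pairwise cosine similarity between the columns of a matrix (in Spark, `RowMatrix.columnSimilarities`). A plain-Python sketch of the idea; the layout (users as columns of a listens matrix) and the numbers are assumptions for illustration:

```python
import math

# made-up listens matrix: rows = recordings, columns = users (assumed layout)
matrix = [
    [3, 2, 0],
    [0, 1, 4],
    [1, 0, 1],
]

def column(m, j):
    return [row[j] for row in m]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# upper-triangular pairwise similarities, like columnSimilarities returns
n_cols = len(matrix[0])
sims = {
    (i, j): cosine(column(matrix, i), column(matrix, j))
    for i in range(n_cols)
    for j in range(i + 1, n_cols)
}
```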
2025-06-03 15418, 2025
mayhem[m]
I think we should keep it for the time being and ask the community for feedback.
2025-06-03 15429, 2025
monkey[m]
Might be worth calculating the average difference between the two methods for all the users you have data for, but... from my point of view they are both equally low.
2025-06-03 15450, 2025
mayhem[m]
the downside to that is that everyone has an opinion on how it should work and there'd be "its just a little tweak" comments.
2025-06-03 15407, 2025
mayhem[m]
(in ML, it's never just a little tweak.)
2025-06-03 15404, 2025
monkey[m]
Little tweak, big refactor
2025-06-03 15407, 2025
lucifer[m]
fwiw, i don't recall any particular reason for implementing it with pearson coefficient the first time.
2025-06-03 15456, 2025
lucifer[m]
i do think there is value in experimenting with and improving similarities but we'd need to do it more rigorously, define proper test datasets as a reference etc etc
2025-06-03 15422, 2025
monkey[m]
Agreed.
2025-06-03 15408, 2025
monkey[m]
For my numbers the differences were sub-percentage point, which makes virtually no difference, so OK from me.
2025-06-03 15457, 2025
_BrainzGit
[listenbrainz-server] 14amCap1712 merged pull request #3292 (03master…similar-users): Use cosine similarity instead of pearson coefficient for similar users https://github.com/metabrainz/listenbrainz-server…
2025-06-03 15431, 2025
fettuccinae[m]
mayhem: ping
2025-06-03 15427, 2025
mayhem[m]
Pong
2025-06-03 15406, 2025
fettuccinae[m]
For authorization of endpoints, each project can have an auth token generated from MeB and saved in the secrets of both MeB and the project.
2025-06-03 15406, 2025
fettuccinae[m]
That way, when a project makes a request, we can authorize it using either the token or the owner_id of the token sent. Is this approach okay?
2025-06-03 15434, 2025
mayhem[m]
I think so, but lucifer: is more on top of oauth-related questions. lucifer: ?
2025-06-03 15413, 2025
lucifer[m]
@fettuccinae:matrix.org: not sure what you mean, but the workflow would be as follows: the project (LB/BB/MB) connects to MeB to obtain an auth token and uses that auth token in the request to post notifications to MeB; MeB validates whether the token has the relevant scopes and is owned by one of the hardcoded client ids in the configuration; if yes it proceeds, otherwise it rejects the request.
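A hedged sketch of the validation step described above; the names (`ALLOWED_CLIENT_IDS`, `REQUIRED_SCOPE`, the token-info shape) are assumptions for illustration, not the actual MeB code:

```python
# Hypothetical names throughout; only the check described in chat is real:
# the token must have the relevant scope and be owned by a hardcoded client id.
ALLOWED_CLIENT_IDS = {"listenbrainz", "bookbrainz", "musicbrainz"}
REQUIRED_SCOPE = "notifications"

def is_authorized(token_info: dict) -> bool:
    """token_info is the (assumed) result of introspecting the bearer token."""
    return (
        token_info.get("active", False)
        and REQUIRED_SCOPE in token_info.get("scopes", [])
        and token_info.get("client_id") in ALLOWED_CLIENT_IDS
    )

good = {"active": True, "scopes": ["notifications"], "client_id": "listenbrainz"}
bad = {"active": True, "scopes": ["notifications"], "client_id": "other-app"}
```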
2025-06-03 15431, 2025
fettuccinae[m]
lucifer[m]: i was thinking an admin user could generate an auth token for projects through https://metabrainz.org/profile#, and then this token could be hardcoded in the configuration of both the project and MeB. So when a project sends a request with this token, MeB verifies it against the saved token in the config and allows the request
2025-06-03 15447, 2025
lucifer[m]
@fettuccinae:matrix.org: no we don't want to do that for multiple reasons. it makes token rotation hard and we cannot have expiring tokens this way.
2025-06-03 15439, 2025
fettuccinae[m]
ohh, but how can the project get tokens in the authorization flow if login is required for /oauth2/authorize?
2025-06-03 15442, 2025
lucifer[m]
unless there is a strong reason we should stick to using the oauth way. i am running a bit behind schedule on client credentials grant but only the testing is pending, once that is done it should be available for use in your project.
2025-06-03 15407, 2025
lucifer[m]
with the client credentials grant, you won't need the manual /oauth2/authorization.
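For reference, a client credentials grant (RFC 6749 §4.4) is a single server-to-server POST to the token endpoint, with no user login involved. A sketch; the endpoint URL and scope name are assumptions, since the MeB grant was still being tested at the time:

```python
import urllib.parse

def build_token_request(client_id: str, client_secret: str) -> tuple[str, str]:
    """Build the (url, form body) for a client_credentials token request."""
    url = "https://metabrainz.org/oauth2/token"  # assumed endpoint
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "notifications",  # hypothetical scope name
    })
    return url, body

url, body = build_token_request("example-client", "example-secret")
```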
2025-06-03 15433, 2025
fettuccinae[m]
Ohh, got it, thanks. I'll add todo's for this and work on other things.
2025-06-03 15449, 2025
Kladky has quit
2025-06-03 15429, 2025
Kladky joined the channel
2025-06-03 15458, 2025
Kladky has quit
2025-06-03 15436, 2025
Kladky joined the channel
2025-06-03 15408, 2025
lucifer[m]
holycow23: i tested the dumps locally and everything seems to work fine, lets try again when you are around.
2025-06-03 15436, 2025
holycow23[m]
lucifer[m]: I can try right now
2025-06-03 15457, 2025
lucifer[m]
holycow23: okay try running `./develop.sh spark format` once and share its output.
incremental dump imported fine, its still importing the sample dump.
2025-06-03 15449, 2025
lucifer[m]
should be done in less than 5 mins.
2025-06-03 15407, 2025
holycow23[m]
Okay
2025-06-03 15419, 2025
lucifer[m]
update the logs when you see another Request done!
2025-06-03 15437, 2025
holycow23[m]
Updated
2025-06-03 15446, 2025
holycow23[m]
Got a request done!
2025-06-03 15452, 2025
lucifer[m]
that succeeded as well.
2025-06-03 15425, 2025
lucifer[m]
okay now run `./develop.sh manage spark request_user_stats --entity artists --range this_week --type entity`
2025-06-03 15443, 2025
holycow23[m]
Done
2025-06-03 15408, 2025
lucifer[m]
update the request consumer logs after another request done
2025-06-03 15428, 2025
holycow23[m]
Okay
2025-06-03 15404, 2025
holycow23[m]
Will this take time?
2025-06-03 15436, 2025
lucifer[m]
should be done by now.
2025-06-03 15447, 2025
lucifer[m]
update the logs anyway and i'll take a look
2025-06-03 15400, 2025
holycow23[m]
Updated
2025-06-03 15450, 2025
lucifer[m]
yeah seems to be still running, lets wait. this is not optimized for running locally.
2025-06-03 15401, 2025
holycow23[m]
okay
2025-06-03 15406, 2025
lucifer[m]
it took 16s on my PC but docker-desktop is probably slower.
2025-06-03 15427, 2025
holycow23[m]
Okay
2025-06-03 15432, 2025
lucifer[m]
anything new in the logs?
2025-06-03 15439, 2025
holycow23[m]
Nope
2025-06-03 15409, 2025
holycow23[m]
It's been at this stage for a while now
2025-06-03 15442, 2025
lucifer[m]
okay, check spark_reader logs
2025-06-03 15458, 2025
lucifer[m]
and see if there are any messages for user_entity.
2025-06-03 15424, 2025
holycow23[m] uploaded an image: (53KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/xHVHOWWKtTxmXkJyFKQqfDFu/image.png >
2025-06-03 15437, 2025
lucifer[m]
check the logs above and below this, there are a lot of debug messages that might drown the user entity message. you can do a grep on the logs if possible for user_entity to confirm.
2025-06-03 15436, 2025
holycow23[m]
its been this throughout except these two lines
2025-06-03 15436, 2025
holycow23[m]
`2025-06-03 14:22:34,058 listenbrainz.webserver DEBUG Received a message, adding to internal processing queue...`
2025-06-03 15436, 2025
holycow23[m]
`2025-06-03 14:22:34,059 listenbrainz.webserver INFO Received message for import_incremental_dump`
2025-06-03 15443, 2025
lucifer[m]
i see
2025-06-03 15458, 2025
lucifer[m]
try running `./develop.sh manage spark request_user_stats --entity artists --range this_week --type entity` again i guess
2025-06-03 15410, 2025
lucifer[m]
and see if there's anything new in request_consumer logs
2025-06-03 15426, 2025
holycow23[m]
listenbrainzspark hasn't changed after running the command
2025-06-03 15453, 2025
holycow23[m] uploaded an image: (32KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/QvxGdpfqwgXBUfjmeXvjcUeZ/image.png >
2025-06-03 15446, 2025
lucifer[m]
yeah that is fine, what about the request consumer logs
2025-06-03 15402, 2025
holycow23[m]
its the same
2025-06-03 15403, 2025
holycow23[m]
no update
2025-06-03 15447, 2025
lucifer[m]
i see, you can stop the containers.
2025-06-03 15452, 2025
lucifer[m]
`./develop.sh spark down`
2025-06-03 15408, 2025
lucifer[m]
and then run `./develop.sh spark up` to bring it back up again