#metabrainz


      • d4rkie joined the channel
      • d4rk-ph0enix has quit
      • Jigen joined the channel
      • Goemon has quit
      • ApeKattQuest has quit
      • ApeKattQuest joined the channel
      • ApeKattQuest has quit
      • ApeKattQuest joined the channel
      • dabeglavins has quit
      • pite has quit
      • lucifer[m]
        rayyan_seliya123: let's start with just 78rpm/cylinder for now. you should update your prototype or rewrite it from scratch to work with the rest of the codebase. https://github.com/metabrainz/listenbrainz-serv...
      • you won't need to create new models, we don't use sqlalchemy as an orm anyway.
      • you can see apple and spotify follow the same structure whereas soundcloud has a different one. you should check what data is available in the IA and then either map it to the existing apple/spotify or soundcloud format. if neither is suitable then we can think of a new format.
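A minimal sketch of the track-level mapping being discussed, assuming hypothetical field names rather than the actual ListenBrainz or Internet Archive schema:

```python
# Hypothetical sketch: map an Internet Archive 78rpm item to a flat,
# track-level record, similar in spirit to the SoundCloud-style format
# mentioned above. Every field name here is an illustrative assumption,
# not the actual ListenBrainz or IA schema.

def map_ia_item_to_track(item: dict) -> dict:
    meta = item.get("metadata", {})
    identifier = item.get("identifier")
    return {
        "track_id": identifier,
        "name": meta.get("title"),
        "artist": meta.get("creator"),
        "year": meta.get("year"),
        "url": f"https://archive.org/details/{identifier}",
    }

sample = {
    "identifier": "78_example-item",
    "metadata": {"title": "Example Song", "creator": "Example Artist", "year": "1928"},
}
print(map_ia_item_to_track(sample)["name"])  # Example Song
```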
      • Maxr1998_ has quit
      • Maxr1998 joined the channel
      • rayyan_seliya123
        <lucifer[m]> "you can see apple and spotify..." <- Thanks for the detailed guidance! I’ve reviewed the existing codebase and understand that for the 78rpm/cylinder collections, the Internet Archive data is mostly track-level, so I’ll map it to the SoundCloud format as you suggested.
      • For moving forward, would you prefer that I work directly in the main ListenBrainz repo through PRs, or should I start in a separate branch and then merge my work in? I want to follow whatever workflow you think is best for the project.
      • Let me know what you prefer, and I’ll get started accordingly!
      • lucifer[m]
        [@rayyan_seliya123:matrix.org](https://matrix.to/#/@rayyan_seliya123:matrix.org) work with LB repo through PRs.
      • rayyan_seliya123
        lucifer[m]: Okk fine 👍
      • _BrainzGit
        [listenbrainz-server] 14amCap1712 opened pull request #3292 (03master…similar-users): Use cosine similarity instead of pearson coefficient for similar users https://github.com/metabrainz/listenbrainz-serv...
      • lucifer[m]
        monkey: the current similarity scores on LB should be using this new algorithm
      • monkey[m]
        Ooh, OK
      • lucifer[m]
        do they seem sensible to you?
      • mayhem[m] is reading the PR right now
      • i have a dump of the score before this change if you want to compare.
      • monkey[m]
        Damn, I don't have an older version saved to compare, but let me look
      • holycow23[m]
        <lucifer[m]> "i'll fix the errors and let..." <- Hey lucifer, any update on this
      • lucifer[m]
        holycow23: not yet
      • holycow23[m]
        Okay
      • mayhem[m]
        lucifer: it's hard to judge the cosine similarity without having prior data.
      • monkey[m]
        Would love to compare to see if it was the case before, but I'm already seeing two users with whom I have 6+ artists in common at 0% compatibility, which feels wrong.
      • But I've always thought the similarity scores were low
      • lucifer[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/yhckSFyYztsIjnTgUrAmpKDY
      • _BrainzGit
        [musicbrainz-server] 14reosarevok opened pull request #3552 (03master…MBS-14047): MBS-14047: Support medium in NotFound https://github.com/metabrainz/musicbrainz-serve...
      • BrainzBot
        MBS-14047: ISE when trying to reach non-existing medium MBID https://tickets.metabrainz.org/browse/MBS-14047
      • lucifer[m]
        monkey: ^
      • mayhem[m]
        I noticed that the closest person to me is now much stronger, while the others are weaker.
      • lucifer[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/txxvtmQvDEICRXZHaGRmpFeD
      • lucifer[m]
        the first row is pearson coefficient and the second row is cosine similarity
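The difference between the two measures can be sketched with toy data. Pearson correlation is cosine similarity applied to mean-centered vectors, which helps explain why the two scores often land close together. The listen counts below are made up and this is not LB's actual Spark pipeline:

```python
import math

# Toy illustration of the two similarity measures being compared.
# The real pipeline computes this over user listen-count matrices in Spark.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def pearson(a, b):
    # Pearson correlation == cosine similarity of mean-centered vectors
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - ma for x in a], [y - mb for y in b])

user_a = [10, 0, 3, 5, 0]  # listen counts per artist (made-up data)
user_b = [8, 1, 2, 6, 0]
print(f"cosine:  {cosine(user_a, user_b):.3f}")
print(f"pearson: {pearson(user_a, user_b):.3f}")
```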
      • monkey[m]
        Well, they seem very close
      • mayhem[m]
        oh wow. well, I guess I haven't looked at similarity data in a while.
      • lucifer[m]
        mayhem: user similarities have not updated in a few days because it always OOM'ed.
      • last week it OOM'ed in a way that brought down the cluster, so i changed it.
      • monkey[m]
        FWIW i think the similarity calculations need to be reviewed, but when it comes to fixing the OOM, the differences between the numbers I see are small enough that I would consider the two methods equivalent.
      • lucifer[m]
        we can implement and experiment with the pearson coefficient, it's just that we'd have to implement something manually, which is doable.
      • i went with column similarities because it exists there and was a smaller fix.
      • mayhem[m]
        I think we should keep it for the time being and ask the community for feedback.
      • monkey[m]
        Might be worth calculating the average difference between the two methods for all the users you have data for, but... From my point of view they are both equally low.
      • mayhem[m]
        the downside to that is that everyone has an opinion on how it should work and there'd be "it's just a little tweak" comments.
      • (in ML, it's never just a little tweak.)
      • monkey[m]
        Little tweak, big refactor
      • lucifer[m]
        fwiw, i don't recall any particular reason for implementing it with the pearson coefficient the first time.
      • i do think there is value in experimenting with and improving similarities, but we'd need to do it more rigorously: define proper test datasets as a reference, etc.
      • monkey[m]
        Agreed.
      • For my numbers the differences were sub-percentage point, which makes virtually no difference, so OK from me.
      • _BrainzGit
        [listenbrainz-server] 14amCap1712 merged pull request #3292 (03master…similar-users): Use cosine similarity instead of pearson coefficient for similar users https://github.com/metabrainz/listenbrainz-serv...
      • fettuccinae[m]
        mayhem: ping
      • mayhem[m]
        Pong
      • fettuccinae[m]
        For authorization of endpoints, each project can have an auth token generated from MeB and saved in the secrets of both MeB and the project.
      • That way, when a project makes a request, we can authorize it using either the token or the owner_id of the token sent. Is this approach okay?
      • mayhem[m]
        I think so, but lucifer: is more on top of oauth-related questions. lucifer: ?
      • lucifer[m]
        @fettuccinae:matrix.org: not sure what you mean, but the workflow would be as follows: the project (LB/BB/MB) connects to MeB to obtain an auth token and uses that auth token in the request to post notifications to MeB. MeB validates whether the token has the relevant scopes and is owned by one of the hardcoded client ids in the configuration; if yes, it proceeds, otherwise it rejects the request.
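The validation step in this workflow could be sketched as follows. This is a minimal illustration, assuming hypothetical config values, function names, and token-introspection fields, not the actual MeB code:

```python
# Hypothetical sketch of the validation step described above: MeB checks
# that the token is active, carries the required scope, and belongs to one
# of the client ids hardcoded in configuration. All names and values here
# are illustrative assumptions.

ALLOWED_CLIENT_IDS = {"listenbrainz", "bookbrainz", "musicbrainz"}  # assumed config
REQUIRED_SCOPE = "notifications:post"  # assumed scope name

def is_authorized(token_info: dict) -> bool:
    """token_info: result of introspecting the incoming OAuth token."""
    return (
        token_info.get("active", False)
        and REQUIRED_SCOPE in token_info.get("scopes", [])
        and token_info.get("client_id") in ALLOWED_CLIENT_IDS
    )

ok = is_authorized({"active": True, "scopes": ["notifications:post"], "client_id": "listenbrainz"})
rejected = is_authorized({"active": True, "scopes": ["notifications:post"], "client_id": "unknown-app"})
print(ok, rejected)  # True False
```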
      • fettuccinae[m]
        lucifer[m]: i was thinking an admin user could generate an auth token for projects through https://metabrainz.org/profile#, and then this token could be hardcoded in the configuration of both the project and MeB. So when a project sends a request with this token, MeB verifies it against the saved token in config and allows the request
      • lucifer[m]
        @fettuccinae:matrix.org: no we don't want to do that for multiple reasons. it makes token rotation hard and we cannot have expiring tokens this way.
      • fettuccinae[m]
        ohh, but how can the project get tokens in the authorization flow if login is required for /oauth2/authorize?
      • lucifer[m]
        unless there is a strong reason, we should stick to using the oauth way. i am running a bit behind schedule on the client credentials grant, but only the testing is pending; once that is done it should be available for use in your project.
      • lucifer[m]
        with the client credentials grant, you won't need the manual /oauth2/authorization.
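For context, a client credentials grant (RFC 6749 section 4.4) is a direct POST to the token endpoint with no user login involved. A minimal sketch, assuming a hypothetical MeB token endpoint URL and scope name:

```python
# Rough sketch of a client credentials grant (RFC 6749 section 4.4): the
# service exchanges its client id/secret for a token directly, with no
# user login. The token endpoint URL and scope below are assumptions.
from urllib.parse import urlencode

TOKEN_URL = "https://metabrainz.org/oauth2/token"  # assumed endpoint

def build_token_request(client_id: str, client_secret: str) -> tuple[str, str]:
    """Return (url, form-encoded body) for the token request."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "notifications:post",  # hypothetical scope
    })
    return TOKEN_URL, body

url, body = build_token_request("example-client", "example-secret")
print("grant_type=client_credentials" in body)  # True
```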
      • fettuccinae[m]
        Ohh, got it, thanks. I'll add todos for this and work on other things.
      • Kladky has quit
      • Kladky joined the channel
      • Kladky has quit
      • Kladky joined the channel
      • lucifer[m]
        holycow23: i tested the dumps locally and everything seems to work fine, let's try again when you are around.
      • holycow23[m]
        lucifer[m]: I can try right now
      • lucifer[m]
        holycow23: okay try running `./develop.sh spark format` once and share its output.
      • pite joined the channel
      • holycow23[m]
        Do I share the entire log?
      • lucifer[m]
        the last few lines should be enough
      • holycow23[m]
      • I have updated it here
      • lucifer[m]
        looks good.
      • holycow23[m]
        Okay
      • lucifer[m]
        now run `./develop.sh up web -d` and then `./develop.sh spark up -d`
      • holycow23[m] uploaded an image: (64KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/dRohsmAEdwgpTRGQvgvGwfdz/image.png >
      • holycow23[m] uploaded an image: (81KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/PfVlTkHjAfqJjpsMQprdMams/image.png >
      • `./develop.sh manage spark request_import_incremental`
      • `./develop.sh manage spark request_import_sample`
      • holycow23[m] uploaded an image: (56KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/fePYeFuAPcQmXyMssZTTvchx/image.png >
      • monitor the logs for the request consumer container and share them when it's done executing these commands.
      • holycow23[m]
      • Updated here
      • lucifer[m]
        incremental dump imported fine, it's still importing the sample dump.
      • should be done in less than 5 mins.
      • holycow23[m]
        Okay
      • lucifer[m]
        update the logs when you see another "Request done!" message
      • holycow23[m]
        Updated
      • Got a request done!
      • lucifer[m]
        that succeeded as well.
      • okay now run `./develop.sh manage spark request_user_stats --entity artists --range this_week --type entity`
      • holycow23[m]
        Done
      • lucifer[m]
        update the request consumer logs after another "Request done!"
      • holycow23[m]
        Okay
      • Will this take time?
      • lucifer[m]
        should be done by now.
      • update the logs anyway and i'll take a look
      • holycow23[m]
        Updated
      • lucifer[m]
        yeah seems to be still running, let's wait. this is not optimized for running locally.
      • holycow23[m]
        okay
      • lucifer[m]
        it took 16s on my PC but docker-desktop is probably slower.
      • holycow23[m]
        Okay
      • lucifer[m]
        anything new in the logs?
      • holycow23[m]
        Nope
      • It's been at this stage for a while now
      • lucifer[m]
        okay, check spark_reader logs
      • and see if there are any messages for user_entity.
      • holycow23[m] uploaded an image: (53KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/xHVHOWWKtTxmXkJyFKQqfDFu/image.png >
      • check the logs above and below this; there are a lot of debug messages that might drown out the user entity message. you can grep the logs for user_entity to confirm, if possible.
      • holycow23[m]
        it's been like this throughout, except for these two lines
      • `2025-06-03 14:22:34,058 listenbrainz.webserver DEBUG Received a message, adding to internal processing queue...`
      • `2025-06-03 14:22:34,059 listenbrainz.webserver INFO Received message for import_incremental_dump`
      • lucifer[m]
        i see
      • try running `./develop.sh manage spark request_user_stats --entity artists --range this_week --type entity` again i guess
      • and see if there's anything new in request_consumer logs
      • holycow23[m]
        listenbrainzspark hasn't changed after running the command
      • holycow23[m] uploaded an image: (32KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/QvxGdpfqwgXBUfjmeXvjcUeZ/image.png >
      • lucifer[m]
        yeah that is fine, what about the request consumer logs
      • holycow23[m]
        its the same
      • no updates
      • lucifer[m]
        i see, you can stop the containers.
      • `./develop.sh spark down`
      • and then run `./develop.sh spark up` to bring it back up again
      • `./develop.sh manage spark request_user_stats --entity recordings --range this_week --type entity`
      • then run recording stats instead.
      • holycow23[m] uploaded an image: (130KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/pfpKdaqrTcgkGUGRHSZpNxfr/image.png >
      • holycow23[m]
        I ran recordings only and received this
      • lucifer[m]
        `./develop.sh manage spark request_user_stats --entity release_groups --range this_week --type entity`
      • try release groups.
      • holycow23[m]
        Okay
      • I wait now right?
      • lucifer[m]
        yes, what do the logs show
      • holycow23[m]
        request consumer?
      • lucifer[m]
        yes
      • holycow23[m]
      • updated current status
      • lucifer[m]
        i see. okay.
      • `./develop.sh manage spark request_user_stats --type listening_activity --range this_week`