[@holycow23:matrix.org](https://matrix.to/#/@holycow23:matrix.org) what postgres queries and which metadata files?
2025-06-17 16858, 2025
holycow23[m]
lucifer[m]: Let's say I want to write a postgres query to fetch the listens with their year of release so I need to use the metadata from the HDFS query and for listens I am using the function as defined in the gist
2025-06-17 16842, 2025
lucifer[m]
[@holycow23:matrix.org](https://matrix.to/#/@holycow23:matrix.org) you won't need to write a postgres query, it would be a spark sql query. the distinction is also important because for some things the syntax of spark sql is different from postgres. for the hdfs metadata you can read the dataframe as in the example gist using dataframe api or spark sql query.
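(For context, a minimal sketch of what lucifer describes: reading the HDFS metadata into a dataframe and querying it with Spark SQL rather than Postgres. The HDFS path, view name, and column names below are illustrative assumptions, not the actual ListenBrainz schema. The block degrades gracefully when pyspark is not installed so the query text still serves as an example.)

```python
from textwrap import dedent

# Spark SQL query joining listens against the HDFS metadata; table and
# column names are assumptions for illustration only.
RELEASE_YEAR_QUERY = dedent("""
    SELECT l.user_id
         , l.recording_mbid
         , rm.first_release_date_year AS year
      FROM listens l
      JOIN release_metadata_cache rm
        ON l.release_mbid = rm.release_mbid
""").strip()

try:
    from pyspark.sql import SparkSession
except ImportError:
    SparkSession = None  # pyspark unavailable; the query string above still shows the idea

if SparkSession is not None:
    spark = SparkSession.builder.getOrCreate()
    # Read the metadata dump from HDFS (path is a placeholder) and expose
    # it to Spark SQL as a temporary view, as in the example gist.
    metadata_df = spark.read.parquet("hdfs:///path/to/release_metadata_cache")
    metadata_df.createOrReplaceTempView("release_metadata_cache")
    result_df = spark.sql(RELEASE_YEAR_QUERY)
```

The same join could be written with the DataFrame API (`metadata_df.join(...)`); the SQL form matches the syntax note above about Spark SQL differing from Postgres.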
2025-06-17 16801, 2025
lucifer[m]
You shouldn't need to check the MB dumps in any case. Any metadata that you need should come from MB db. The format of MB dumps is different from MB DB so if you use the first it would create issues.
2025-06-17 16849, 2025
lucifer[m]
I'll check the release year YIM queries in a while and confirm if all the data you need for that one is available or not.
2025-06-17 16825, 2025
lucifer[m]
I have added release_metadata_cache to the sample dumps already. (Might need to update the codebase/container to import it successfully though).
2025-06-17 16844, 2025
holycow23[m]
<lucifer[m]> "You shouldn't need to check..." <- Where can I see the format of the MB db?
Alternatively you can connect to the database on wolf using ssh port forwarding and browse the database with your choice of tool.
2025-06-17 16811, 2025
holycow23[m]
But are all the tables present cause I could locate... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/ORYoUUzjdIPkVCSGOpwbkhcv>)
2025-06-17 16827, 2025
holycow23[m]
* But are all the tables present cause I could locate... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/eAKlqXmzsINXGuffdugSZoDe>)
2025-06-17 16828, 2025
lucifer[m]
This is not MB db.
2025-06-17 16841, 2025
lucifer[m]
This is the list of metadata files imported into spark
2025-06-17 16842, 2025
holycow23[m]
sorry this is the metadata part right?
2025-06-17 16858, 2025
holycow23[m]
Yeah my bad
2025-06-17 16804, 2025
holycow23[m]
So, I gotta use this right?
2025-06-17 16816, 2025
holycow23[m]
No need for MB db?
2025-06-17 16818, 2025
lucifer[m]
Yes this data already exists and you can use it as needed
2025-06-17 16800, 2025
lucifer[m]
If there is some metadata that doesn't exist in these files, then you would need to write queries to create these files by reading data from the MB db
2025-06-17 16831, 2025
lucifer[m]
I'll update the setup on wolf later today to add release_metadata_cache to the table.
2025-06-17 16848, 2025
lucifer[m]
*to spark.
2025-06-17 16816, 2025
holycow23[m]
lucifer[m]: I didn't get this?
2025-06-17 16805, 2025
lucifer[m]
There is one more metadata file available in production that is missing from your local setup because I only added it to sample dumps last week.
2025-06-17 16812, 2025
saumon has quit
2025-06-17 16828, 2025
lucifer[m]
I'll update your spark setup to add that file.
2025-06-17 16806, 2025
julian45[m]
reosarevok: a while back we talked about the continued need to mass mail auto-editors for election notifications, even in a post SSO implementation future...... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/crTMlpWdnuPshxXUmmatPiCE>)
2025-06-17 16814, 2025
julian45[m]
not urgent, just a few thoughts i had while heading towards bed :)
2025-06-17 16827, 2025
holycow23[m]
lucifer: I wrote this small script... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/XvqjhvqjhQXdGCjwYpPVXWfR>)
2025-06-17 16843, 2025
holycow23[m]
* lucifer: I wrote this small script... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/XTrbfztUyjqtvQoWFQgEQnvO>)
2025-06-17 16843, 2025
dabeglavins60721 joined the channel
2025-06-17 16821, 2025
dabeglavins6072 has quit
2025-06-17 16848, 2025
Kladky joined the channel
2025-06-17 16859, 2025
reosarevok[m]
<julian45[m]> "reosarevok: a while back we..." <- > <@julian45:julian45.net> reosarevok: a while back we talked about the continued need to mass mail auto-editors for election notifications, even in a post SSO implementation future...... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/HaWhhDRVhHGCzzHKhFFTcWCT>)
2025-06-17 16857, 2025
saumon joined the channel
2025-06-17 16849, 2025
Maxr1998 joined the channel
2025-06-17 16843, 2025
Maxr1998_ has quit
2025-06-17 16843, 2025
mayhem[m]
<mayhem[m]> "lucifer: labs.api is running..." <- Did you take a look to see if anything was amiss with the data?
2025-06-17 16837, 2025
dabeglavins60721 has quit
2025-06-17 16800, 2025
lucifer[m]
mayhem: missed that message yesterday, will take a look now.
2025-06-17 16807, 2025
lucifer[m]
<holycow23[m]> "lucifer: I wrote this small..." <- you can assume it will work fine in production without limiting, we have bigger queries that work fine there. do you still run out of memory with --driver-memory 8g?
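(A sketch of how the `--driver-memory 8g` flag and the other Spark memory knobs lucifer mentions map to configuration keys. The values are examples, not recommendations; note that `spark.driver.memory` must be set before the driver JVM starts, so setting it from the builder only works when the session is launched from plain Python, not under an already-running spark-submit driver.)

```python
# Memory-related Spark settings; values here are illustrative examples.
MEMORY_CONF = {
    "spark.driver.memory": "8g",         # equivalent of spark-submit --driver-memory 8g
    "spark.executor.memory": "4g",       # per-executor heap
    "spark.driver.maxResultSize": "2g",  # a separate limit that can also cause OOM-style failures
}

try:
    from pyspark.sql import SparkSession
except ImportError:
    SparkSession = None  # pyspark unavailable; the config dict still documents the knobs

if SparkSession is not None:
    builder = SparkSession.builder.appName("memory-example")
    for key, value in MEMORY_CONF.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
```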
2025-06-17 16859, 2025
pite_ has quit
2025-06-17 16812, 2025
pite joined the channel
2025-06-17 16805, 2025
holycow23[m]
Yes I did run out of memory
2025-06-17 16845, 2025
mayhem[m] uploaded an image: (23KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/QxFQIEVftryIukGiTwUJvmZA/image.png >
2025-06-17 16850, 2025
mayhem[m]
lucifer: my LB instance is throwing this error on login
2025-06-17 16812, 2025
mayhem[m]
keys verified, so without an error message, I am unsure how to proceed.
<lucifer[m]> "you can assume it will work fine..." <- I did run out of memory. Also, does this type of querying work or do I need SQL queries? That's what I wrote in the mock queries in the proposal, so either I will have to use that or just pandas filtering
2025-06-17 16814, 2025
lucifer[m]
holycow23: that type of querying works but i think for consistency's sake it's best to use SQL queries only.
2025-06-17 16840, 2025
holycow23[m]
lucifer[m]: Okay that's what I thought of too but how do I test those?
2025-06-17 16804, 2025
lucifer[m]
you can test those by passing the query to spark.sql(query)
for running out of memory, i'll take a look at it, there are different kinds of memory configurations in spark and it's possible another one needs to be increased to avoid the issue.
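(A minimal sketch of the testing loop described above: pass the SQL text to `spark.sql(query)` and force execution with an action. The query below is a placeholder stat, not an actual ListenBrainz query; the block runs as-is only where pyspark is installed.)

```python
# Placeholder stat query for testing via spark.sql(); substitute your own.
TEST_QUERY = """
    SELECT user_id
         , COUNT(*) AS listen_count
      FROM listens
     GROUP BY user_id
     ORDER BY listen_count DESC
     LIMIT 10
"""

try:
    from pyspark.sql import SparkSession
except ImportError:
    SparkSession = None  # pyspark unavailable; nothing to execute

if SparkSession is not None:
    spark = SparkSession.builder.getOrCreate()
    df = spark.sql(TEST_QUERY)  # lazy: builds the plan, runs no job yet
    df.show(truncate=False)     # action: actually executes the query
```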
2025-06-17 16847, 2025
mayhem[m]
<lucifer[m]> "to confirm the OAUTH_CLIENT_ID..." <- yes, both match
2025-06-17 16839, 2025
lucifer[m]
i'll try to reproduce the issue and fix it
2025-06-17 16813, 2025
mayhem[m]
let me know if you need help.
2025-06-17 16824, 2025
holycow23[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/NwWeXkNYsOPcWVKEASzafweS
2025-06-17 16845, 2025
holycow23[m]
This script ran quite well to map the songs with the genre
2025-06-17 16800, 2025
outsidecontext[m
reosarevok: is the tagger link fix for taglookup supposed to be deployed on beta? Because I still get the issue there
It no longer opens a localhost tab for me at least...
2025-06-17 16852, 2025
reosarevok[m]
(and I get the same error on console as on search)
2025-06-17 16856, 2025
reosarevok[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/UWGxJbpzaowymWnFeQTNlXwt
2025-06-17 16849, 2025
outsidecontext[m
it does for me (well, it is the same tab for me, but it navigates to http://127.0.0.1:8001)
2025-06-17 16824, 2025
outsidecontext[m
ok, sorry. was a cache issue. cleared the cache and now it works
2025-06-17 16828, 2025
holycow23[m]
<lucifer[m]> "for running out of memory, i'..." <- I just wrote a query for count of listens per genre grouped by user, that worked well without any limits
<lucifer[m]> "rayyan_seliya123, suvid, m...." <- hey lucifer, gentle reminder: can you please review this commit https://github.com/metabrainz/listenbrainz-server… as we discussed, so we can move ahead on whatever is pending!
this will be used to execute your query and generate the results.
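(A sketch of roughly what the "count of listens per genre grouped by user" query mentioned above might look like in Spark SQL. The `recording_genre` join table and its columns are hypothetical placeholders; the actual schema in spark may differ.)

```python
# Hypothetical Spark SQL for listen counts per genre per user; the
# recording_genre table name and columns are assumptions for illustration.
GENRE_LISTEN_COUNT_QUERY = """
    SELECT l.user_id
         , g.genre
         , COUNT(*) AS listen_count
      FROM listens l
      JOIN recording_genre g
        ON l.recording_mbid = g.recording_mbid
     GROUP BY l.user_id
         , g.genre
"""

try:
    from pyspark.sql import SparkSession
except ImportError:
    SparkSession = None  # pyspark unavailable; the query string stands alone

if SparkSession is not None:
    spark = SparkSession.builder.getOrCreate()
    genre_counts = spark.sql(GENRE_LISTEN_COUNT_QUERY)
```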
2025-06-17 16819, 2025
holycow23[m]
Okay will look into it
2025-06-17 16837, 2025
mayhem[m]
lucifer: was this dataset processed with the Beatles fix in place?
2025-06-17 16845, 2025
lucifer[m]
mayhem: nope.
2025-06-17 16800, 2025
holycow23[m]
lucifer[m]: Actually I did go through this in the early days but how do I test this?
2025-06-17 16836, 2025
lucifer[m]
mayhem: i don't have the link to those video recordings with lfm guys. can you share it again?
2025-06-17 16856, 2025
mayhem[m]
lucifer[m]: the data looks really nice, from the spot checks I've made. but artists like the beatles are featuring quite prominently in some results. so I would very much love to see this fixed for all of our similar data sets.
2025-06-17 16818, 2025
lucifer[m]
fwiw, it might not be easily applicable here as to my best recollection their suggestion was to scale items in the collaborative filtering model.
2025-06-17 16843, 2025
mayhem[m]
ah, yes. ok, in that case, I think this is workable for the start. I can't see any problems from my spot checking, but eventually others might. so, let's keep our ability to regenerate this data alive for the time being.
holycow23: above are the changes needed to add a new stat to spark.
2025-06-17 16838, 2025
holycow23[m]
Okay will go through them thanks
2025-06-17 16846, 2025
lucifer[m]
once all of this is in place, you will be able to run the command created in step 5 to send a request to the spark cluster (like we do for requesting existing stats or creating a new dump)
2025-06-17 16825, 2025
lucifer[m]
for testing purposes, when your class in step 1 is ready, you can import it in pyspark and run it directly.
2025-06-17 16839, 2025
lucifer[m]
the code will be similar to the function linked in step 2.
2025-06-17 16817, 2025
lucifer[m]
when you have written step 2, you can just import that function and call it with the desired arguments and test 1 and 2 together so on.
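(A sketch of the incremental testing workflow described above: import the step 1/step 2 code into a pyspark session and call it directly with the desired arguments. The module path, function name, and arguments below are hypothetical placeholders, not the actual ListenBrainz code layout.)

```python
# Hypothetical arguments for the stat function under test.
STAT_ARGS = {"from_date": "2024-01-01", "to_date": "2024-12-31"}

try:
    from pyspark.sql import SparkSession
except ImportError:
    SparkSession = None  # pyspark unavailable; skip the interactive part

if SparkSession is not None:
    spark = SparkSession.builder.getOrCreate()
    # Import the new stat's entry point (hypothetical module/function names)
    # and run steps 1 and 2 together without going through the request flow.
    from listenbrainz_spark.stats.user.genre import get_genre_counts  # hypothetical
    for message in get_genre_counts(**STAT_ARGS):
        print(message)
```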
2025-06-17 16813, 2025
holycow23[m]
Okay, will go through them and if anything comes up will get back to you
2025-06-17 16840, 2025
lucifer[m] posted a file: Debugging.ipynb (3703KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/ulXxuIqCvyTUGaSAkIwhAMel >
2025-06-17 16819, 2025
lucifer[m]
this is a notebook that i use for similar testing and debugging of the spark cluster. i'll try to clean it up later and share it with you. but for now you can look at the raw version and see if it helps.
rayyan_seliya123: you can combine the changes from both PRs into a single one and close the other one.
2025-06-17 16802, 2025
rayyan_seliya123
lucifer[m]: The one in which I have added sql files or tables? I should close this and merge it into the one in which I have the seeder file and indexer script?
2025-06-17 16814, 2025
lucifer[m]
sure sounds good to me
2025-06-17 16810, 2025
rayyan_seliya123
lucifer[m]: Ok, will do it; do let me know after that review so I can move further
2025-06-17 16807, 2025
lucifer[m]
suvid: i took a look at your PR, there are some changes needed but i don't see any blockers so you can start working on implementing the processing of zip for imports.
2025-06-17 16837, 2025
lucifer[m]
also, tests will be needed for every view including file uploads.
2025-06-17 16854, 2025
lusciouslover has quit
2025-06-17 16833, 2025
lusciouslover joined the channel
2025-06-17 16819, 2025
BobSwift[m] joined the channel
2025-06-17 16820, 2025
BobSwift[m]
<reosarevok[m]> "> <@julian45:julian45.net..." <- And the mailing tool I put together was a simple hack to help support that stopgap measure. It was never intended to be a full-on production type mass mailer.
2025-06-17 16832, 2025
reosarevok[m]
Tons of thanks for that, by the way! :)
2025-06-17 16822, 2025
mayhem[m]
lucifer: we had a user discover that playlists that belong to a deleted LB user gave a 500 error when trying to load those pages. I've made a deleted_lb_user that all deleted playlists are ascribed to. like this: