[@holycow23:matrix.org](https://matrix.to/#/@holycow23:matrix.org) what postgres queries and which metadata files?
2025-06-17 16858, 2025
holycow23[m]
lucifer[m]: Let's say I want to write a postgres query to fetch the listens with their year of release so I need to use the metadata from the HDFS query and for listens I am using the function as defined in the gist
2025-06-17 16842, 2025
lucifer[m]
[@holycow23:matrix.org](https://matrix.to/#/@holycow23:matrix.org) you won't need to write a postgres query, it would be a spark sql query. the distinction is also important because for some things the syntax of spark sql is different from postgres. for the hdfs metadata you can read the dataframe as in the example gist using dataframe api or spark sql query.
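(For context, a minimal sketch of what lucifer describes: reading the HDFS metadata into a dataframe and querying it with Spark SQL rather than Postgres. The HDFS path, view name, and column names below are illustrative assumptions, not the actual ListenBrainz schema. The block degrades gracefully when pyspark is not installed so the query text still serves as an example.)

```python
from textwrap import dedent

# Spark SQL query joining listens against the HDFS metadata; table and
# column names are assumptions for illustration only.
RELEASE_YEAR_QUERY = dedent("""
    SELECT l.user_id
         , l.recording_mbid
         , rm.first_release_date_year AS year
      FROM listens l
      JOIN release_metadata_cache rm
        ON l.release_mbid = rm.release_mbid
""").strip()

try:
    from pyspark.sql import SparkSession
except ImportError:
    SparkSession = None  # pyspark unavailable; the query string above still shows the idea

if SparkSession is not None:
    spark = SparkSession.builder.getOrCreate()
    # Read the metadata dump from HDFS (path is a placeholder) and expose
    # it to Spark SQL as a temporary view, as in the example gist.
    metadata_df = spark.read.parquet("hdfs:///path/to/release_metadata_cache")
    metadata_df.createOrReplaceTempView("release_metadata_cache")
    result_df = spark.sql(RELEASE_YEAR_QUERY)
```

The same join could be written with the DataFrame API (`metadata_df.join(...)`); the SQL form matches the syntax note above about Spark SQL differing from Postgres.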
2025-06-17 16801, 2025
lucifer[m]
You shouldn't need to check the MB dumps in any case. Any metadata that you need should come from MB db. The format of MB dumps is different from MB DB so if you use the first it would create issues.
2025-06-17 16849, 2025
lucifer[m]
I'll check the release year YIM queries in a while and confirm if all the data you need for that one is available or not.
2025-06-17 16825, 2025
lucifer[m]
I have added release_metadata_cache to the sample dumps already. (Might need to update the codebase/container to import it successfully though).
2025-06-17 16844, 2025
holycow23[m]
<lucifer[m]> "You shouldn't need to check..." <- Where can I see the format of the MB db?
Alternatively you can connect to the database on wolf using ssh port forwarding and browse the database with your choice of tool.
2025-06-17 16811, 2025
holycow23[m]
But are all the tables present cause I could locate... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/ORYoUUzjdIPkVCSGOpwbkhcv>)
2025-06-17 16827, 2025
holycow23[m]
* But are all the tables present cause I could locate... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/eAKlqXmzsINXGuffdugSZoDe>)
2025-06-17 16828, 2025
lucifer[m]
This is not MB db.
2025-06-17 16841, 2025
lucifer[m]
This is the list of metadata files imported into spark
2025-06-17 16842, 2025
holycow23[m]
sorry this is the metadata part right?
2025-06-17 16858, 2025
holycow23[m]
Yeah my bad
2025-06-17 16804, 2025
holycow23[m]
So, I gotta use this right?
2025-06-17 16816, 2025
holycow23[m]
No need for MB db?
2025-06-17 16818, 2025
lucifer[m]
Yes this data already exists and you can use it as needed
2025-06-17 16800, 2025
lucifer[m]
If there is some metadata that doesn't exist in these files, then you would need to write queries to create these files by reading data from the MB db
2025-06-17 16831, 2025
lucifer[m]
I'll update the setup on wolf later today to add release_metadata_cache to the table.
2025-06-17 16848, 2025
lucifer[m]
*to spark.
2025-06-17 16816, 2025
holycow23[m]
lucifer[m]: I didn't get this?
2025-06-17 16805, 2025
lucifer[m]
There is one more metadata file available in production that is missing from your local setup because I only added it to sample dumps last week.
2025-06-17 16812, 2025
saumon has quit
2025-06-17 16828, 2025
lucifer[m]
I'll update your spark setup to add that file.
2025-06-17 16806, 2025
julian45[m]
reosarevok: a while back we talked about the continued need to mass mail auto-editors for election notifications, even in a post SSO implementation future...... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/crTMlpWdnuPshxXUmmatPiCE>)
2025-06-17 16814, 2025
julian45[m]
not urgent, just a few thoughts i had while heading towards bed :)
2025-06-17 16827, 2025
holycow23[m]
lucifer: I wrote this small script... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/XvqjhvqjhQXdGCjwYpPVXWfR>)
2025-06-17 16843, 2025
holycow23[m]
* lucifer: I wrote this small script... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/XTrbfztUyjqtvQoWFQgEQnvO>)
2025-06-17 16843, 2025
dabeglavins60721 joined the channel
2025-06-17 16821, 2025
dabeglavins6072 has quit
2025-06-17 16848, 2025
Kladky joined the channel
2025-06-17 16859, 2025
reosarevok[m]
<julian45[m]> "reosarevok: a while back we..." <- > <@julian45:julian45.net> reosarevok: a while back we talked about the continued need to mass mail auto-editors for election notifications, even in a post SSO implementation future...... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/HaWhhDRVhHGCzzHKhFFTcWCT>)
2025-06-17 16857, 2025
saumon joined the channel
2025-06-17 16849, 2025
Maxr1998 joined the channel
2025-06-17 16843, 2025
Maxr1998_ has quit
2025-06-17 16843, 2025
mayhem[m]
<mayhem[m]> "lucifer: labs.api is running..." <- Did you take a look to see if anything was amiss with the data?
2025-06-17 16837, 2025
dabeglavins60721 has quit
2025-06-17 16800, 2025
lucifer[m]
mayhem: missed that message yesterday, will take a look now.
2025-06-17 16807, 2025
lucifer[m]
<holycow23[m]> "lucifer: I wrote this small..." <- you can assume it will work fine in production without limiting, we have bigger queries that work fine there. do you still run out of memory with --driver-memory 8g?
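(A sketch of how the `--driver-memory 8g` flag and the other Spark memory knobs lucifer mentions map to configuration keys. The values are examples, not recommendations; note that `spark.driver.memory` must be set before the driver JVM starts, so setting it from the builder only works when the session is launched from plain Python, not under an already-running spark-submit driver.)

```python
# Memory-related Spark settings; values here are illustrative examples.
MEMORY_CONF = {
    "spark.driver.memory": "8g",         # equivalent of spark-submit --driver-memory 8g
    "spark.executor.memory": "4g",       # per-executor heap
    "spark.driver.maxResultSize": "2g",  # a separate limit that can also cause OOM-style failures
}

try:
    from pyspark.sql import SparkSession
except ImportError:
    SparkSession = None  # pyspark unavailable; the config dict still documents the knobs

if SparkSession is not None:
    builder = SparkSession.builder.appName("memory-example")
    for key, value in MEMORY_CONF.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
```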
2025-06-17 16859, 2025
pite_ has quit
2025-06-17 16812, 2025
pite joined the channel
2025-06-17 16805, 2025
holycow23[m]
Yes I did run out of memory
2025-06-17 16845, 2025
mayhem[m] uploaded an image: (23KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/QxFQIEVftryIukGiTwUJvmZA/image.png >
2025-06-17 16850, 2025
mayhem[m]
lucifer: my LB instance is throwing this error on login
2025-06-17 16812, 2025
mayhem[m]
keys verified, so without an error message, I am unsure how to proceed.
<lucifer[m]> "you can assume it will work fine..." <- I did run out of memory. Also, does this type of querying work or do I need SQL queries? That's what I wrote in the mock queries in the proposal, so either I will have to use that or just pandas filtering
2025-06-17 16814, 2025
lucifer[m]
holycow23: that type of querying works but i think for consistency's sake it's best to use SQL queries only.
2025-06-17 16840, 2025
holycow23[m]
lucifer[m]: Okay that's what I thought of too but how do I test those?
2025-06-17 16804, 2025
lucifer[m]
you can test those by passing the query to spark.sql(query)
for running out of memory, i'll take a look at it, there are different kinds of memory configurations in spark and it's possible another one needs to be increased to avoid the issue.
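(A minimal sketch of the testing loop described above: pass the SQL text to `spark.sql(query)` and force execution with an action. The query below is a placeholder stat, not an actual ListenBrainz query; the block runs as-is only where pyspark is installed.)

```python
# Placeholder stat query for testing via spark.sql(); substitute your own.
TEST_QUERY = """
    SELECT user_id
         , COUNT(*) AS listen_count
      FROM listens
     GROUP BY user_id
     ORDER BY listen_count DESC
     LIMIT 10
"""

try:
    from pyspark.sql import SparkSession
except ImportError:
    SparkSession = None  # pyspark unavailable; nothing to execute

if SparkSession is not None:
    spark = SparkSession.builder.getOrCreate()
    df = spark.sql(TEST_QUERY)  # lazy: builds the plan, runs no job yet
    df.show(truncate=False)     # action: actually executes the query
```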
2025-06-17 16847, 2025
mayhem[m]
<lucifer[m]> "to confirm the OAUTH_CLIENT_ID..." <- yes, both match
2025-06-17 16839, 2025
lucifer[m]
i'll try to reproduce the issue and fix it
2025-06-17 16813, 2025
mayhem[m]
let me know if you need help.
2025-06-17 16824, 2025
holycow23[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/NwWeXkNYsOPcWVKEASzafweS
2025-06-17 16845, 2025
holycow23[m]
This script ran quite well to map the songs with the genre
2025-06-17 16800, 2025
outsidecontext[m
reosarevok: is the tagger link fix for taglookup supposed to be deployed on beta? Because I still get the issue there
It no longer opens a localhost tab for me at least...
2025-06-17 16852, 2025
reosarevok[m]
(and I get the same error on console as on search)
2025-06-17 16856, 2025
reosarevok[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/UWGxJbpzaowymWnFeQTNlXwt
2025-06-17 16849, 2025
outsidecontext[m
it does for me (well, it is the same tab for me, but it navigates to http://127.0.0.1:8001)
2025-06-17 16824, 2025
outsidecontext[m
ok, sorry. was a cache issue. cleared the cache and now it works
2025-06-17 16828, 2025
holycow23[m]
<lucifer[m]> "for running out of memory, i'..." <- I just wrote a query for count of listens per genre grouped by user, that worked well without any limits
<lucifer[m]> "rayyan_seliya123, suvid, m...." <- hey lucifer, gentle reminder: can you please review this commit https://github.com/metabrainz/listenbrainz-server… as we discussed, so we can move ahead on whatever is pending!
this will be used to execute your query and generate the results.
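(A sketch of roughly what the "count of listens per genre grouped by user" query mentioned above might look like in Spark SQL. The `recording_genre` join table and its columns are hypothetical placeholders; the actual schema in spark may differ.)

```python
# Hypothetical Spark SQL for listen counts per genre per user; the
# recording_genre table name and columns are assumptions for illustration.
GENRE_LISTEN_COUNT_QUERY = """
    SELECT l.user_id
         , g.genre
         , COUNT(*) AS listen_count
      FROM listens l
      JOIN recording_genre g
        ON l.recording_mbid = g.recording_mbid
     GROUP BY l.user_id
         , g.genre
"""

try:
    from pyspark.sql import SparkSession
except ImportError:
    SparkSession = None  # pyspark unavailable; the query string stands alone

if SparkSession is not None:
    spark = SparkSession.builder.getOrCreate()
    genre_counts = spark.sql(GENRE_LISTEN_COUNT_QUERY)
```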
2025-06-17 16819, 2025
holycow23[m]
Okay will look into it
2025-06-17 16837, 2025
mayhem[m]
lucifer: was this dataset processed with the Beatles fix in place?
2025-06-17 16845, 2025
lucifer[m]
mayhem: nope.
2025-06-17 16800, 2025
holycow23[m]
lucifer[m]: Actually I did go through this in the early days but how do I test this?
2025-06-17 16836, 2025
lucifer[m]
mayhem: i don't have the link to those video recordings with lfm guys. can you share it again?
2025-06-17 16856, 2025
mayhem[m]
lucifer[m]: the data looks really nice, from the spot checks I've made. but artists like the beatles are featuring quite prominently in some results. so I would very much love to see this fixed for all of our similar data sets.
2025-06-17 16818, 2025
lucifer[m]
fwiw, it might not be easily applicable here as to my best recollection their suggestion was to scale items in the collaborative filtering model.
2025-06-17 16843, 2025
mayhem[m]
ah, yes. ok, in that case, I think this is workable for the start. I can't see any problems from my spot checking, but eventually others might. so, let's keep our ability to regenerate this data alive for the time being.
holycow23: above are the changes needed to add a new stat to spark.
2025-06-17 16838, 2025
holycow23[m]
Okay will go through them thanks
2025-06-17 16846, 2025
lucifer[m]
once all of this is in place, you will be able to run the command created in step 5 to send a request to the spark cluster (like we do for requesting existing stats or creating a new dump)
2025-06-17 16825, 2025
lucifer[m]
for testing purposes, when your class in step 1 is ready, you can import it in pyspark and run it directly.
2025-06-17 16839, 2025
lucifer[m]
the code will be similar to the function linked in step 2.
2025-06-17 16817, 2025
lucifer[m]
when you have written step 2, you can just import that function and call it with the desired arguments and test 1 and 2 together so on.
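(A sketch of the incremental testing workflow described above: import the step 1/step 2 code into a pyspark session and call it directly with the desired arguments. The module path, function name, and arguments below are hypothetical placeholders, not the actual ListenBrainz code layout.)

```python
# Hypothetical arguments for the stat function under test.
STAT_ARGS = {"from_date": "2024-01-01", "to_date": "2024-12-31"}

try:
    from pyspark.sql import SparkSession
except ImportError:
    SparkSession = None  # pyspark unavailable; skip the interactive part

if SparkSession is not None:
    spark = SparkSession.builder.getOrCreate()
    # Import the new stat's entry point (hypothetical module/function names)
    # and run steps 1 and 2 together without going through the request flow.
    from listenbrainz_spark.stats.user.genre import get_genre_counts  # hypothetical
    for message in get_genre_counts(**STAT_ARGS):
        print(message)
```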
2025-06-17 16813, 2025
holycow23[m]
Okay, will go through them and if anything comes up will get back to you
2025-06-17 16840, 2025
lucifer[m] posted a file: Debugging.ipynb (3703KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/ulXxuIqCvyTUGaSAkIwhAMel >
2025-06-17 16819, 2025
lucifer[m]
this is a notebook that i use for similar testing and debugging of the spark cluster. i'll try to clean it up later and share it with you. but for now you can look at the raw version and see if it helps.
rayyan_seliya123: you can combine the changes from both PRs into a single one and close the other one.
2025-06-17 16802, 2025
rayyan_seliya123
lucifer[m]: The one in which I have added sql files or tables? I should close this and merge it into the one in which I have the seeder file and indexer script?
2025-06-17 16814, 2025
lucifer[m]
sure sounds good to me
2025-06-17 16810, 2025
rayyan_seliya123
lucifer[m]: Ok, will do it; do let me know after that review so I can move further
2025-06-17 16807, 2025
lucifer[m]
suvid: i took a look at your PR, there are some changes needed but i don't see any blockers so you can start working on implementing the processing of zip for imports.
2025-06-17 16837, 2025
lucifer[m]
also, tests will be needed for every view including file uploads.
2025-06-17 16854, 2025
lusciouslover has quit
2025-06-17 16833, 2025
lusciouslover joined the channel
2025-06-17 16819, 2025
BobSwift[m] joined the channel
2025-06-17 16820, 2025
BobSwift[m]
<reosarevok[m]> "> <@julian45:julian45.net..." <- And the mailing tool I put together was a simple hack to help support that stopgap measure. It was never intended to be a full-on production type mass mailer.
2025-06-17 16832, 2025
reosarevok[m]
Tons of thanks for that, by the way! :)
2025-06-17 16822, 2025
mayhem[m]
lucifer: we had a user discover that playlists that belong to a deleted LB user gave a 500 error when trying to load those pages. I've made a deleted_lb_user that all deleted playlists are ascribed to. like this: