ok i switched to token-based auth like in the `submit-listens` endpoint as of now for testing
2025-06-23 17421, 2025
suvid[m]
thanks @fettuccinae:matrix.org for helping me out!
2025-06-23 17451, 2025
suvid[m]
Now the auth works and I am able to test the endpoint, but when I use `current_user` it doesn't work: it is treated as an anonymous user and has no `id` attribute
2025-06-23 17451, 2025
suvid[m]
It considers that I'm not logged in even if I authenticate with token
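(A minimal sketch of the situation described above, assuming `current_user` is only populated by a browser session login, so a token sent in the `Authorization` header never reaches it and the endpoint has to resolve the user from the token itself. `USERS_BY_TOKEN` and `user_from_auth_header` are hypothetical names for illustration, not ListenBrainz's actual helpers.)

```python
from typing import Optional

# Hypothetical token store; a real app would query its user table instead.
USERS_BY_TOKEN = {"secret-token-123": {"id": 1, "musicbrainz_id": "example_user"}}


def user_from_auth_header(headers: dict) -> Optional[dict]:
    """Resolve a user from an 'Authorization: Token <token>' header.

    Returns None when the header is missing, malformed, or the token is unknown.
    """
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "token" or not token.strip():
        return None
    return USERS_BY_TOKEN.get(token.strip())


# The endpoint would use this dict instead of a session-based current_user.
user = user_from_auth_header({"Authorization": "Token secret-token-123"})
```

With this shape the view reads `user["id"]` from the token lookup, so it works from Postman without any login session.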
2025-06-23 17451, 2025
suvid[m]
Looks like I'll have to find an easy way to log in and then test in Postman
2025-06-23 17457, 2025
suvid[m]
Can someone pls help me out with it?
2025-06-23 17402, 2025
suvid[m]
lucifer
2025-06-23 17425, 2025
pite has quit
2025-06-23 17412, 2025
Kladky joined the channel
2025-06-23 17457, 2025
lucifer[m]
suvid: add a simple html button and dropdown on any page in your local lb frontend code and use it to test your endpoints.
holycow23: remind me, did you want me to help resolve a particular issue on the genre activity PR or just generally review?
2025-06-23 17449, 2025
holycow23[m] joined the channel
2025-06-23 17449, 2025
holycow23[m]
lucifer: I have made the PR but I'm not sure whether the entire stats flow works, I just followed all the steps you had asked
2025-06-23 17417, 2025
lucifer[m]
i see, did you try testing it on your wolf setup?
2025-06-23 17427, 2025
holycow23[m]
If you could help me with the steps to test the entire thing on wolf
2025-06-23 17444, 2025
holycow23[m]
No, I haven't tested it right now, could you help me with the steps to start the test
2025-06-23 17410, 2025
lucifer[m]
okay, i'll take a look.
2025-06-23 17438, 2025
lucifer[m]
monkey: hi! do you think there is value in showing the users the original name of the file users upload for a listens import?
2025-06-23 17403, 2025
lucifer[m]
sure.
2025-06-23 17409, 2025
monkey[m] joined the channel
2025-06-23 17409, 2025
monkey[m]
Hi! Yes I think that would potentially be very useful.
2025-06-23 17422, 2025
monkey[m]
Looking at my spotify exports for example there are many files I would need to upload, this would help keep track
2025-06-23 17406, 2025
monkey[m]
Probably also useful to have the timestamps for the first and last listen for that file
2025-06-23 17422, 2025
suvid[m]
lucifer: could you also clarify the schema part for the new table?
2025-06-23 17424, 2025
lucifer[m]
I was thinking we would store only the name of the zip file uploaded.
2025-06-23 17429, 2025
suvid[m]
Lemme find the msg I sent above
2025-06-23 17442, 2025
lucifer[m]
share your current schema
2025-06-23 17449, 2025
suvid[m]
<suvid[m]> "Also, lucifer what should be the..." <- > <@suvid:matrix.org> Also, lucifer what should be the schema for the user_data_import table?... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/FPjlQLOYHoDPzgJLvTKTBfjY>)
2025-06-23 17450, 2025
suvid[m]
lucifer: I am planning to store file path in the schema, so do I also need to store the file name as well explicitly?
So my query has `LEFT JOIN genres g ON l.recording_mbid = g.recording_mbid` but `genres` isn't registered. Originally I used to run `genre_df.createOrReplaceTempView("genres")`, but that isn't done in the main stats, so how do I resolve it?
2025-06-23 17420, 2025
holycow23[m]
holycow23[m]: Where `genre_df` is `genre_df = spark.read.parquet(f"{config.HDFS_CLUSTER_URI}/recording_genre")`
2025-06-23 17400, 2025
bitmap[m] joined the channel
2025-06-23 17400, 2025
bitmap[m]
<reosarevok[m]> "bitmap: any idea what could..." <- I tried entering a similar edit on my dev server and noticed a couple things: (1) `edits_pending` is never incremented on the new url and (2) the edit is never associated to the new url in the `edit_url` table. since those are the criteria the RemoveEmpty script cares about, I'm guessing it was removed by that script
2025-06-23 17408, 2025
reosarevok[m]
Oh no.
2025-06-23 17433, 2025
reosarevok[m]
So basically any such edit which is not an autoedit will fail like this?
2025-06-23 17411, 2025
bitmap[m]
yeah, after two days
2025-06-23 17456, 2025
lucifer[m]
holycow23: you can look at how we read other metadata caches in stats and do something similar. alternatively, i can push a commit to your branch implementing that.
2025-06-23 17439, 2025
holycow23[m]
I would want to look into it on my own, if no success then will ask you
2025-06-23 17419, 2025
lucifer[m]
sure take a look at how release_metadata_cache is used in release group stats for one example.
2025-06-23 17421, 2025
holycow23[m]
this get_release_metadata_cache is the main function I am assuming?
2025-06-23 17438, 2025
reosarevok[m]
<bitmap[m]> "yeah, after two days" <- Are you submitting a fix? I assume the fix is to do both the things you said :)
2025-06-23 17401, 2025
bitmap[m]
nope i'm writing the script for MBS-14049 rn, if you want I can take a look at it after
Yeah I was going through this only, but couldn't understand the query that well
2025-06-23 17435, 2025
lucifer[m]
it reads the data from HDFS and caches it in spark. ideally you would need to copy paste the code and just change the dataframe path.
2025-06-23 17429, 2025
lucifer[m]
or just for now, you can do `genres_df = read_files_from_HDFS(RECORDING_RECORDING_GENRE_DATAFRAME)` then `genres_df.createOrReplaceTempView("genres")` in your stats code.
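(A rough sketch of the same idea as a reusable helper: read a parquet dataset and register it as a temp view so a later SQL `JOIN` can refer to it by name. `register_temp_view` is a hypothetical name; only `spark.read.parquet` and `createOrReplaceTempView` are real PySpark calls. The session is typed as `Any` so the sketch does not import pyspark itself.)

```python
from typing import Any


def register_temp_view(spark: Any, path: str, view_name: str) -> Any:
    """Read a parquet dataset and expose it to Spark SQL under view_name.

    `spark` is expected to be a pyspark.sql.SparkSession.
    Returns the DataFrame that was registered.
    """
    df = spark.read.parquet(path)
    # After this call, SQL run on the same session can reference view_name.
    df.createOrReplaceTempView(view_name)
    return df
```

Called as `register_temp_view(spark, f"{config.HDFS_CLUSTER_URI}/recording_genre", "genres")` before running the stats query, this would let `LEFT JOIN genres g ...` resolve, provided the data actually exists at that path.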
2025-06-23 17432, 2025
holycow23[m]
but what about all the columns?
2025-06-23 17442, 2025
holycow23[m]
okay
2025-06-23 17442, 2025
lucifer[m]
which columns?
2025-06-23 17452, 2025
holycow23[m]
lucifer[m]: The query in this file is a little confusing at first glance, so I will take some time and try to understand what's happening here
2025-06-23 17423, 2025
lucifer[m]
which particular query? can you point to the specific line?
2025-06-23 17407, 2025
lucifer[m]
asking because there is no query in that file.
2025-06-23 17452, 2025
holycow23[m]
oh wait my bad I was looking at get_release_group_metadata_cache_query
2025-06-23 17459, 2025
lucifer[m]
ah well :). check the three specific functions in the link that i shared above.
2025-06-23 17419, 2025
holycow23[m]
yes going through that right now
2025-06-23 17434, 2025
holycow23[m]
in parallel I have started a test run to see what the next error is 😢
2025-06-23 17410, 2025
lucifer[m]
i think it's possible that the test run will fail because the genre data might not be present in the test data.
2025-06-23 17443, 2025
lucifer[m]
my suggestion would be to do a normal run using `./develop.sh`, which would have the data from sample dumps.
2025-06-23 17449, 2025
holycow23[m]
something like this: `./develop.sh manage spark request_user_stats --type entity --entity artists --range this_week`
2025-06-23 17454, 2025
lucifer[m]
yes
2025-06-23 17435, 2025
holycow23[m]
Got `PathNotFoundException: Path not found: /recording_genre`, so I am assuming it's just missing data?
2025-06-23 17414, 2025
lucifer[m]
yup
2025-06-23 17434, 2025
lucifer[m]
do you get that with the ./develop.sh run as well?
2025-06-23 17407, 2025
holycow23[m]
No, trying that right now
2025-06-23 17414, 2025
lucifer[m]
okay cool.
2025-06-23 17421, 2025
holycow23[m]
I ran the query not request consumer?
2025-06-23 17440, 2025
lucifer[m]
holycow23, suvid, rayyan_seliya123, m.amanullah7: you can collect your doubts and any queries that you have, or if something is not clear about the LB codebase etc. and we can have a meet tomorrow or day after to discuss it.
2025-06-23 17448, 2025
lucifer[m]
holycow23: what do you mean?
2025-06-23 17414, 2025
holycow23[m] uploaded an image: (28KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/sEvfUQTzSoeYeXHnpeYOIpFl/image.png >
2025-06-23 17432, 2025
holycow23[m]
How do I check if it ran properly
2025-06-23 17451, 2025
holycow23[m]
I haven't written the script to render the frontend yet, or should I write that first and then check?
2025-06-23 17402, 2025
lucifer[m]
ah okay, yes check logs for request consumer container.
2025-06-23 17411, 2025
lucifer[m]
./develop.sh spark logs request_consumer
2025-06-23 17446, 2025
holycow23[m] sent a request_consumer code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/yGQIDsGLFuXdGqfLMOOzxucc
2025-06-23 17413, 2025
holycow23[m]
But I have defined 'stats.user.genre_activity': listenbrainz_spark.stats.user.genre_activity.get_genre_activity,
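(A minimal sketch of how a name-to-handler mapping like the entry quoted above is typically consumed, assuming the request consumer resolves the incoming query name through a dictionary. `QUERY_MAP`, `dispatch` and the placeholder handler body are hypothetical; the real handler lives in `listenbrainz_spark.stats.user.genre_activity`.)

```python
from typing import Any, Callable, Dict


def get_genre_activity(stats_range: str) -> dict:
    """Placeholder handler standing in for the real Spark job."""
    return {"type": "genre_activity", "stats_range": stats_range}


# Query name -> handler, mirroring the entry quoted above.
QUERY_MAP: Dict[str, Callable[..., Any]] = {
    "stats.user.genre_activity": get_genre_activity,
}


def dispatch(query: str, **params: Any) -> Any:
    """Look up the handler registered for `query` and call it with `params`."""
    try:
        handler = QUERY_MAP[query]
    except KeyError:
        raise ValueError(f"Unknown query: {query}") from None
    return handler(**params)


result = dispatch("stats.user.genre_activity", stats_range="this_week")
```

In a design like this, a long-running process only knows the entries that existed when it loaded the module, so a newly added entry would not be picked up until the process is rebuilt and restarted.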
2025-06-23 17427, 2025
lucifer[m]
i see, maybe confirm you are on the correct branch and restart the request consumer container
2025-06-23 17429, 2025
holycow23[m]
Yeah branch is fine
2025-06-23 17455, 2025
lucifer[m]
`./develop.sh spark build`, `./develop.sh spark down`, `./develop.sh spark up`
is there a quick alternative to see if it will work?
2025-06-23 17420, 2025
lucifer[m]
try testing on pyspark repl/cmd with higher driver memory.
2025-06-23 17425, 2025
lucifer[m]
do the setup that you usually need to do, as shared in previous gists, then `from listenbrainz_spark.stats.user.genre_activity import get_genre_activity` and call it with appropriate parameters.
2025-06-23 17418, 2025
holycow23[m]
oh so you want me to run `pyspark --driver-memory 8g` and then the script to run the test?