ok, I switched to token-based auth like in `submit-listens` for now, for testing
thanks @fettuccinae:matrix.org for helping me out!
Now the auth works and I am able to test the endpoint, but when I use `current_user` it doesn't work: it treats me as an anonymous user with no `id` attribute
It considers me not logged in even if I authenticate with a token
Looks like I'll have to find an easy way to log in and then test in Postman
Can someone pls help me out with it?
lucifer
pite has quit
Kladky joined the channel
lucifer[m]
suvid: add a simple html button and dropdown on any page in your local lb frontend code and use it to test your endpoints.
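A minimal server-side sketch for reference: `current_user` comes from Flask-Login's session cookie, so token-authenticated requests stay anonymous. The `submit-listens` family of endpoints resolves the user from the `Authorization` header instead; assuming `validate_auth_header` still lives at this import path and returns the user record, the pattern looks like:

```python
from flask import Blueprint, jsonify

# Assumption: this is the same helper the submit-listens endpoint uses.
from listenbrainz.webserver.views.api_tools import validate_auth_header

test_api_bp = Blueprint("test_api", __name__)  # hypothetical blueprint

@test_api_bp.post("/my-endpoint")  # hypothetical route
def my_endpoint():
    # Raises APIUnauthorized if the "Authorization: Token <token>" header
    # is missing or invalid; otherwise returns the matching user record.
    user = validate_auth_header()
    return jsonify({"user_id": user["id"]})
```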
holycow23: remind me, did you want me to help resolve a particular issue on the genre activity PR or just generally review?
holycow23[m] joined the channel
holycow23[m]
I have made the PR but I'm not sure about the working of the entire stats; I just followed all the steps you had asked for
lucifer[m]
i see, did you try testing it on your wolf setup?
holycow23[m]
It would help if you could give me the steps to test the entire thing on wolf
No, I haven't tested it yet; could you help me with the steps to start the test?
lucifer[m]
okay, i'll take a look.
monkey: hi! do you think there is value in showing users the original name of the file they uploaded for a listens import?
sure.
monkey[m] joined the channel
monkey[m]
Hi! Yes I think that would potentially be very useful.
Looking at my Spotify exports, for example, there are many files I would need to upload; this would help keep track
Probably also useful to have the timestamps for the first and last listen for that file
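A hedged sketch of how those per-file timestamps could be computed at import time; the `ts` field name matches Spotify's extended streaming history exports, and other formats would differ:

```python
import json

def listen_range(path):
    """Return the first and last listen timestamps in one export file."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    # Spotify's extended history uses ISO-8601 "ts" values, so string
    # min/max gives chronological order; skip records without one.
    timestamps = [r["ts"] for r in records if r.get("ts")]
    return min(timestamps), max(timestamps)
```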
suvid[m]
lucifer: could you also clarify the schema part for the new table?
lucifer[m]
I was thinking we would store only the name of the zip file uploaded.
suvid[m]
Lemme find the msg I sent above
lucifer[m]
share your current schema
suvid[m]
<suvid[m]> "Also, lucifer what should be the..." <- > <@suvid:matrix.org> Also, lucifer what should be the schema for the user_data_import table?... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
lucifer: I am planning to store the file path in the schema, so do I also need to store the file name explicitly as well?
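One hedged consideration: if the stored path still ends in the original file name, the name can be recovered with `os.path.basename` and needn't be a separate column; but if uploads get renamed on disk (a common pattern to avoid collisions), the original name is lost from the path and has to be stored explicitly:

```python
import os

# If the original name survives in the path, it can be derived:
stored_path = "/data/imports/42/spotify_export.zip"  # hypothetical path
print(os.path.basename(stored_path))  # -> "spotify_export.zip"

# But if uploads are renamed (e.g. to a UUID) the path no longer
# carries the original name, so a separate column is needed:
renamed_path = "/data/imports/42/0f8c1a7e.zip"  # hypothetical path
print(os.path.basename(renamed_path))  # -> "0f8c1a7e.zip", name lost
```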
holycow23[m]
So my query has `LEFT JOIN genres g ON l.recording_mbid = g.recording_mbid`, but `genres` isn't registered. Originally I used to run `genre_df.createOrReplaceTempView("genres")`, but that's not in the main stats, so how do I resolve it?
Where `genre_df` is `genre_df = spark.read.parquet(f"{config.HDFS_CLUSTER_URI}/recording_genre")`
bitmap[m] joined the channel
bitmap[m]
<reosarevok[m]> "bitmap: any idea what could..." <- I tried entering a similar edit on my dev server and noticed a couple things: (1) `edits_pending` is never incremented on the new url and (2) the edit is never associated to the new url in the `edit_url` table. since those are the criteria the RemoveEmpty script cares about, I'm guessing it was removed by that script
reosarevok[m]
Oh no.
So basically any such edit which is not an autoedit will fail like this, or?
bitmap[m]
yeah, after two days
lucifer[m]
holycow23: you can look at how we read other metadata caches in stats and do something similar. alternatively, i can push a commit to your branch implementing that.
holycow23[m]
I want to look into it on my own; if no success, then I'll ask you
lucifer[m]
sure take a look at how release_metadata_cache is used in release group stats for one example.
holycow23[m]
this `get_release_metadata_cache` is the main function, I am assuming?
reosarevok[m]
<bitmap[m]> "yeah, after two days" <- Are you submitting a fix? I assume the fix is to do both the things you said :)
bitmap[m]
nope i'm writing the script for MBS-14049 rn, if you want I can take a look at it after
<lucifer[m]> "sure take a look at how release_..." <- Does [this](github.com/metabrainz/listenbrainz-server/blob/master/listenbrainz_spark/hdfs/upload.py#L16) help anywhere the GENRE imports
Yeah I was going through this only, but couldn't understand the query that well
lucifer[m]
it reads the data from HDFS and caches it in spark. ideally you would need to copy paste the code and just change the dataframe path.
or just for now, you can do `genres_df = read_files_from_HDFS(RECORDING_RECORDING_GENRE_DATAFRAME)` then `genres_df.createOrReplaceTempView("genres")` in your stats code.
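Put together, a sketch of that quick fix inside the stats code; the import paths are assumptions based on how other stats read their metadata caches, so check `listenbrainz_spark` for the real locations:

```python
# Hedged sketch: register the genre cache as a SQL view so that
# "LEFT JOIN genres g ON l.recording_mbid = g.recording_mbid" resolves.
from listenbrainz_spark.path import RECORDING_RECORDING_GENRE_DATAFRAME
from listenbrainz_spark.utils import read_files_from_HDFS

def register_genres_view():
    genres_df = read_files_from_HDFS(RECORDING_RECORDING_GENRE_DATAFRAME)
    genres_df.createOrReplaceTempView("genres")
```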
holycow23[m]
but what about all the columns?
okay
lucifer[m]
which columns?
holycow23[m]
lucifer[m]: The query in this file is a little confusing at first glance, so it will take time to understand what's happening here
lucifer[m]
which particular query? can you point to the specific line?
asking because there is no query in that file.
holycow23[m]
oh wait my bad I was looking at get_release_group_metadata_cache_query
lucifer[m]
ah well :). check the three specific functions that i shared above.
holycow23[m]
yes going through that right now
in parallel, I have started a test run to see what the next error is 😢
lucifer[m]
i think it's possible that the test run will fail because the genre data might not be present in the test data.
my suggestion would be to do a normal run using `./develop.sh`; that would have the data from the sample dumps.
holycow23[m]
something like this: `./develop.sh manage spark request_user_stats --type entity --entity artists --range this_week`?
lucifer[m]
yes
holycow23[m]
Got `PathNotFoundException: Path not found: /recording_genre`, so I am assuming it's just missing data?
lucifer[m]
yup
do you get that with the `./develop.sh` run as well?
holycow23[m]
No, trying that right now
lucifer[m]
okay cool.
holycow23[m]
I ran the query, not the request consumer?
lucifer[m]
holycow23, suvid, rayyan_seliya123, m.amanullah7: you can collect your doubts and any queries that you have, or anything that is not clear about the LB codebase etc., and we can have a meet tomorrow or the day after to discuss it.
holycow23: what do you mean?
holycow23[m] uploaded an image: (28KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/sEvfUQTzSoeYeXHnpeYOIpFl/image.png >
holycow23[m]
How do I check if it ran properly?
I haven't written the script to render the frontend; or do I write that and then check?
lucifer[m]
ah okay, yes check logs for request consumer container.
`./develop.sh spark logs request_consumer`
holycow23[m] sent a request_consumer code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/yGQIDsGLFuXdGqfLMOOzxucc
holycow23[m]
But I have defined `'stats.user.genre_activity': listenbrainz_spark.stats.user.genre_activity.get_genre_activity,`
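For context, a hedged sketch of where that entry sits: the request consumer dispatches on a query-name-to-handler map (`listenbrainz_spark/query_map.py` in current master; treat the exact shape as an assumption). If the running container was built before the entry was added, the lookup fails even though the code exists on the branch:

```python
import listenbrainz_spark.stats.user.genre_activity

functions = {
    # ...existing handlers...
    'stats.user.genre_activity':
        listenbrainz_spark.stats.user.genre_activity.get_genre_activity,
}

def get_query_handler(query_name):
    # The consumer looks the incoming query name up here; a missing key
    # means the handler was never registered in the running image.
    return functions[query_name]
```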
lucifer[m]
i see, maybe confirm you are on the correct branch and restart the request consumer container
holycow23[m]
Yeah branch is fine
lucifer[m]
`./develop.sh spark build`, `./develop.sh spark down`, `./develop.sh spark up`
holycow23[m]
is there a quick alternative to see if it will work?
lucifer[m]
try testing on pyspark repl/cmd with higher driver memory.
do the setup that you usually need to do, as shared in the previous gists, then `from listenbrainz_spark.stats.user.genre_activity import get_genre_activity` and call it with the appropriate parameters.
holycow23[m]
oh, so you want me to run `pyspark --driver-memory 8g` and then the script to run the test?
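Roughly, yes; a hedged sketch of that REPL session (the session-setup call and the handler's parameters are assumptions, so match them to the earlier gists and the function's actual signature):

```python
# Inside `pyspark --driver-memory 8g`, or a plain python shell.
import listenbrainz_spark

# Assumption: init_spark_session is the same setup other LB spark
# entry points use; getOrCreate semantics make it safe in the REPL.
listenbrainz_spark.init_spark_session("genre-activity-test")

from listenbrainz_spark.stats.user.genre_activity import get_genre_activity

# Stats handlers typically yield result messages; "this_week" mirrors
# the request_user_stats invocation used earlier.
for message in get_genre_activity("this_week"):
    print(message)
```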