because we won't store all possible genres for each hour but only the top N, the aggregation won't exactly match what you would get from computing the 6-hour brackets directly, for users with more than N genres in an hour.
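That truncation effect can be shown with a tiny Python sketch (all genre names and counts here are made up):

```python
from collections import Counter

# hypothetical per-hour listen counts: hour -> genre -> listens
per_hour = {
    0: Counter(rock=5, pop=4, jazz=3),
    1: Counter(metal=5, blues=4, jazz=3),
    2: Counter(folk=5, ska=4, jazz=3),
}

N = 2  # genres kept per hour

def bracket_top(counts_by_hour, keep_all):
    """Top genre of the whole bracket, with or without per-hour truncation."""
    total = Counter()
    for counts in counts_by_hour.values():
        kept = counts if keep_all else Counter(dict(counts.most_common(N)))
        total += kept
    return total.most_common(1)[0][0]

print(bracket_top(per_hour, keep_all=True))   # 'jazz' -- 9 listens across the bracket
print(bracket_top(per_hour, keep_all=False))  # jazz was dropped from every hour's top 2
```

jazz is never in any single hour's top 2, so the truncated aggregate loses the genre that actually leads the bracket.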
2025-06-24 17524, 2025
lucifer[m]
monkey, ansh thoughts on what to do?
2025-06-24 17545, 2025
monkey[m] catches up
2025-06-24 17547, 2025
monkey[m]
I'm not clear on the differences that arise between bucketing on the server and on the front-end
2025-06-24 17526, 2025
lucifer[m]
there are two options for bucketing. 1. bucket in Spark into 6-hour time frames and keep the top N genres for each time frame. this option has the issue that the hours will not take into account the user's time zone.
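A minimal sketch of why fixed UTC frames drift for users in other time zones (assuming frame boundaries at 00, 06, 12, 18):

```python
def utc_frame(hour_utc: int) -> int:
    """Frame index if bucketing is done server-side in UTC."""
    return hour_utc // 6  # frames: 00-05, 06-11, 12-17, 18-23

def local_frame(hour_utc: int, offset_hours: int) -> int:
    """Frame index the user would expect in their local time."""
    return ((hour_utc + offset_hours) % 24) // 6

# a listen at 05:00 UTC for a user at UTC+2 (07:00 local)
print(utc_frame(5), local_frame(5, 2))  # 0 1 -- lands in different frames
```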
2025-06-24 17530, 2025
monkey[m]
Ah, I see, top genre from a 6-hour window might not be the same as the aggregate of the top genre for each hour, given a window of 6 hours
2025-06-24 17540, 2025
lucifer[m]
and the other option yes
2025-06-24 17515, 2025
monkey[m]
And is bucketing absolutely necessary, considering we are thinking about sending more granular data to the front-end?
2025-06-24 17539, 2025
lucifer[m]
i think yes, if you store all genres for all hours. the data might be a bit too much.
2025-06-24 17502, 2025
monkey[m]
No, I'm thinking top... 3? 5? for each hour
2025-06-24 17502, 2025
lucifer[m]
i would say if you want to show 5, store 10 i guess for aggregation.
2025-06-24 17520, 2025
lucifer[m]
we are showing top 5-10 for each time frame no?
2025-06-24 17540, 2025
monkey[m] opens the project proposal
2025-06-24 17556, 2025
lucifer[m]
going by this image.
2025-06-24 17556, 2025
lucifer[m] uploaded an image: (362KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/UvODtvxTVLlhCldCuQXPkQRk/image.png >
2025-06-24 17534, 2025
lucifer[m]
top 5 for each time frame IIUC
2025-06-24 17504, 2025
holycow23[m]
this was just a mockup to be honest, didn't think of top n before creating it
2025-06-24 17539, 2025
lucifer[m]
i see, what's the current plan?
2025-06-24 17510, 2025
holycow23[m]
I feel top 10 is a good idea no?
2025-06-24 17519, 2025
holycow23[m]
but there is one more issue
2025-06-24 17531, 2025
holycow23[m]
most songs have multiple genres from very similar categories
2025-06-24 17535, 2025
lucifer[m]
if you are showing it on a pie chart, i think 40 items will be cluttered.
2025-06-24 17552, 2025
holycow23[m]
lucifer[m]: no we could render top 10 each hour
2025-06-24 17500, 2025
holycow23[m]
and then ultimately do top 5
2025-06-24 17516, 2025
monkey[m]
holycow23[m]: That still sounds like it won't fit
2025-06-24 17521, 2025
lucifer[m]
yeah sure that we can do, but first let's confirm what we want to do.
2025-06-24 17528, 2025
lucifer[m]
as in what to show to the user.
2025-06-24 17538, 2025
holycow23[m]
lucifer[m]: I mean this is top 5 for each sector and it looks nice no?
2025-06-24 17552, 2025
monkey[m]
Yes
2025-06-24 17555, 2025
lucifer[m]
sure.
2025-06-24 17540, 2025
lucifer[m]
so yes if you want to show top 5 per time frame, i think we should store top 10 per hour.
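That plan could look roughly like this (a sketch, not the actual implementation; the function and parameter names are hypothetical): Spark stores the top 10 genres per UTC hour, and the frontend re-buckets them into the user's local 6-hour frames and keeps the top 5.

```python
from collections import Counter
from typing import Dict, List

def frames_for_user(top10_per_utc_hour: Dict[int, Counter],
                    offset_hours: int, show_n: int = 5) -> List[dict]:
    """Re-bucket per-UTC-hour top genres into local 6-hour frames."""
    frames = [Counter() for _ in range(4)]  # 00-05, 06-11, 12-17, 18-23 local
    for hour_utc, counts in top10_per_utc_hour.items():
        local = (hour_utc + offset_hours) % 24
        frames[local // 6] += counts
    return [dict(f.most_common(show_n)) for f in frames]
```

As discussed above, the per-hour top-10 cutoff makes these frame totals an approximation rather than exact counts.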
2025-06-24 17522, 2025
lucifer[m]
i think that should be doable.
2025-06-24 17531, 2025
lucifer[m]
if not we can revisit.
2025-06-24 17556, 2025
lucifer[m]
as for similar genres, yes that's an issue.
2025-06-24 17557, 2025
holycow23[m]
Okay
2025-06-24 17507, 2025
holycow23[m]
so, just a limit 10 filter that needs to be added
2025-06-24 17523, 2025
holycow23[m]
lucifer[m]: yes
2025-06-24 17526, 2025
lucifer[m]
limit 10 and change aggregation to per hour.
2025-06-24 17506, 2025
lucifer[m]
i don't have any suggestions on how to solve the genres right now.
2025-06-24 17514, 2025
holycow23[m]
and what change needs to be done to fetch timezone?
2025-06-24 17514, 2025
lucifer[m]
but maybe monkey or reosarevok has suggestions.
2025-06-24 17527, 2025
lucifer[m]
you don't fetch the timezone at all in spark.
2025-06-24 17553, 2025
lucifer[m]
you'll obtain that in the frontend react component using date time APIs.
2025-06-24 17509, 2025
lucifer[m]
all the hours returned by spark and the backend will be in UTC.
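In the actual frontend this conversion would use the browser's date/time APIs; the same UTC-to-local mapping sketched in Python:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def local_hour(hour_utc: int, tz_name: str, on_date=(2025, 6, 24)) -> int:
    """Map an hour-of-day in UTC to the user's local hour-of-day.
    The date is a parameter because DST can change the offset."""
    dt = datetime(*on_date, hour=hour_utc, tzinfo=timezone.utc)
    return dt.astimezone(ZoneInfo(tz_name)).hour

print(local_hour(0, "Asia/Kolkata"))  # 5 -- 00:00 UTC is 05:30 IST
```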
2025-06-24 17534, 2025
reosarevok[m]
Top 1 or max 3 per hour could make sense; for 3, maybe split each pie segment into 3 based on the number of listens?
2025-06-24 17538, 2025
reosarevok[m]
If you do it per hour
2025-06-24 17549, 2025
monkey[m]
I think the naive approach (count top genre regardless of them being similar) might just be good enough.
2025-06-24 17549, 2025
monkey[m]
Another option would be to add hierarchical data like we had on the treemap component for year in music
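If the hierarchical route were taken, collapsing similar genres into a parent before ranking might look like this (the mapping below is invented; the real hierarchy would come from whatever genre data backed the treemap):

```python
from collections import Counter

# hypothetical parent-genre mapping
PARENT = {"indie rock": "rock", "hard rock": "rock", "synthpop": "pop"}

def roll_up(counts: Counter) -> Counter:
    """Fold each genre into its parent (or itself) before taking the top N."""
    rolled = Counter()
    for genre, n in counts.items():
        rolled[PARENT.get(genre, genre)] += n
    return rolled

print(roll_up(Counter({"indie rock": 3, "hard rock": 2, "synthpop": 1})))
# Counter({'rock': 5, 'pop': 1})
```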
2025-06-24 17551, 2025
monkey[m] uploaded an image: (73KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/nwyjwylscZaFyFuyXnJPxtCi/image.png >
2025-06-24 17513, 2025
reosarevok[m]
"count top, then improve later if you have a lot of time" seems sensible to me
2025-06-24 17518, 2025
lucifer[m]
reosarevok: also asking about how to handle genres that are too similar.
2025-06-24 17555, 2025
holycow23[m]
lucifer: I had generated the 6-hour aggregates, so now the new query isn't working due to the new hourly mapping. how do I remove the old table?