because we won't store all possible genres for each hour but only the top N, so the aggregation won't exactly match what you'd get from 6-hour brackets directly, for users with more than N genres in an hour.
monkey, ansh thoughts on what to do?
monkey[m] catches up
monkey[m]
I'm not clear on the differences that arise between bucketing on the server and on the front-end
lucifer[m]
there are two options for bucketing. 1. Bucket in Spark in 6-hour time frames and keep the top N genres for each time frame. This option has the issue that the hours will not take the user's time zone into account.
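A rough Python sketch of the per-hour top-N idea being discussed (names are hypothetical and the real aggregation would run in Spark, but the logic is the same):

```python
from collections import Counter, defaultdict

def top_n_genres_per_hour(listens, n=10):
    """listens: iterable of (hour_utc, genre) pairs -- a stand-in for the
    exploded listen/genre rows Spark would produce. Returns, for each hour
    of the day, the n most-listened genres with their counts."""
    counts = defaultdict(Counter)
    for hour, genre in listens:
        counts[hour][genre] += 1
    return {hour: c.most_common(n) for hour, c in counts.items()}
```

This also shows where the mismatch mentioned above comes from: anything outside an hour's top N is dropped before any later 6-hour aggregation.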
monkey[m]
Ah, I see, the top genre from a 6-hour window might not be the same as the aggregate of the top genres for each hour within that window
lucifer[m]
and the other option yes
monkey[m]
And is bucketing absolutely necessary, considering we are thinking about sending more granular data to the front-end?
lucifer[m]
i think yes. if you store all genres for all hours, the data might be a bit too much.
monkey[m]
No, I'm thinking top... 3? 5? for each hour
lucifer[m]
i would say if you want to show 5, store 10 i guess for aggregation.
we are showing top 5-10 for each time frame no?
monkey[m] opens the project proposal
going by this image.
lucifer[m] uploaded an image: (362KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/UvODtvxTVLlhCldCuQXPkQRk/image.png >
top 5 for each time frame IIUC
holycow23[m]
this was just a mockup to be honest, didn't think of top n before creating it
lucifer[m]
i see, what's the current plan?
holycow23[m]
I feel top 10 is a good idea no?
but there is one more issue
most songs have multiple genres from very similar categories
lucifer[m]
if you are showing it on a pie chart, i think 40 items will be cluttered.
holycow23[m]
lucifer[m]: no we could render top 10 each hour
and then ultimately do top 5
monkey[m]
holycow23[m]: That still sounds like it won't fit
lucifer[m]
yeah sure that we can do, but first let's confirm what we want to do.
as in what to show to the user.
holycow23[m]
lucifer[m]: I mean this is top 5 for each sector and it looks nice no?
monkey[m]
Yes
lucifer[m]
sure.
so yes if you want to show top 5 per time frame, i think we should store top 10 per hour.
i think that should be doable.
if not we can revisit.
as for similar genres, yes that's an issue.
holycow23[m]
Okay
so, just a limit 10 filter that needs to be added
lucifer[m]: yes
lucifer[m]
limit 10 and change aggregation to per hour.
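The plan above (store top 10 per hour, show top 5 per 6-hour time frame) could be sketched like this, with the bucketing done client-side; names are hypothetical:

```python
from collections import Counter

def bucket_top_genres(hourly_top, bucket_hours=6, top=5):
    """hourly_top: {hour_utc: [(genre, count), ...]} -- the stored top-10
    per hour. Sums counts within each 6-hour bucket and keeps the top 5.
    Caveat from the discussion: genres that fell outside an hour's top N
    are already gone, so bucket totals are approximate."""
    buckets = {}
    for hour, pairs in hourly_top.items():
        b = hour // bucket_hours  # 0..3 for 6-hour buckets
        buckets.setdefault(b, Counter()).update(dict(pairs))
    return {b: c.most_common(top) for b, c in buckets.items()}
```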
i don't have any suggestions on how to solve the genres right now.
holycow23[m]
and what change needs to be done to fetch timezone?
lucifer[m]
but maybe monkey or reosarevok has suggestions.
you don't fetch the timezone at all in spark.
you'll obtain that in the frontend react component using date time APIs.
all the hours returned by spark and the backend will be in UTC.
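In the React component this would use the browser's date APIs, but the conversion is just offsetting the UTC hour by the user's timezone; a Python sketch of the idea (the reference date is a hypothetical pin, since DST makes the offset date-dependent):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def utc_hour_to_local(hour_utc, tz_name, on_date=None):
    """Map an hour-of-day in UTC to the local hour for tz_name.
    DST means the offset depends on the date, so we pin one."""
    on_date = on_date or datetime(2024, 1, 15)  # hypothetical reference date
    dt = on_date.replace(hour=hour_utc, tzinfo=ZoneInfo("UTC"))
    return dt.astimezone(ZoneInfo(tz_name)).hour
```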
reosarevok[m]
Top 1 or max 3 per hour could make sense; for 3, maybe split each pie segment into 3 based on the number of listens?
If you do it per hour
monkey[m]
I think the naive approach (count top genre regardless of them being similar) might just be good enough.
Another option would be to add hierarchical data like we had on the treemap component for year in music
monkey[m] uploaded an image: (73KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/nwyjwylscZaFyFuyXnJPxtCi/image.png >
reosarevok[m]
"count top, then improve later if you have a lot of time" seems sensible to me
lucifer[m]
reosarevok: also asking about how to handle genres that are too similar.
lucifer: I had generated the 6 hour aggregates, so now the new query isn't working due to the new hourly mapping. how do I remove the old table?