#metabrainz

      • holycow23[m]
        why?
      • lucifer[m]
        s/will/might/
      • because we won't store all possible genres for each hour, only the top N, so the aggregation won't exactly match what you'd get from 6-hour brackets directly, for users with more than N genres in each hour.
      • monkey, ansh thoughts on what to do?
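
(A minimal sketch of the mismatch lucifer describes, in plain Python rather than the actual Spark job, with made-up numbers: when only the top N genres are stored per hour, a genre that sits just below the cut-off in every single hour is dropped entirely, even though it would rank in a direct 6-hour aggregation.)

```python
from collections import Counter

N = 2  # hypothetical per-hour cut-off

# Made-up hourly genre play counts for one user across a 6-hour bracket:
# jazz is third in every hour, so it never makes the stored top N.
hourly = [Counter({"rock": 5, "pop": 4, "jazz": 3}) for _ in range(6)]

# Direct aggregation over the whole bracket keeps jazz's 18 plays.
direct = sum(hourly, Counter())

# Aggregating only the stored top-N rows loses jazz entirely.
truncated = Counter()
for hour in hourly:
    truncated.update(dict(hour.most_common(N)))

print(direct.most_common())     # [('rock', 30), ('pop', 24), ('jazz', 18)]
print(truncated.most_common())  # [('rock', 30), ('pop', 24)]
```
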
      • monkey[m] catches up
      • monkey[m]
        I'm not clear on the differences that arise between bucketing on the server and on the front-end
      • lucifer[m]
        there are two options for bucketing. 1. bucket in Spark into 6-hour time frames and keep the top N genres for each frame. this option has the issue that the hours won't take the user's time zone into account.
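
(A rough sketch of what option 1 might look like as a PySpark aggregation; the table name and schema here are assumptions, not the actual ListenBrainz job. Listens are bucketed into fixed UTC 6-hour frames, which is exactly where the time-zone problem comes from.)

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

# Assumed input: one row per (user_id, listened_at, genre).
listens = spark.table("listens_with_genres")

per_frame = (
    listens
    # Fixed UTC 6-hour frames (0-3); the frame boundaries ignore the
    # user's time zone, which is the drawback of this option.
    .withColumn("frame", F.floor(F.hour("listened_at") / 6))
    .groupBy("user_id", "frame", "genre")
    .agg(F.count("*").alias("listen_count"))
)

# Keep only the top N genres per user and frame.
N = 5
w = Window.partitionBy("user_id", "frame").orderBy(F.desc("listen_count"))
top_n = (
    per_frame
    .withColumn("rank", F.row_number().over(w))
    .filter(F.col("rank") <= N)
    .drop("rank")
)
```
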
      • monkey[m]
        Ah, I see, the top genres for a 6-hour window might not be the same as the aggregate of the top genres for each hour within that window
      • lucifer[m]
        and the other option yes
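
(The other option: store per-hour stats server-side and let the front-end fold them into 6-hour frames in the user's local time. A sketch of that bucketing step, kept in Python for consistency, though in practice it would presumably live in the TypeScript front-end; the function name and offset handling are assumptions.)

```python
from collections import Counter

def bucket_hourly_stats(hourly_top_genres, utc_offset_hours, frame_size=6):
    """Fold per-UTC-hour genre counts into local-time frames of frame_size hours.

    hourly_top_genres: {utc_hour (0-23): {genre: count}}, as stored server-side
    utc_offset_hours:  the user's offset from UTC, e.g. +5 for UTC+5
    """
    frames = [Counter() for _ in range(24 // frame_size)]
    for utc_hour, counts in hourly_top_genres.items():
        local_hour = (utc_hour + utc_offset_hours) % 24
        frames[local_hour // frame_size].update(counts)
    return frames

# 23:00 UTC is 04:00 local for a UTC+5 user, so it lands in the first frame.
stats = {23: {"jazz": 7}, 9: {"rock": 3}}
for i, frame in enumerate(bucket_hourly_stats(stats, utc_offset_hours=5)):
    print(f"{6 * i:02d}:00-{6 * i + 5:02d}:59", frame.most_common())
```
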
      • monkey[m]
        And is bucketing absolutely necessary, considering we are thinking about sending more granular data to the front-end?
      • lucifer[m]
        i think yes. if you store all genres for all hours, the data might be a bit too much.
      • monkey[m]
        No, I'm thinking top... 3? 5? for each hour
      • lucifer[m]
        i would say if you want to show 5, store 10 i guess for aggregation.
      • we are showing top 5-10 for each time frame no?
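
(lucifer's suggestion sketched out: store a deeper top N per hour, say 10, than is displayed per frame, say 5, so the merged ranking has some headroom. The numbers are the ones floated above, and the result is still approximate for users with more than 10 genres in a single hour.)

```python
from collections import Counter

STORE_N, SHOW_N = 10, 5  # store deeper per hour than we display per frame

def top_for_frame(hourly_counts):
    """Merge stored per-hour top-STORE_N genre counts; return the displayed top SHOW_N."""
    merged = Counter()
    for hour in hourly_counts:
        merged.update(dict(hour.most_common(STORE_N)))
    return merged.most_common(SHOW_N)
```
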
      • monkey[m] opens the project proposal
      • going by this image.
      • lucifer[m] uploaded an image: (362KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/UvODtvxTVLlhCldCuQXPkQRk/image.png >
      • top 5 for each time frame IIUC
      • holycow23[m]
        this was just a mockup tbh didn't think of top n before creating it
      • * this was just a mockup to be honest, didn't think of top n before creating it
      • lucifer[m]
        i see, what's the current plan?