because we won't store all possible genres for each hour, only the top N. so for users with more than N genres in an hour, aggregating the hourly data won't exactly match what you'd get by bucketing into 6-hour brackets directly.
monkey, ansh thoughts on what to do?
monkey[m] catches up
monkey[m]
I'm not clear on the differences that arise between bucketing on the server and on the front-end
lucifer[m]
there are two options for bucketing. 1. bucket in Spark into 6-hour time frames and keep the top N genres for each time frame. this option has the issue that the hours won't take the user's time zone into account.
monkey[m]
Ah, I see, the top genre from a 6-hour window might not be the same as the aggregate of the hourly top genres within that window
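(A toy sketch of the discrepancy being discussed, using made-up genre names, counts, and N=1; the real pipeline's data shapes are assumptions here.)

```python
from collections import Counter

# Hypothetical hourly genre play counts for one user within a 6-hour
# window (only three hours shown; all values are illustrative).
hourly_counts = [
    Counter({"rock": 3, "jazz": 2}),
    Counter({"rock": 3, "jazz": 2}),
    Counter({"rock": 1, "jazz": 4}),
]

TOP_N = 1  # suppose we only keep the top genre per hour

# Option A: truncate each hour to its top N, then aggregate the window.
truncated = Counter()
for hour in hourly_counts:
    for genre, count in hour.most_common(TOP_N):
        truncated[genre] += count

# Option B: aggregate the full window directly, no per-hour truncation.
full = Counter()
for hour in hourly_counts:
    full.update(hour)

print(truncated.most_common(1))  # rock wins: jazz's hourly counts were dropped
print(full.most_common(1))       # jazz wins: 8 plays vs rock's 7
```

Jazz narrowly loses each of the first two hours, so per-hour truncation discards its counts and rock comes out on top, while the direct 6-hour aggregation picks jazz.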
lucifer[m]
and the other option is bucketing on the front-end, yes
monkey[m]
And is bucketing absolutely necessary, considering we are thinking about sending more granular data to the front-end?
lucifer[m]
i think yes. if you store all genres for all hours, the data might be a bit too much.