because we won't store all possible genres for each hour, only the top N. so for users with more than N genres in an hour, aggregating the hourly data won't exactly match what you'd get by bucketing into 6-hour brackets directly.
monkey, ansh thoughts on what to do?
monkey[m] catches up
monkey[m]
I'm not clear on the differences that arise between bucketing on the server and on the front-end
lucifer[m]
there are two options for bucketing. 1. bucket in Spark into 6-hour time frames and keep the top N genres for each time frame. this option has the issue that the hours won't take the user's time zone into account.
monkey[m]
Ah, I see, the top genre from a 6-hour window might not be the same as the aggregate of the hourly top genres within that window
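(A toy sketch of the discrepancy being discussed, using made-up genre names, counts, and N=1; the real pipeline's data shapes are assumptions here.)

```python
from collections import Counter

# Hypothetical hourly genre play counts for one user within a 6-hour
# window (only three hours shown; all values are illustrative).
hourly_counts = [
    Counter({"rock": 3, "jazz": 2}),
    Counter({"rock": 3, "jazz": 2}),
    Counter({"rock": 1, "jazz": 4}),
]

TOP_N = 1  # suppose we only keep the top genre per hour

# Option A: truncate each hour to its top N, then aggregate the window.
truncated = Counter()
for hour in hourly_counts:
    for genre, count in hour.most_common(TOP_N):
        truncated[genre] += count

# Option B: aggregate the full window directly, no per-hour truncation.
full = Counter()
for hour in hourly_counts:
    full.update(hour)

print(truncated.most_common(1))  # rock wins: jazz's hourly counts were dropped
print(full.most_common(1))       # jazz wins: 8 plays vs rock's 7
```

Jazz narrowly loses each of the first two hours, so per-hour truncation discards its counts and rock comes out on top, while the direct 6-hour aggregation picks jazz.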
lucifer[m]
and the other option is bucketing on the front-end, yes
monkey[m]
And is bucketing absolutely necessary, considering we are thinking about sending more granular data to the front-end?
lucifer[m]
i think yes. if you store all genres for all hours, the data might be a bit too much.