#metabrainz

      • holycow23[m]
        why?
      • lucifer[m]
        s/will/might/
      • because we won't store all possible genres for each hour but only the top N, so the aggregation won't exactly match what you'd get in 6-hour brackets directly, for users with more than N genres in an hour.
      • monkey, ansh thoughts on what to do?
      • monkey[m] catches up
      • monkey[m]
        I'm not clear on the differences that arise between bucketing on the server and on the front-end
      • lucifer[m]
        there are two options for bucketing. 1. Bucket in Spark in 6-hour time frames and keep the top N genres for each time frame. This option has the issue that the hours will not take into account the user's time zone.
      • monkey[m]
        Ah, I see, top genre from a 6-hour window might not be the same as the aggregate of the top genre for each hour, given a window of 6 hours
      • lucifer[m]
        and the other option, yes
      • monkey[m]
        And is bucketing absolutely necessary, considering we are thinking about sending more granular data to the front-end?
      • lucifer[m]
        i think yes; if you store all genres for all hours, the data might be a bit too much.
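
The mismatch discussed above can be sketched in a few lines. This is an illustration only, not ListenBrainz code: the genre names, the counts, and `TOP_N` are all invented. It shows that summing per-hour top-N genre counts into a 6-hour bucket can rank genres differently than aggregating the full per-hour data.

```python
from collections import Counter

# Hypothetical per-hour genre listen counts for one user (values invented).
hourly_counts = [
    Counter({"rock": 5, "jazz": 4, "pop": 3, "ambient": 2}),
    Counter({"ambient": 6, "jazz": 1, "rock": 1, "pop": 1}),
    Counter({"pop": 4, "ambient": 3, "rock": 1, "jazz": 1}),
]

TOP_N = 2  # pretend we only store the top N genres per hour

# Aggregate the truncated (top-N-per-hour) data into one window.
truncated = Counter()
for hour in hourly_counts:
    truncated.update(dict(hour.most_common(TOP_N)))

# Aggregate the full per-hour data (the "ground truth" for the window).
full = Counter()
for hour in hourly_counts:
    full.update(hour)

# With these numbers, "pop" makes the window's top 3 only in the
# full aggregation; the truncated one drops its small hourly counts.
print("truncated:", truncated.most_common(3))
print("full:     ", full.most_common(3))
```

So storing only the top N per hour is lossy: any genre that is frequently present but rarely in an hour's top N gets under-counted when the front-end re-buckets.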