#metabrainz

      • holycow23[m]
        why?
      • lucifer[m]
        s/will/might/
      • because we won't store all possible genres for each hour, only the top N, so the aggregation won't exactly match what you'd get from 6-hour brackets directly, for users with more than N genres in each hour.
      • monkey, ansh thoughts on what to do?
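
(A minimal sketch of the mismatch lucifer describes, in plain Python rather than the actual Spark job, with made-up numbers: when only the top N genres are stored per hour, a genre that sits just below the cut-off in every single hour is dropped entirely, even though it would rank in a direct 6-hour aggregation.)

```python
from collections import Counter

N = 2  # hypothetical per-hour cut-off

# Made-up hourly genre play counts for one user across a 6-hour bracket:
# jazz is third in every hour, so it never makes the stored top N.
hourly = [Counter({"rock": 5, "pop": 4, "jazz": 3}) for _ in range(6)]

# Direct aggregation over the whole bracket keeps jazz's 18 plays.
direct = sum(hourly, Counter())

# Aggregating only the stored top-N rows loses jazz entirely.
truncated = Counter()
for hour in hourly:
    truncated.update(dict(hour.most_common(N)))

print(direct.most_common())     # [('rock', 30), ('pop', 24), ('jazz', 18)]
print(truncated.most_common())  # [('rock', 30), ('pop', 24)]
```
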
      • monkey[m] catches up
      • monkey[m]
        I'm not clear on the differences that arise between bucketing on the server and on the front-end
      • lucifer[m]
        there are two options for bucketing. 1. bucket in Spark into 6-hour time frames and keep the top N genres for each frame. this option has the issue that the hours won't take the user's time zone into account.
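
(A rough sketch of what option 1 might look like as a PySpark aggregation; the table name and schema here are assumptions, not the actual ListenBrainz job. Listens are bucketed into fixed UTC 6-hour frames, which is exactly where the time-zone problem comes from.)

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

# Assumed input: one row per (user_id, listened_at, genre).
listens = spark.table("listens_with_genres")

per_frame = (
    listens
    # Fixed UTC 6-hour frames (0-3); the frame boundaries ignore the
    # user's time zone, which is the drawback of this option.
    .withColumn("frame", F.floor(F.hour("listened_at") / 6))
    .groupBy("user_id", "frame", "genre")
    .agg(F.count("*").alias("listen_count"))
)

# Keep only the top N genres per user and frame.
N = 5
w = Window.partitionBy("user_id", "frame").orderBy(F.desc("listen_count"))
top_n = (
    per_frame
    .withColumn("rank", F.row_number().over(w))
    .filter(F.col("rank") <= N)
    .drop("rank")
)
```
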
      • monkey[m]
        Ah, I see, the top genres for a 6-hour window might not be the same as the aggregate of the top genres for each hour within that window
      • lucifer[m]
        and the other option yes
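
(The other option: store per-hour stats server-side and let the front-end fold them into 6-hour frames in the user's local time. A sketch of that bucketing step, kept in Python for consistency, though in practice it would presumably live in the TypeScript front-end; the function name and offset handling are assumptions.)

```python
from collections import Counter

def bucket_hourly_stats(hourly_top_genres, utc_offset_hours, frame_size=6):
    """Fold per-UTC-hour genre counts into local-time frames of frame_size hours.

    hourly_top_genres: {utc_hour (0-23): {genre: count}}, as stored server-side
    utc_offset_hours:  the user's offset from UTC, e.g. +5 for UTC+5
    """
    frames = [Counter() for _ in range(24 // frame_size)]
    for utc_hour, counts in hourly_top_genres.items():
        local_hour = (utc_hour + utc_offset_hours) % 24
        frames[local_hour // frame_size].update(counts)
    return frames

# 23:00 UTC is 04:00 local for a UTC+5 user, so it lands in the first frame.
stats = {23: {"jazz": 7}, 9: {"rock": 3}}
for i, frame in enumerate(bucket_hourly_stats(stats, utc_offset_hours=5)):
    print(f"{6 * i:02d}:00-{6 * i + 5:02d}:59", frame.most_common())
```
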
      • monkey[m]
        And is bucketing absolutely necessary, considering we are thinking about sending more granular data to the front-end?
      • lucifer[m]
        i think yes. if you store all genres for all hours, the data might be a bit too much.
      • monkey[m]
        No, I'm thinking top... 3? 5? for each hour
      • lucifer[m]
        i would say if you want to show 5, store 10 i guess for aggregation.
      • we are showing top 5-10 for each time frame no?
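
(lucifer's suggestion sketched out: store a deeper top N per hour, say 10, than is displayed per frame, say 5, so the merged ranking has some headroom. The numbers are the ones floated above, and the result is still approximate for users with more than 10 genres in a single hour.)

```python
from collections import Counter

STORE_N, SHOW_N = 10, 5  # store deeper per hour than we display per frame

def top_for_frame(hourly_counts):
    """Merge stored per-hour top-STORE_N genre counts; return the displayed top SHOW_N."""
    merged = Counter()
    for hour in hourly_counts:
        merged.update(dict(hour.most_common(STORE_N)))
    return merged.most_common(SHOW_N)
```
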
      • monkey[m] opens the project proposal
      • going by this image.
      • lucifer[m] uploaded an image: (362KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/UvODtvxTVLlhCldCuQXPkQRk/image.png >
      • top 5 for each time frame IIUC
      • holycow23[m]
        this was just a mockup tbh didn't think of top n before creating it
      • * this was just a mockup to be honest, didn't think of top n before creating it
      • lucifer[m]
        i see, what's the current plan?