#metabrainz

      • alastairp
        but I wonder if you should consider range and frequency as different
      • 2019-08-21 23336, 2019

      • alastairp
        to take an example, imagine if you run every month with a 1 month range. so you'll have data for april, may, june, july
      • 2019-08-21 23353, 2019

      • alastairp
        then in august you decide that you want to run with 2 month range, so it'll have data from july and august
      • 2019-08-21 23308, 2019

      • pristine__
        Yup. True
      • 2019-08-21 23310, 2019

      • alastairp
        and in september you run again (still running every month) with data from august and september
      • 2019-08-21 23356, 2019

      • alastairp
        so you can still use the date of the last time you ran (e.g. 31 July) to calculate the end time of this time (31 August)
      • 2019-08-21 23306, 2019

      • alastairp
        and even if you want to change the _frequency_, that would be OK too. you just look at the last time you ran it, add the duration of the frequency (1 month, 2 weeks, 1 week), and if that amount of time has passed since the last time, you run the process
      • 2019-08-21 23331, 2019
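
The frequency check alastairp describes can be sketched in a few lines. This is only an illustration: `is_run_due` and its parameters are hypothetical names, not part of the project, and treating "no previous run" as due is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_run_due(last_run: Optional[datetime], frequency: timedelta, now: datetime) -> bool:
    """Return True if at least `frequency` has passed since the last run."""
    if last_run is None:
        # Assumption: with no recorded run, the process should run now.
        return True
    return now - last_run >= frequency
```

Changing the frequency only changes the `frequency` argument; the stored date of the last run stays valid.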

      • pristine__
        Yes. That is what i was saying. from_date ~ last date. Add the range and get the next date. I wonder if it would be tricky for the first run because that time model_metadata would be empty
      • 2019-08-21 23351, 2019

      • pristine__
        > So probably just fetch from_date from table, which will become to_date of your new run and then add no of months to this date so from_date now becomes 1-08-19
      • 2019-08-21 23355, 2019

      • pristine__
        Here.
      • 2019-08-21 23331, 2019

      • pristine__
        The lang was not so clear. Shit. Sorry
      • 2019-08-21 23351, 2019

      • pristine__
        Running files for first time to set the page is always tricky I guess
      • 2019-08-21 23355, 2019

      • alastairp
        I wouldn't use the last to_date as the current from_date
      • 2019-08-21 23345, 2019

      • alastairp
        I would use last to_date + run_frequency as the new to_date, and new to_date - range as the from_date
      • 2019-08-21 23359, 2019

      • alastairp
        if there's no data, to_date is today
      • 2019-08-21 23308, 2019
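
The rule alastairp proposes (new to_date = last to_date + run frequency, from_date = new to_date - range, and to_date = today when there is no earlier run) could look roughly like this. It is a sketch under assumptions: frequency and range are expressed in whole months, and `add_months` and `next_window` are hypothetical helpers, not functions from the project.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def add_months(d: date, months: int) -> date:
    """Shift d by a (possibly negative) number of months, clamping the day
    to the last day of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def next_window(last_to_date: Optional[date], frequency_months: int,
                range_months: int, today: date) -> Tuple[date, date]:
    """Compute (from_date, to_date) for the next run."""
    if last_to_date is None:
        # First run: nothing recorded yet, so the window ends today.
        to_date = today
    else:
        to_date = add_months(last_to_date, frequency_months)
    from_date = add_months(to_date, -range_months)
    return from_date, to_date
```

Because the range is only used to derive from_date, it can change between runs without affecting how the next to_date is computed.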

      • pristine__
        Yup. And i should add this today's date in that table
      • 2019-08-21 23324, 2019

      • alastairp
        right
      • 2019-08-21 23309, 2019

      • pristine__
        Need a check. If model_metadata is empty add today's date
      • 2019-08-21 23339, 2019

      • pristine__
        If so, i can see a flaw here.
      • 2019-08-21 23350, 2019

      • pristine__
        ( I realize summits are imp, f2f discussions are good )
      • 2019-08-21 23350, 2019

      • pristine__
        What if it is the first run, so I ran create_dataframe today but ran train_models a day or whatever after? Since the entry in model_metadata is linked to train_model, the *today* would be a different date than the date when create_dataframes was actually run.
      • 2019-08-21 23324, 2019

      • alastairp
        right
      • 2019-08-21 23304, 2019

      • alastairp
        you might consider having another table to track the runs of create_dataframe
      • 2019-08-21 23334, 2019

      • alastairp
        or ensure that you always run both of them together? or refuse to run train_models if the data is out of date?
      • 2019-08-21 23326, 2019

      • pristine__
        Another table is the better option i guess because it provides flexibility
      • 2019-08-21 23312, 2019

      • pristine__
        What do you think?
      • 2019-08-21 23331, 2019

      • pristine__
        ( I thought of another table once but dropped the idea because model_metadata has every imp info )
      • 2019-08-21 23343, 2019

      • pristine__
        ( Managing data is not that easy. Phew )
      • 2019-08-21 23348, 2019

      • alastairp
        a new table sounds ok to me, but I don't know enough about your current process to understand exactly how everything fits together
      • 2019-08-21 23302, 2019
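
Two of the options raised above, a separate table tracking create_dataframe runs and refusing to train on out-of-date data, can be combined in a small sketch. Everything here is an assumption for illustration: the `dataframe_runs` table, its columns, the function names, and the use of SQLite are hypothetical and do not describe the project's actual storage.

```python
import sqlite3
from datetime import date, datetime, timedelta


def init_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per create_dataframe run.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dataframe_runs ("
        "id INTEGER PRIMARY KEY, from_date TEXT NOT NULL, "
        "to_date TEXT NOT NULL, created_at TEXT NOT NULL)"
    )


def record_dataframe_run(conn: sqlite3.Connection, from_date: date,
                         to_date: date, created_at: datetime) -> None:
    """Store the window and the actual time the dataframes were built."""
    conn.execute(
        "INSERT INTO dataframe_runs (from_date, to_date, created_at) VALUES (?, ?, ?)",
        (from_date.isoformat(), to_date.isoformat(), created_at.isoformat()),
    )


def require_fresh_dataframes(conn: sqlite3.Connection, now: datetime,
                             max_age: timedelta) -> date:
    """Return the to_date of the latest run, or raise if it is missing or stale."""
    row = conn.execute(
        "SELECT to_date, created_at FROM dataframe_runs ORDER BY id DESC LIMIT 1"
    ).fetchone()
    if row is None:
        raise RuntimeError("no create_dataframe run recorded")
    if now - datetime.fromisoformat(row[1]) > max_age:
        raise RuntimeError("dataframes are out of date; rerun create_dataframe")
    return date.fromisoformat(row[0])
```

With this split, the date stored for a dataframe run is the day it was actually built, regardless of when training happens afterwards.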

      • pristine__
        I am really enjoying project discussion like this, the first thing I do tomorrow is prepare a detailed readme for you ( and anyone who wants to read ) to understand the flow. The more reviewers we have, the better the outcome
      • 2019-08-21 23340, 2019

      • pristine__
        This way the project may take time to come in shape but it will be as perfect as it can be.
      • 2019-08-21 23338, 2019

      • pristine__
        Thank you so much :)
      • 2019-08-21 23308, 2019

      • iliekcomputers
        ruaok: I opened a PR for cron on the leader cluster (for stats)
      • 2019-08-21 23338, 2019

      • iliekcomputers
        What are your general opinions on running cron in the cluster? Stats become more push than pull, and we'll run a consumer on lemmy.
      • 2019-08-21 23309, 2019

      • ruaok
        reosarevok: that's why we left the forest. once we realized that we could become lunch....
      • 2019-08-21 23339, 2019

      • ruaok
        iliekcomputers: I personally think it would be better for lemmy to be in charge.
      • 2019-08-21 23328, 2019

      • ruaok
        we will need to find a way to get the cluster to react to events on the main server...
      • 2019-08-21 23338, 2019

      • ruaok
        so, more pull than push.
      • 2019-08-21 23320, 2019

      • CatQuest
        " I personally think it would be better for lemmy to be in charge."
      • 2019-08-21 23320, 2019

      • CatQuest
        this is the best quote
      • 2019-08-21 23320, 2019

      • CatQuest
        (if taken out of context)
      • 2019-08-21 23322, 2019

      • antlarr has quit
      • 2019-08-21 23317, 2019

      • iliekcomputers
        So lemmy calls a script that runs on the cluster? I thought that decision was because of the resource usage constraints due to vms but if we're gonna run a cluster all the time, what's the advantage?
      • 2019-08-21 23357, 2019

      • CatQuest
        adventagious!
      • 2019-08-21 23303, 2019

      • D4RK-PH0ENiX has quit
      • 2019-08-21 23332, 2019

      • D4RK-PH0ENiX joined the channel