#metabrainz


      • alastairp
        but I wonder if you should consider range and frequency as different
      • to take an example, imagine if you run every month with a 1 month range. so you'll have data for april, may, june, july
      • then in august you decide that you want to run with 2 month range, so it'll have data from july and august
      • pristine__
        Yup. True
      • alastairp
        and in september you run again (still running every month) with data from august and september
      • so you can still use the date of the last time you ran (e.g. 31 july) to calculate the end time of this run (31 august)
      • and even if you want to change the _frequency_, that would be OK too. you just look at the last time you ran it, add the duration of the frequency (1 month, 2 weeks, 1 week), and if that amount of time has passed since the last time, you run the process
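The frequency check alastairp describes above can be sketched roughly as follows. This is only an illustration: `is_run_due` is a hypothetical helper, not something from the project, and treating a missing previous run as "due" is an assumption.

```python
from datetime import date, timedelta
from typing import Optional


def is_run_due(last_run: Optional[date], frequency: timedelta, today: date) -> bool:
    """Return True if at least `frequency` has passed since `last_run`."""
    if last_run is None:
        # Assumption: with no recorded run, the first run is always due.
        return True
    return today >= last_run + frequency
```

For example, with a last run on 31 July 2019 and a two-week frequency, `is_run_due(date(2019, 7, 31), timedelta(weeks=2), date(2019, 8, 14))` returns `True`, while the same call for 13 August returns `False`. Changing the frequency only changes the `timedelta` passed in; the stored date of the last run stays valid.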
      • pristine__
        Yes. That is what I was saying. from_date ~ last date. Add the range and get the next date. I wonder if it would be tricky for the first run, because at that time model_metadata would be empty
      • > So probably just fetch from_date from table, which will become to_date of your new run and then add no of months to this date so from_date now becomes 1-08-19
      • Here.
      • The language was not so clear. Shit. Sorry
      • Running files for first time to set the page is always tricky I guess
      • alastairp
        I wouldn't use the last to_date as the current from_date
      • I would use last to_date + run_frequency as the new to_date, and new to_date - range as the from_date
      • if there's no data, to_date is today
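A minimal sketch of the rule alastairp gives above (new to_date = last to_date + run_frequency, from_date = new to_date - range, and to_date = today when there is no data). The names `add_months` and `next_window` are hypothetical, and expressing both frequency and range in whole months is an assumption made to keep the example short.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def add_months(d: date, months: int) -> date:
    """Shift `d` by `months` (may be negative), clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def next_window(
    last_to_date: Optional[date],
    frequency_months: int,
    range_months: int,
    today: date,
) -> Tuple[date, date]:
    """Return (from_date, to_date) for the next run."""
    if last_to_date is None:
        # First run: nothing recorded yet, so the window ends today.
        to_date = today
    else:
        to_date = add_months(last_to_date, frequency_months)
    from_date = add_months(to_date, -range_months)
    return from_date, to_date
```

With a last to_date of 1 August 2019, a one-month frequency and a two-month range, `next_window(date(2019, 8, 1), 1, 2, today)` gives `(date(2019, 7, 1), date(2019, 9, 1))`. Note that the day clamping in `add_months` means month-end dates can drift (30 June plus one month is 30 July, not 31 July), so anchoring runs on the first of the month is simpler.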
      • pristine__
        Yup. And I should add today's date to that table
      • alastairp
        right
      • pristine__
        Need a check: if model_metadata is empty, add today's date
      • If so, i can see a flaw here.
      • ( I realize summits are imp, f2f discussions are good )
      • What if it is the first run, and I ran create_dataframe today but ran train_models a day or so later? Since the entry in model_metadata is linked to train_model, the *today* would be a different date than the date when create_dataframes was actually run.
      • alastairp
        right
      • you might consider having another table to track the runs of create_dataframe
      • or ensure that you always run both of them together? or refuse to run train_models if the data is out of date?
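Two of the options alastairp lists above (a separate table for create_dataframe runs, and refusing to train on stale data) could look roughly like this. It is a sketch only: the table name `dataframe_runs`, its columns, the helper names, and the use of SQLite are all assumptions for illustration, not the project's actual schema or storage.

```python
import sqlite3
from datetime import date, timedelta
from typing import Optional


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per create_dataframe run.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dataframe_runs ("
        "id INTEGER PRIMARY KEY, "
        "from_date TEXT NOT NULL, "
        "to_date TEXT NOT NULL, "
        "created TEXT NOT NULL)"
    )


def record_dataframe_run(conn: sqlite3.Connection, from_date: date, to_date: date, created: date) -> None:
    """Store the window and the day the dataframes were actually built."""
    conn.execute(
        "INSERT INTO dataframe_runs (from_date, to_date, created) VALUES (?, ?, ?)",
        (from_date.isoformat(), to_date.isoformat(), created.isoformat()),
    )


def latest_dataframe_run(conn: sqlite3.Connection) -> Optional[date]:
    """Return the creation date of the most recent run, or None if the table is empty."""
    row = conn.execute(
        "SELECT created FROM dataframe_runs ORDER BY id DESC LIMIT 1"
    ).fetchone()
    return date.fromisoformat(row[0]) if row else None


def check_data_is_fresh(conn: sqlite3.Connection, today: date, max_age: timedelta) -> date:
    """Raise if training would use missing or out-of-date dataframes."""
    created = latest_dataframe_run(conn)
    if created is None:
        raise RuntimeError("no create_dataframe run recorded")
    if today - created > max_age:
        raise RuntimeError(f"dataframes from {created} are older than {max_age.days} day(s)")
    return created
```

Keeping the run date in its own table means the date no longer depends on when train_models happens to run, and train_models can call something like `check_data_is_fresh` before doing any work.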
      • pristine__
        Another table is the better option, I guess, because it provides flexibility
      • What do you think?
      • ( I thought of another table once but dropped the idea because model_metadata has all the important info )
      • ( Managing data is not that easy. Phew )
      • alastairp
        a new table sounds ok to me, but I don't know enough about your current process to understand exactly how everything fits together
      • pristine__
        I am really enjoying project discussions like this. The first thing I'll do tomorrow is prepare a detailed readme for you ( and anyone who wants to read ) to understand the flow. The more reviewers we have, the better the outcome
      • This way the project may take time to come into shape, but it will be as perfect as it can be.
      • Thank you so much :)
      • iliekcomputers
        ruaok: I opened a PR for cron on the leader cluster (for stats)
      • What are your general opinions on running cron in the cluster? Stats become more push than pull, and we'll run a consumer on lemmy.
      • ruaok
        reosarevok: that's why we left the forest. once we realized that we could become lunch....
      • iliekcomputers: I personally think it would be better for lemmy to be in charge.
      • we will need to find a way to get the cluster to react to events on the main server...
      • so, more pull than push.
      • CatQuest
        " I personally think it would be better for lemmy to be in charge."
      • this is the best quote
      • (if taken out of context)
      • antlarr has quit
      • iliekcomputers
        So lemmy calls a script that runs on the cluster? I thought that decision was because of the resource usage constraints due to VMs, but if we're gonna run a cluster all the time, what's the advantage?
      • CatQuest
        advantageous!
      • D4RK-PH0ENiX has quit
      • D4RK-PH0ENiX joined the channel