but I wonder if you should consider range and frequency as different
to take an example, imagine if you run every month with a 1 month range. so you'll have data for april, may, june, july
then in august you decide that you want to run with 2 month range, so it'll have data from july and august
pristine__
Yup. True
alastairp
and in september you run again (still running every month) with data from august and september
so you can still use the date of the last time you ran (e.g. 30 june) to calculate the end time of this run (30 august)
and even if you want to change the _frequency_, that would be OK too. you just look at the last time you ran it, add the duration of the frequency (1 month, 2 weeks, 1 week), and if that amount of time has passed since the last time, you run the process
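That frequency check could be sketched like this (a minimal sketch; the `is_due` name and the timedelta-based frequency are assumptions, and a "1 month" frequency would have to be approximated, e.g. as 30 days):

```python
from datetime import date, timedelta

def is_due(last_run, frequency, today=None):
    """True if at least `frequency` has passed since the last run.

    last_run: date of the previous run, or None if the job has never run.
    frequency: a timedelta (e.g. timedelta(weeks=2), timedelta(days=30)).
    """
    today = today or date.today()
    # first run is always due; otherwise compare elapsed time to the frequency
    return last_run is None or today - last_run >= frequency
```

So changing the frequency later is fine: the next run just compares "now minus last run" against whatever the current frequency is.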
pristine__
Yes. That is what I was saying. from_date ~ last date. Add the range and get the next date. I wonder if it would be tricky for the first run, because at that time model_metadata would be empty
> So probably just fetch from_date from the table, which will become the to_date of your new run, and then add the number of months to this date, so from_date now becomes 1-08-19
Here.
The lang was not so clear. Sorry
Running files for the first time to set the stage is always tricky, I guess
alastairp
I wouldn't use the last to_date as the current from_date
I would use last to_date + run_frequency as the new to_date, and new to_date - range as the from_date
if there's no data, to_date is today
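alastairp's rule above, as a sketch (the `next_window` and `add_months` names are made up; the month arithmetic is simplified and assumes the day-of-month stays valid, e.g. always running on the 1st):

```python
from datetime import date

def add_months(d: date, months: int) -> date:
    """Naive month arithmetic: assumes the day-of-month stays valid (e.g. the 1st)."""
    total = d.month - 1 + months
    return d.replace(year=d.year + total // 12, month=total % 12 + 1)

def next_window(last_to_date, run_frequency_months, range_months, today=None):
    """Compute (from_date, to_date) for the next run.

    last_to_date: to_date of the previous run, or None on the first run.
    """
    today = today or date.today()
    if last_to_date is None:
        # no data yet: the window ends today
        to_date = today
    else:
        # new to_date = last to_date + run frequency
        to_date = add_months(last_to_date, run_frequency_months)
    # from_date = new to_date - range
    from_date = add_months(to_date, -range_months)
    return from_date, to_date
```

For example, a first run on 2019-08-01 with a 2-month range gives the window (2019-06-01, 2019-08-01); a later run with a 1-month frequency gives (2019-07-01, 2019-09-01).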
pristine__
Yup. And i should add this today's date in that table
alastairp
right
pristine__
Need a check: if model_metadata is empty, add today's date
Ah, I can see a flaw here.
( I realize summits are important, f2f discussions are good )
What if it is the first run? Say I ran create_dataframe today but ran train_models a day or so later. Since the entry in model_metadata is linked to train_models, the *today* would be a different date than the date when create_dataframes was actually run.
alastairp
right
you might consider having another table to track the runs of create_dataframe
or ensure that you always run both of them together? or refuse to run train_models if the data is out of date?
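The "another table" idea could look something like this (table and column names are made up, and sqlite is used only for the sketch): record each create_dataframe run in its own table, and have train_models refuse to run when the newest dataframe is missing or older than some staleness limit.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dataframe_runs (
        id INTEGER PRIMARY KEY,
        from_date TEXT NOT NULL,
        to_date TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
""")

def record_dataframe_run(conn, from_date, to_date, created_at):
    """Log one create_dataframe run (dates stored as ISO strings)."""
    conn.execute(
        "INSERT INTO dataframe_runs (from_date, to_date, created_at) VALUES (?, ?, ?)",
        (from_date.isoformat(), to_date.isoformat(), created_at.isoformat()),
    )

def latest_dataframe_date(conn):
    """Date of the most recent create_dataframe run, or None if there is none."""
    row = conn.execute("SELECT MAX(created_at) FROM dataframe_runs").fetchone()
    return date.fromisoformat(row[0]) if row[0] else None

def can_train(conn, today, max_staleness_days=7):
    """Refuse to train if the data is missing or out of date."""
    latest = latest_dataframe_date(conn)
    return latest is not None and (today - latest) <= timedelta(days=max_staleness_days)
```

This decouples the two scripts: model_metadata keeps describing trained models, while dataframe_runs answers "when was the data actually built?".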
pristine__
Another table is the better option, I guess, because it provides flexibility
What do you think?
( I thought of another table once but dropped the idea because model_metadata has all the important info )
( Managing data is not that easy. Phew )
alastairp
a new table sounds ok to me, but I don't know enough about your current process to understand exactly how everything fits together
pristine__
I am really enjoying project discussions like this. The first thing I'll do tomorrow is prepare a detailed readme for you ( and anyone who wants to read it ) to understand the flow. The more reviewers we have, the better the outcome
This way the project may take time to come into shape, but it will be as perfect as it can be.
Thank you so much :)
iliekcomputers
ruaok: I opened a PR for cron on the leader cluster (for stats)
What are your general opinions on running cron in the cluster? Stats become more push than pull, and we'll run a consumer on lemmy.
ruaok
reosarevok: that's why we left the forest. once we realized that we could become lunch....
iliekcomputers: I personally think it would be better for lemmy to be in charge.
we will need to find a way to get the cluster to react to events on the main server...
so, more pull than push.
CatQuest
" I personally think it would be better for lemmy to be in charge."
this is the best quote
(if taken out of context)
iliekcomputers
So lemmy calls a script that runs on the cluster? I thought that decision was because of the resource usage constraints due to VMs, but if we're gonna run a cluster all the time, what's the advantage?