in #metabrainz

19:26 PM
alastairp

but I wonder if you should consider range and frequency as different
19:27 PM
to take an example, imagine if you run every month with a 1 month range. so you'll have data for april, may, june, july
19:27 PM
then in august you decide that you want to run with 2 month range, so it'll have data from july and august
19:28 PM
pristine__

Yup. True
19:28 PM
alastairp

and in september you run again (still running every month) with data from august and september
19:28 PM
so you can still use the date of the last time you ran (e.g. 31 june) to calculate the end time of this time (30 august)
19:30 PM
and even if you want to change the _frequency_, that would be OK too. you just look at the last time you ran it, add the duration of the frequency (1 month, 2 weeks, 1 week), and if that amount of time has passed since the last time, you run the process
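The frequency check described above can be sketched roughly as follows. This is only an illustration: the function name `is_run_due` is hypothetical, and the frequency is assumed to be expressed as a `timedelta` (so "1 month" would have to be approximated, e.g. as 30 days).

```python
from datetime import date, timedelta
from typing import Optional


def is_run_due(last_run: Optional[date], frequency: timedelta, today: date) -> bool:
    """Return True if at least `frequency` has passed since `last_run`.

    Hypothetical helper: a missing `last_run` is treated as "never ran",
    so the process is considered due.
    """
    if last_run is None:
        return True
    return today - last_run >= frequency
```

Because the check only looks at the last run date and the current frequency, changing the frequency between runs needs no migration of earlier records.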
19:30 PM
pristine__

Yes. That is what I was saying. from_date ~ last date. Add the range and get the next date. I wonder if it would be tricky for the first run, because at that time model_metadata would be empty
19:31 PM
> So probably just fetch from_date from table, which will become to_date of your new run and then add no of months to this date so from_date now becomes 1-08-19
19:31 PM
Here.
19:32 PM
The language was not so clear. Shit. Sorry
19:33 PM
Running files for the first time to set the stage is always tricky, I guess
19:33 PM
alastairp

I wouldn't use the last to_date as the current from_date
19:34 PM
I would use last to_date + run_frequency as the new to_date, and new to_date - range as the from_date
19:34 PM
if there's no data, to_date is today
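A minimal sketch of the window calculation alastairp describes, assuming the last `to_date` is read from the metadata table (or is `None` on the first run) and that `run_frequency` and `data_range` are given as `timedelta` values. The function name `next_window` and its parameter names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def next_window(
    last_to_date: Optional[date],
    run_frequency: timedelta,
    data_range: timedelta,
    today: date,
) -> Tuple[date, date]:
    """Compute (from_date, to_date) for the next run.

    new to_date   = last to_date + run_frequency (or today if there is no data)
    new from_date = new to_date - range
    """
    if last_to_date is None:
        # First run: nothing recorded yet, so the window ends today.
        to_date = today
    else:
        to_date = last_to_date + run_frequency
    from_date = to_date - data_range
    return from_date, to_date
```

Keeping `run_frequency` and `data_range` as separate arguments means either one can change between runs without affecting how the other is applied.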
19:36 PM
pristine__

Yup. And I should add today's date to that table
19:37 PM
alastairp

right
19:38 PM
pristine__

Need a check. If model_metadata is empty, add today's date
19:38 PM
If so, I can see a flaw here.
19:40 PM
( I realize summits are imp, f2f discussions are good )
19:42 PM
What if it is the first run, so I ran create_dataframe today but ran train_models a day or whatever after? Since the entry in model_metadata is linked to train_models, the *today* would be a different date than the date when create_dataframes was actually run.
19:47 PM
alastairp

right
19:48 PM
you might consider having another table to track the runs of create_dataframe
19:48 PM
or ensure that you always run both of them together? or refuse to run train_models if the data is out of date?
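The "refuse to run train_models if the data is out of date" option could look roughly like this. It is a sketch under assumptions: the date of the last create_dataframe run is assumed to be recorded somewhere (for example in the separate table suggested above), and `StaleDataError`, `ensure_dataframes_fresh`, and `max_age` are hypothetical names.

```python
from datetime import date, timedelta
from typing import Optional


class StaleDataError(RuntimeError):
    """Raised when the dataframes are missing or too old to train on."""


def ensure_dataframes_fresh(
    dataframes_created: Optional[date],
    max_age: timedelta,
    today: date,
) -> None:
    """Raise StaleDataError unless create_dataframe ran recently enough.

    `dataframes_created` is the recorded date of the last create_dataframe
    run, or None if it has never been run.
    """
    if dataframes_created is None:
        raise StaleDataError("create_dataframe has never been run")
    age = today - dataframes_created
    if age > max_age:
        raise StaleDataError(
            f"dataframes are {age.days} days old (limit: {max_age.days} days)"
        )
```

Calling this at the start of train_models would make the mismatch between the two run dates explicit instead of silently recording the wrong date.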
19:49 PM
pristine__

Another table is the better option I guess, because it provides flexibility
19:50 PM
What do you think?
19:51 PM
( I thought of another table once but dropped the idea because model_metadata has every imp info )
19:52 PM
( Managing data is not that easy. Phew )
19:55 PM
alastairp

a new table sounds ok to me, but I don't know enough about your current process to understand exactly how everything fits together
19:58 PM
pristine__

I am really enjoying project discussions like this. The first thing I do tomorrow is prepare a detailed readme for you ( and anyone who wants to read ) to understand the flow. The more reviewers we have, the better the outcome
19:58 PM
This way the project may take time to come into shape, but it will be as perfect as it can be.
20:07 PM
Thank you so much :)
20:22 PM
iliekcomputers

ruaok: I opened a PR for cron on the leader cluster (for stats)
20:23 PM
What are your general opinions on running cron in the cluster? Stats would become more push than pull, and we'd run a consumer on lemmy.
21:26 PM
ruaok

reosarevok: that's why we left the forest. once we realized that we could become lunch....
21:26 PM
iliekcomputers: I personally think it would be better for lemmy to be in charge.
21:27 PM
we will need to find a way to get the cluster to react to events on the main server...
21:27 PM
so, more pull than push.
21:34 PM
CatQuest

" I personally think it would be better for lemmy to be in charge."
21:34 PM
this is the best quote
21:34 PM
(if taken out of context)
22:12 PM
iliekcomputers

So lemmy calls a script that runs on the cluster? I thought that decision was because of the resource usage constraints due to vms but if we're gonna run a cluster all the time, what's the advantage?
23:03 PM
CatQuest

adventagious!