but I wonder if you should consider range and frequency as different
to take an example, imagine if you run every month with a 1 month range. so you'll have data for april, may, june, july
then in august you decide that you want to run with 2 month range, so it'll have data from july and august
pristine__
Yup. True
alastairp
and in september you run again (still running every month) with data from august and september
so you can still use the date of the last time you ran (e.g. 30 june) to calculate the end time of this run (30 august)
and even if you want to change the _frequency_, that would be OK too. you just look at the last time you ran it, add the duration of the frequency (1 month, 2 weeks, 1 week), and if that amount of time has passed since the last time, you run the process
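That frequency check could be sketched like this (a minimal sketch; the `is_due` name and the timedelta-based frequency are assumptions, and a "1 month" frequency would have to be approximated, e.g. as 30 days):

```python
from datetime import date, timedelta

def is_due(last_run, frequency, today=None):
    """True if at least `frequency` has passed since the last run.

    last_run: date of the previous run, or None if the job has never run.
    frequency: a timedelta (e.g. timedelta(weeks=2), timedelta(days=30)).
    """
    today = today or date.today()
    # first run is always due; otherwise compare elapsed time to the frequency
    return last_run is None or today - last_run >= frequency
```

So changing the frequency later is fine: the next run just compares "now minus last run" against whatever the current frequency is.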
pristine__
Yes. That is what I was saying. from_date ~ last date. Add the range and get the next date. I wonder if it would be tricky for the first run, because at that time model_metadata would be empty
> So probably just fetch from_date from the table, which will become the to_date of your new run, and then add the number of months to this date, so from_date now becomes 1-08-19
Here.
The lang was not so clear. Sorry
Running files for the first time to set the stage is always tricky, I guess
alastairp
I wouldn't use the last to_date as the current from_date
I would use last to_date + run_frequency as the new to_date, and new to_date - range as the from_date
if there's no data, to_date is today
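alastairp's rule above, as a sketch (the `next_window` and `add_months` names are made up; the month arithmetic is simplified and assumes the day-of-month stays valid, e.g. always running on the 1st):

```python
from datetime import date

def add_months(d: date, months: int) -> date:
    """Naive month arithmetic: assumes the day-of-month stays valid (e.g. the 1st)."""
    total = d.month - 1 + months
    return d.replace(year=d.year + total // 12, month=total % 12 + 1)

def next_window(last_to_date, run_frequency_months, range_months, today=None):
    """Compute (from_date, to_date) for the next run.

    last_to_date: to_date of the previous run, or None on the first run.
    """
    today = today or date.today()
    if last_to_date is None:
        # no data yet: the window ends today
        to_date = today
    else:
        # new to_date = last to_date + run frequency
        to_date = add_months(last_to_date, run_frequency_months)
    # from_date = new to_date - range
    from_date = add_months(to_date, -range_months)
    return from_date, to_date
```

For example, a first run on 2019-08-01 with a 2-month range gives the window (2019-06-01, 2019-08-01); a later run with a 1-month frequency gives (2019-07-01, 2019-09-01).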
pristine__
Yup. And i should add this today's date in that table
alastairp
right
pristine__
Need a check: if model_metadata is empty, add today's date
Ah, I can see a flaw here.
( I realize summits are important, f2f discussions are good )
What if it is the first run? Say I ran create_dataframe today but ran train_models a day or so later. Since the entry in model_metadata is linked to train_models, the *today* would be a different date than the date when create_dataframes was actually run.
alastairp
right
you might consider having another table to track the runs of create_dataframe
or ensure that you always run both of them together? or refuse to run train_models if the data is out of date?
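The "another table" idea could look something like this (table and column names are made up, and sqlite is used only for the sketch): record each create_dataframe run in its own table, and have train_models refuse to run when the newest dataframe is missing or older than some staleness limit.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dataframe_runs (
        id INTEGER PRIMARY KEY,
        from_date TEXT NOT NULL,
        to_date TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
""")

def record_dataframe_run(conn, from_date, to_date, created_at):
    """Log one create_dataframe run (dates stored as ISO strings)."""
    conn.execute(
        "INSERT INTO dataframe_runs (from_date, to_date, created_at) VALUES (?, ?, ?)",
        (from_date.isoformat(), to_date.isoformat(), created_at.isoformat()),
    )

def latest_dataframe_date(conn):
    """Date of the most recent create_dataframe run, or None if there is none."""
    row = conn.execute("SELECT MAX(created_at) FROM dataframe_runs").fetchone()
    return date.fromisoformat(row[0]) if row[0] else None

def can_train(conn, today, max_staleness_days=7):
    """Refuse to train if the data is missing or out of date."""
    latest = latest_dataframe_date(conn)
    return latest is not None and (today - latest) <= timedelta(days=max_staleness_days)
```

This decouples the two scripts: model_metadata keeps describing trained models, while dataframe_runs answers "when was the data actually built?".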
pristine__
Another table is the better option, I guess, because it provides flexibility
What do you think?
( I thought of another table once but dropped the idea because model_metadata has all the important info )
( Managing data is not that easy. Phew )
alastairp
a new table sounds ok to me, but I don't know enough about your current process to understand exactly how everything fits together
pristine__
I am really enjoying project discussions like this. The first thing I'll do tomorrow is prepare a detailed readme for you ( and anyone who wants to read it ) to understand the flow. The more reviewers we have, the better the outcome
This way the project may take time to come into shape, but it will be as perfect as it can be.
Thank you so much :)
iliekcomputers
ruaok: I opened a PR for cron on the leader cluster (for stats)
What are your general opinions on running cron in the cluster? Stats become more push than pull, and we'll run a consumer on lemmy.
ruaok
reosarevok: that's why we left the forest. once we realized that we could become lunch....
iliekcomputers: I personally think it would be better for lemmy to be in charge.
we will need to find a way to get the cluster to react to events on the main server...
so, more pull than push.
CatQuest
" I personally think it would be better for lemmy to be in charge."
this is the best quote
(if taken out of context)
iliekcomputers
So lemmy calls a script that runs on the cluster? I thought that decision was because of the resource usage constraints due to VMs, but if we're gonna run a cluster all the time, what's the advantage?