so we can provide users with an option to choose which method they use, we can save that to options, and then in the method that you linked to, we can look at the option and choose the correct training process
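A rough sketch of that option-based dispatch, in Python. All names here (`TRAINERS`, the `training_tool` option key, the two placeholder trainers) are hypothetical and not the real AcousticBrainz code; the trainers only return a label so that the example runs on its own.

```python
from typing import Callable, Dict


def train_with_gaia(dataset: dict) -> str:
    # Placeholder: stands in for the existing gaia training path.
    return "gaia:" + dataset["name"]


def train_with_sklearn(dataset: dict) -> str:
    # Placeholder: stands in for the new sklearn training path.
    return "sklearn:" + dataset["name"]


# Maps the value saved in the user's options to a training function.
TRAINERS: Dict[str, Callable[[dict], str]] = {
    "gaia": train_with_gaia,
    "sklearn": train_with_sklearn,
}


def evaluate_dataset(dataset: dict, options: dict) -> str:
    # "training_tool" is an assumed option key; default to gaia when unset.
    tool = options.get("training_tool", "gaia")
    if tool not in TRAINERS:
        raise ValueError("unknown training tool: " + repr(tool))
    return TRAINERS[tool](dataset)
```

With a lookup table like this, adding another training tool would only need a new entry, and the evaluation method itself stays unchanged.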
jmp_music_
do you think it is appropriate to create a new function that works with sklearn, like evaluate_dataset_sklearn()?
mmm, so gaia_wrapper.train_model is in a separate file
slriv joined the channel
I'm just thinking
what's the main entrypoint to your train method for sklearn?
jmp_music_
exactly. We could make a sklearn_wrapper
alastairp
keep in mind that the gaia wrapper only exists because the interface to gaia was a little bit verbose
jmp_music_
it's the file called `create_classification_project.py`
we can load it directly
alastairp
because we control all of the code for the sklearn model training process, we should be able to just include a single method there that we can call
white_snack has quit
in fact, all we need is a single method like `train_model` in the code that you've already added, and a new method like `save_history_file` which saves the sklearn model
train_model can be in create_classification_project or similar
jmp_music_
great. So a separate method inside evaluate.py would be preferable?
alastairp
save_sklearn_model (for example) could be in evaluate.py
white_snack joined the channel
for now I'm not too worried if we put the method in evaluate.py or create_classification_project.py, we can change it if we think it should be moved
jmp_music_
great. That was what I was thinking too
Mr_Monkey
Hi ruaok! Were you able to have a look at adding a time_range parameter to the user/XXX/listens API endpoint?
jmp_music_
Two more questions
alastairp
this should be a good start. once this is done, we will be able to build a dataset and train its model with sklearn! that's the second part of closing the loop
ruaok
Mr_Monkey: hi! sorry, that hasn't made it to the top of the list yet.
Mr_Monkey
Some of the functionality I'll want to implement on the React side (if there are fewer listens than the expected count and my oldest ts is older than that, fetch listens again with a bigger time_range)
jmp_music_
I saw in the create-dataset part of the AB API that I can download a csv file containing the MBIDs from a dataset for evaluation
ruaok
but, I have 4 hours left, and not that much work I can do with this shit connection.
Mr_Monkey
No problem. I don't think this is a huge rush, but I do wonder what it's looking like for users at the moment.
ruaok
I might just be able to knock that out.
jmp_music_
how could I load the relevant low-level data into a localhost AB, in order to experiment later on with the evaluation?
ruaok
Mr_Monkey: I'll do it now. should I make a new branch or add to an existing branch?
I keep around this basic submit script that will take a directory of files named [uuid].json and submit them to a local server
jmp_music_
great! This would be really helpful
Mr_Monkey
What I mean is that the front end needs to determine whether it should fetch more listens or not. If I get fewer than the 25 listens I asked for, I'll compare the ts of the oldest of those with the oldest ts for that user, which I have access to. If it's higher than the oldest ts, that means there are more listens to fetch.
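A minimal sketch of that check, written in Python rather than the actual React code. The function name, its parameters, and the handling of an empty page are assumptions for illustration; `listened_at` is taken to be the timestamp field on each listen.

```python
from typing import List, Optional


def should_fetch_more(listens: List[dict], requested_count: int,
                      oldest_user_ts: Optional[int]) -> bool:
    # A full page came back: no need to ask for a bigger time_range.
    if len(listens) >= requested_count:
        return False
    # Assumption: a user with no listens at all has nothing more to fetch.
    if oldest_user_ts is None:
        return False
    # Assumption: an empty page for a user who does have listens means
    # the current time_range was too narrow.
    if not listens:
        return True
    # Compare the oldest listen received with the user's oldest listen.
    oldest_received = min(item["listened_at"] for item in listens)
    return oldest_received > oldest_user_ts
```

When this returns true, the front end would repeat the request with a bigger time_range; otherwise it has reached the user's oldest listen.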
slriv has quit
jmp_music_
I thought it could be any script that can read the MBIDs from a dataset csv file
Although it doesn't look like we were comparing timestamps at all
ruaok
makes sense to
slriv joined the channel
Mr_Monkey
As for time_range and listenstore, I'm not sure. Not in user.py I don't think
jmp_music_
@alastairp: could we remove it from the pre-processing steps? I started a training run for a really large dataset on a computer that has 48 cores and a $7000 CUDA GPU, and it has run for two days in a row 😂
Mr_Monkey
But we'll want to implement it in the API instead
alastairp
jmp_music_: sure, for now if it takes too long with sklearn then we should remove it to make the training process faster
Mr_Monkey
as an argument for the API endpoint*
jmp_music_
also the mfcc step never returns good results
alastairp
if we have time, we should look into it in more detail, because obviously gaia is doing something with that data
ruaok
Mr_Monkey: but the idea is that the concept of looking for more listens completely disappears from the javascript/html side of things, right?
jmp_music_
cool
Mr_Monkey
No. On the contrary, we let the front-end decide when it should request a bigger time_range from the API.
slriv has quit
ruaok
ok, now I am really confused.
jmp_music_
@alastairp: thanks a lot! I'll start working on the evaluate scripts
Mr_Monkey
That way, we don't make the initial call to load the user page longer by looking for a bigger time range, and only do so when the user decides to (click a button to load more)
slriv joined the channel
alastairp
jmp_music_: great, I'll try and get these interface changes made soon so that you can extend this work. let me know if you have any questions or need me to do something
jmp_music_
of course! thanks again. Oh, one final question. Do I have to make a readme file inside the folder of the sklearn tool, for the purposes of GSoC? I just saw an email they sent to me earlier today
slriv has quit
slriv joined the channel
ruaok
Mr_Monkey: which branch should I be looking at? I think I need to see what the current code is that we're going for.
Mr_Monkey
So we previously used to load the default time range, and passed a flag to react if there were more listens to fetch. User could then click a button to reload the page with a flag to load a bigger time_range. All we want to do now is move that mechanism to the front-end only, but instead of reloading the page we call the API (which is where a time_range arg is currently missing)
master
ruaok
ok, so the task list is: remove the time_range arguments from the profile view and instead add a time_range argument to the API. do I have that right then?
Mr_Monkey
Yes
ruaok
ok, thanks for clarifying.
alastairp
jmp_music_: yes, that was in fact one of the things that I was going to add to the review of your PR
it would be good to have a readme in that folder that briefly explains what the module does, along with some examples of how to use each part
Mr_Monkey
In the same way we decided to fetch the listens count after page load to not make the page load longer than necessary, we figured it would be better to let the user decide to load more data if they wish. The alternative is making page loads potentially longer for user profiles you might be visiting when you only care to see the latest listens or stats, and don't necessarily care about having a full page of listens.
ruaok
ok, totally makes sense. I had understood something completely different when it was explained at first, hence my confusion.
but with the dynamic loading it makes sense.
heh, the train crew is rather sassy. making all sorts of comments about how not to wear a mask. lol
jmp_music_
@alastairp: OK, I'll be waiting for the review
Mr_Monkey
👍
jmp_music_
thanks again :)
Mr_Monkey
All aboard the sassy train!
ruaok
Mr_Monkey: the prop for the user view will no longer receive search_larger_time_range, can you confirm that?
Mr_Monkey
Confirmed
ruaok
k. what is the name of the argument to the API that specifies range?
Mr_Monkey
In fact that's already been removed on the React side
There is currently no argument for range on the API
ruaok
and also, search_larger_time_range will no longer be a valid option for the profile view, right?
Mr_Monkey
Correct
white_shadow joined the channel
ruaok
gah! we're currently passing through the 100km section where it's nothing but tunnel-bridge-tunnel-bridge. connectivity sucks!
try the move-range-to-api branch -- does that look like what you are expecting?