so we can provide users with an option to choose which method they use, we can save that to options, and then in the method that you linked to, we can look at the option and choose the correct training process
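A minimal sketch of the option-based dispatch described above, assuming the choice is stored under a `training_tool` key in the dataset options. Every name here (`training_tool`, `train_with_gaia`, `train_with_sklearn`, `TRAINERS`) is a hypothetical placeholder, and the two trainers are stubs rather than the project's real training code.

```python
# Hypothetical sketch: pick the training process from a saved option.
# None of these names come from the actual codebase.

def train_with_gaia(dataset):
    """Stand-in for the existing gaia-based training path."""
    return f"gaia model for {dataset['name']}"


def train_with_sklearn(dataset):
    """Stand-in for the new sklearn-based training path."""
    return f"sklearn model for {dataset['name']}"


# Map the stored option value to the function that handles it.
TRAINERS = {
    "gaia": train_with_gaia,
    "sklearn": train_with_sklearn,
}


def evaluate_dataset(dataset, options):
    """Look at the saved option and run the matching training process."""
    tool = options.get("training_tool", "gaia")  # assumed default
    try:
        trainer = TRAINERS[tool]
    except KeyError:
        raise ValueError(f"unknown training tool: {tool!r}") from None
    return trainer(dataset)
```

With a table like this, supporting another training tool would only need one more entry rather than a second `evaluate_dataset_*` function.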
2020-08-06 21916, 2020
jmp_music_
do you think it would be appropriate to create a new function that works with sklearn, like evaluate_dataset_sklearn()?
mmm, so gaia_wrapper.train_model is in a separate file
2020-08-06 21952, 2020
slriv joined the channel
2020-08-06 21953, 2020
alastairp
I'm just thinking
2020-08-06 21914, 2020
alastairp
what's the main entrypoint to your train method for sklearn?
2020-08-06 21915, 2020
jmp_music_
exactly. We could make a sklearn_wrapper
2020-08-06 21948, 2020
alastairp
keep in mind that the gaia wrapper only exists because the interface to gaia was a little bit verbose
2020-08-06 21912, 2020
jmp_music_
it's the file called `create_classification_project.py`
2020-08-06 21919, 2020
jmp_music_
we can load it directly
2020-08-06 21927, 2020
alastairp
because we control all of the code for the sklearn model training process, we should be able to just include a single method there that we can call
2020-08-06 21902, 2020
white_snack has quit
2020-08-06 21908, 2020
alastairp
in fact, all we need is a single method like `train_model` in the code that you've already added, and a new method like `save_history_file` which saves the sklearn model
2020-08-06 21923, 2020
alastairp
train_model can be in create_classification_project or similar
2020-08-06 21930, 2020
jmp_music_
great. So a separate method inside evaluate.py would be preferable?
2020-08-06 21940, 2020
alastairp
save_sklearn_model (for example) could be in evaluate.py
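As a rough illustration of what a `save_sklearn_model` helper might look like, here is a sketch that persists a fitted estimator with the standard-library `pickle` module. The signature, the default file name, and the companion `load_sklearn_model` are assumptions made for the example. `joblib` is another common choice for sklearn models.

```python
import pickle
from pathlib import Path


def save_sklearn_model(model, output_dir, name="model.pkl"):
    """Pickle a fitted estimator into output_dir and return the file path.

    The signature and default file name are assumptions for this sketch.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / name
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def load_sklearn_model(path):
    """Load a model written by save_sklearn_model.

    Only unpickle files you trust: unpickling can execute arbitrary code.
    """
    with Path(path).open("rb") as f:
        return pickle.load(f)
```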
2020-08-06 21948, 2020
white_snack joined the channel
2020-08-06 21911, 2020
alastairp
for now I'm not too worried if we put the method in evaluate.py or create_classification_project.py, we can change it if we think it should be moved
2020-08-06 21930, 2020
jmp_music_
great. That was what I was thinking too
2020-08-06 21933, 2020
Mr_Monkey
Hi ruaok! Were you able to have a look at adding a time_range parameter to the user/XXX/listens API endpoint?
2020-08-06 21946, 2020
jmp_music_
Too more questions
2020-08-06 21913, 2020
jmp_music_
two*
2020-08-06 21920, 2020
alastairp
this should be a good start. once this is done, we will be able to build a dataset and train its model with sklearn! that's the second part of closing the loop
2020-08-06 21951, 2020
ruaok
Mr_Monkey: hi! sorry, that hasn't made it to the top of the list yet.
2020-08-06 21954, 2020
Mr_Monkey
Some of the functionality I'll want to implement on the React side (if there are less listens than the expected count and my oldest ts is older than that, fetch listens again with a bigger time_range)
2020-08-06 21905, 2020
jmp_music_
I saw in the create dataset from AB API, that I can download a csv file, which contains the MBIDs from a dataset for evaluation
2020-08-06 21915, 2020
ruaok
but, I have 4 hours left, and not that much work I can do with this shit connection.
2020-08-06 21921, 2020
Mr_Monkey
No problem. I don't think this is a huge rush, but I do wonder what it's looking like for users at the moment.
2020-08-06 21923, 2020
ruaok
I might just be able to knock that out.
2020-08-06 21937, 2020
jmp_music_
how could I load to localhost AB the relevant low-level
2020-08-06 21948, 2020
jmp_music_
in order to experiment later on with the evaluation
2020-08-06 21948, 2020
jmp_music_
?
2020-08-06 21914, 2020
ruaok
Mr_Monkey: I'll do it now. should I make a new branch or add to an existing branch?
I keep around this basic submit script that will take a directory of files named [uuid].json and submit them to a local server
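The script itself is not shown in the log. A sketch of something similar could look like the following, assuming files are named `[uuid].json` and that the local server accepts a POST at `/api/v1/<mbid>/low-level`. The endpoint path, the function names, and the base URL in the usage note are assumptions for illustration.

```python
import urllib.request
import uuid
from pathlib import Path


def find_submissions(directory):
    """Yield (mbid, path) for every file in directory named [uuid].json."""
    for path in sorted(Path(directory).glob("*.json")):
        try:
            mbid = str(uuid.UUID(path.stem))
        except ValueError:
            continue  # file name is not a UUID, skip it
        yield mbid, path


def submit_file(base_url, mbid, path):
    """POST one JSON file to the server and return the HTTP status code.

    The URL layout is an assumption; adjust it to the server's real API.
    """
    url = f"{base_url}/api/v1/{mbid}/low-level"
    request = urllib.request.Request(
        url,
        data=Path(path).read_bytes(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def submit_directory(base_url, directory):
    """Submit every [uuid].json file found in directory."""
    for mbid, path in find_submissions(directory):
        print(mbid, submit_file(base_url, mbid, path))
```

Calling `submit_directory("http://localhost:8080", "lowlevel/")` would then walk the directory and submit each file. The host and port here are placeholders.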
2020-08-06 21921, 2020
jmp_music_
great! This would be really helpful
2020-08-06 21945, 2020
Mr_Monkey
What I mean is that the front end needs to determine whether it should fetch more listens or not. If I get fewer than the 25 listens I asked for, I'll compare the ts for the oldest of those with the oldest ts for that user, which I have access to. If it's higher than the oldest ts that means there are more listens to fetch.
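The check described above can be sketched as a small function. This is written in Python purely for illustration (the real logic would live in the React front end), and it assumes each listen is a dict with a `listened_at` timestamp. The function name and the handling of an empty page are choices made for this sketch, not something settled in the discussion.

```python
def should_fetch_more(listens, requested_count, user_oldest_ts):
    """Decide whether to request a bigger time_range from the API.

    listens: the page of listens just received (dicts with "listened_at").
    requested_count: how many listens were asked for (25 in the discussion).
    user_oldest_ts: timestamp of the user's oldest listen overall.
    """
    if len(listens) >= requested_count:
        return False  # got a full page, no need to widen the range
    if not listens:
        return False  # arbitrary choice here: nothing to compare against
    oldest_on_page = min(listen["listened_at"] for listen in listens)
    # A page that stops short of the user's oldest listen means
    # there are older listens outside the current time range.
    return oldest_on_page > user_oldest_ts
```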
2020-08-06 21945, 2020
slriv has quit
2020-08-06 21904, 2020
jmp_music_
I thought it could be any script that can read the MBIDs from a dataset csv file
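Reading the MBIDs out of such a CSV could be sketched as below. The column layout of the exported file is not described in the log, so this assumes the MBID is the first column of each row and skips anything that does not parse as a UUID (a header row, for example). The function name is hypothetical.

```python
import csv
import uuid


def read_mbids(csv_path):
    """Return the MBIDs found in the first column of a dataset CSV.

    Assumes one recording per row with the MBID in column 0.
    """
    mbids = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # blank line
            try:
                mbids.append(str(uuid.UUID(row[0].strip())))
            except ValueError:
                continue  # header or malformed value
    return mbids
```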
Although it doesn't look like we were comparing timestamps at all
2020-08-06 21932, 2020
ruaok
makes sense to
2020-08-06 21933, 2020
slriv joined the channel
2020-08-06 21926, 2020
Mr_Monkey
As for time_range and listenstore, I'm not sure. Not in user.py I don't think
2020-08-06 21928, 2020
jmp_music_
@alastairp: could we remove it from the pre-processing steps? I started training on a really large dataset on a computer that has 48 cores and a $7000 CUDA GPU, and it has been running for two days in a row 😂
2020-08-06 21947, 2020
Mr_Monkey
But we'll want to implement it in the API instead
2020-08-06 21956, 2020
alastairp
jmp_music_: sure, for now if it takes too long with sklearn then we should remove it to make the training process faster
2020-08-06 21957, 2020
Mr_Monkey
as an argument for the API endpoint*
2020-08-06 21959, 2020
jmp_music_
also the mfcc step never returns good results
2020-08-06 21915, 2020
alastairp
if we have time, we should look into it in more detail, because obviously gaia is doing something with that data
2020-08-06 21918, 2020
ruaok
Mr_Monkey: but the idea is that the concept of looking for more listens completely disappears from the javascript/html side of things, right?
2020-08-06 21936, 2020
jmp_music_
cool
2020-08-06 21901, 2020
Mr_Monkey
No. On the contrary, we let the front-end decide when it should request a bigger time_range from the API.
2020-08-06 21901, 2020
slriv has quit
2020-08-06 21929, 2020
ruaok
ok, now I am really confused.
2020-08-06 21931, 2020
jmp_music_
@alastairp: thanks a lot! I'll start working on the evaluate scripts
2020-08-06 21947, 2020
Mr_Monkey
That way, we don't make the initial call to load the user page longer by looking for bigger time range, and only do so when the user decides to (click a button to load more)
2020-08-06 21950, 2020
slriv joined the channel
2020-08-06 21909, 2020
alastairp
jmp_music_: great, I'll try and get these interface changes made soon so that you can extend this work. let me know if you have any questions or need me to do something
2020-08-06 21918, 2020
jmp_music_
of course! thanks again. Oh, one final question. Do I have to make a readme file inside the folder of the sklearn tool, for the purposes of GSoC? I just saw an email they sent to me earlier toaday
2020-08-06 21918, 2020
slriv has quit
2020-08-06 21922, 2020
jmp_music_
today*
2020-08-06 21903, 2020
slriv joined the channel
2020-08-06 21949, 2020
ruaok
Mr_Monkey: which branch should I be looking at? I think I need to see what the current code is that we're going for.
2020-08-06 21951, 2020
Mr_Monkey
So we previously used to load the default time range, and passed a flag to react if there were more listens to fetch. User could then click a button to reload the page with a flag to load a bigger time_range. All we want to do now is move that mechanism to the front-end only, but instead of reloading the page we call the API (which is where a time_range arg is currently missing)
2020-08-06 21904, 2020
Mr_Monkey
master
2020-08-06 21926, 2020
ruaok
ok, so the task list is: remove the time_range arguments from the profile view and instead add a time_range argument to the API. do I have that right then?
2020-08-06 21935, 2020
Mr_Monkey
Yes
2020-08-06 21912, 2020
ruaok
ok, thanks for clarifying.
2020-08-06 21954, 2020
alastairp
jmp_music_: yes, that was in fact one of the things that I was going to add to the review of your PR
2020-08-06 21919, 2020
alastairp
it would be good to have a readme in that folder that briefly explains what the module does, along with some examples of how to use each part
2020-08-06 21940, 2020
Mr_Monkey
In the same way we decided to fetch the listens count after page load to not make the page load longer than necessary, we figured it would be better to let the user decide to load more data if they wish. The alternative is making page loads potentially longer for users' profiles you might be visiting, where you only care to see the latest listens or stats and don't necessarily care about having a full page of listens.
2020-08-06 21953, 2020
ruaok
ok totally makes sense. I had understood something completely different when it was explained at first. hence my confusion.
2020-08-06 21908, 2020
ruaok
but with the dynamic loading it makes sense.
2020-08-06 21936, 2020
ruaok
heh, the train crew is rather sassy. making all sorts of comments about how not to wear a mask. lol
2020-08-06 21945, 2020
jmp_music_
@alastairp: OK, I'll be waiting for the review
2020-08-06 21952, 2020
Mr_Monkey
👍
2020-08-06 21954, 2020
jmp_music_
thanks again :)
2020-08-06 21905, 2020
Mr_Monkey
All aboard the sassy train!
2020-08-06 21904, 2020
ruaok
Mr_Monkey: the prop for the user view will no longer receive search_larger_time_range, can you confirm that?
2020-08-06 21945, 2020
Mr_Monkey
Confirmed
2020-08-06 21904, 2020
ruaok
k. what is the name of the argument to the API that specifies range?
2020-08-06 21945, 2020
Mr_Monkey
In fact that's already been removed on the React side
2020-08-06 21901, 2020
Mr_Monkey
There is currently no argument for range on the API
2020-08-06 21906, 2020
ruaok
and also, search_larger_time_range will no longer be a valid option for the profile view, right?
2020-08-06 21926, 2020
Mr_Monkey
Correct
2020-08-06 21931, 2020
white_shadow joined the channel
2020-08-06 21924, 2020
ruaok
gah! we're currently passing through the 100km section where it's nothing but tunnel-bridge-tunnel-bridge. connectivity sucks!
2020-08-06 21938, 2020
ruaok
try the move-range-to-api branch -- does that look like what you are expecting?