in #metabrainz

12:03 PM
jmp_music_

cool!
12:03 PM
alastairp

so we can provide users with an option to choose which method they use, we can save that to options, and then in the method that you linked to, we can look at the option and choose the correct training process
12:05 PM
jmp_music_

do you think it is proper to created a new function that works with the sklearn, like evaluate_dataset_sklearn()?
12:05 PM
create*
12:05 PM
alastairp

some of the code here would be shared, though
12:06 PM
jmp_music_

ok
12:06 PM
alastairp

I think that we should move this into a separate method for gaia: https://github.com/metabrainz/acousticbrainz-se...
12:06 PM
and make another similar one for sklearn
12:06 PM
jmp_music_

that was what i was thinking too.
12:06 PM
let's say evaluate_sklearn.py?
12:07 PM
a separate python file
12:07 PM
?
12:07 PM
alastairp

mmm, so gaia_wrapper.train_model is in a separate file
12:07 PM
I'm just thinking
12:08 PM
what's the main entrypoint to your train method for sklearn?
12:08 PM
jmp_music_

exactly. We could make a sklearn_wrapper
12:08 PM
alastairp

keep in mind that the gaia wrapper only exists because the interface to gaia was a little bit verbose
12:09 PM
jmp_music_

it's the file called `create_classification_project.py`
12:09 PM
we can load it directly
12:09 PM
alastairp

because we control all of the code for the sklearn model training process, we should be able to just include a single method there that we can call
12:10 PM
in fact, all we need is a single method like `train_model` in the code that you've already added, and a new method like `save_history_file` which saves the sklearn model
12:10 PM
train_model can be in create_classification_project or similar
12:10 PM
jmp_music_

great. So a separate method inside evaluate.py would be preferable?
12:10 PM
alastairp

save_sklearn_model (for example) could be in evaluate.py
12:11 PM
for now I'm not too worried if we put the method in evaluate.py or create_classification_project.py, we can change it if we think it should be moved
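A minimal sketch of the dispatch described above, assuming the chosen tool is saved in the dataset's options under a hypothetical `training_tool` key. `train_gaia`, `train_sklearn` and `evaluate_dataset` are placeholders standing in for the gaia_wrapper path and for a single sklearn `train_model` entrypoint; they are not the real acousticbrainz-server functions.

```python
from typing import Callable, Dict


def train_gaia(dataset_id: str) -> str:
    # Placeholder for the existing gaia path (gaia_wrapper.train_model + history file).
    return f"gaia model for {dataset_id}"


def train_sklearn(dataset_id: str) -> str:
    # Placeholder for a single sklearn train_model entrypoint, followed by
    # something like save_sklearn_model to persist the result.
    return f"sklearn model for {dataset_id}"


# Hypothetical mapping from the saved option value to a training function.
TRAINERS: Dict[str, Callable[[str], str]] = {
    "gaia": train_gaia,
    "sklearn": train_sklearn,
}


def evaluate_dataset(dataset_id: str, options: dict) -> str:
    # Shared steps (loading the dataset, preparing the evaluation) would go here,
    # so only the final training call differs between the two tools.
    tool = options.get("training_tool", "gaia")  # assumption: gaia stays the default
    trainer = TRAINERS.get(tool)
    if trainer is None:
        raise ValueError(f"unknown training tool: {tool!r}")
    return trainer(dataset_id)
```

With this shape, supporting another tool later would only mean adding an entry to `TRAINERS`.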
12:11 PM
jmp_music_

great. That was what I was thinking too
12:11 PM
Mr_Monkey

Hi ruaok! Were you able to have a look at adding a time_range parameter to the user/XXX/listens API endpoint?
12:11 PM
jmp_music_

Too more questions
12:12 PM
two*
12:12 PM
alastairp

this should be a good start. once this is done, we will be able to build a dataset and train its model with sklearn! that's the second part of closing the loop
12:12 PM
ruaok

Mr_Monkey: hi! sorry, that hasn't made it to the top of the list yet.
12:12 PM
Mr_Monkey

Some of the functionality I'll want to implement on the React side (if there are fewer listens than the expected count and my oldest ts is older than that, fetch listens again with a bigger time_range)
12:13 PM
jmp_music_

I saw in the AB create-dataset API that I can download a CSV file containing the MBIDs of a dataset, for evaluation
12:13 PM
ruaok

but, I have 4 hours left, and not that much work I can do with this shit connection.
12:13 PM
Mr_Monkey

No problem. I don't think this is a huge rush, but I do wonder what it's looking like for users at the moment.
12:13 PM
ruaok

I might just be able to knock that out.
12:13 PM
jmp_music_

how could I load the relevant low-level data into my localhost AB
12:13 PM
so that I can experiment with the evaluation later on
12:13 PM
?
12:17 PM
ruaok

Mr_Monkey: I'll do it now. should I make a new branch or add to an existing branch?
12:17 PM
alastairp

ahhh
12:17 PM
that's https://github.com/metabrainz/acousticbrainz-se... :)
12:17 PM
do you need a bunch of files to load to localhost for testing?
12:18 PM
ruaok

Mr_Monkey: > my oldest ts is older than that,
12:18 PM
can you expand on that, please?
12:18 PM
alastairp

https://zenodo.org/record/2553414 you could download one of these archives, they have a lot of files in them that you could then upload to the server
12:18 PM
jmp_music_

@alastairp: exactly
12:19 PM
alastairp

https://usercontent.irccloud-cdn.com/file/IMaTS...
12:19 PM
I keep around this basic submit script that will take a directory of files named [uuid].json and submit them to a local server
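The IRCCloud script itself isn't reproduced here. As a rough sketch of the same idea, assuming a local server at `http://localhost:8080` (adjust to your setup) that accepts `POST /api/v1/<mbid>/low-level` with the JSON document as the body, like the public AcousticBrainz submission endpoint; the helper names are hypothetical:

```python
import json
import sys
import urllib.request
import uuid
from pathlib import Path

# Assumption: where the local development server is listening.
SERVER = "http://localhost:8080"


def mbid_from_filename(path: Path) -> str:
    # Files are named <mbid>.json; uuid.UUID raises ValueError for anything else.
    return str(uuid.UUID(path.stem))


def submit_url(mbid: str, server: str = SERVER) -> str:
    return f"{server}/api/v1/{mbid}/low-level"


def submit_file(path: Path, server: str = SERVER) -> int:
    mbid = mbid_from_filename(path)
    body = path.read_bytes()
    json.loads(body)  # fail early on malformed JSON instead of sending it
    request = urllib.request.Request(
        submit_url(mbid, server),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def submit_directory(directory: str, server: str = SERVER) -> None:
    for path in sorted(Path(directory).glob("*.json")):
        try:
            status = submit_file(path, server)
            print(f"{path.name}: HTTP {status}")
        except (ValueError, OSError) as err:
            # ValueError: bad filename or JSON; OSError covers HTTP and connection errors.
            print(f"{path.name}: skipped ({err})", file=sys.stderr)
```

Usage would be something like `submit_directory("path/to/lowlevel-files")` on a directory of `[uuid].json` files; anything not named that way is reported and skipped rather than aborting the run.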
12:20 PM
jmp_music_

great! This would be really helpful
12:20 PM
Mr_Monkey

What I mean is that the front end needs to determine whether it should fetch more listens or not. If I get fewer than the 25 listens I asked for, I'll compare the ts of the oldest of those with the oldest ts for that user, which I have access to. If it's higher than the user's oldest ts, that means there are more listens to fetch.
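The check described above, written out as a small function. The real version would live in the React code; it is sketched in Python here only for consistency with the other examples, and the function and parameter names are hypothetical. One assumption beyond what was described: `user_oldest_ts` is `None` when the user has no listens, and an empty response counts as "fetch more" whenever the user does have listens.

```python
from typing import Optional, Sequence


def should_fetch_more(
    listen_timestamps: Sequence[int],
    requested_count: int,
    user_oldest_ts: Optional[int],
) -> bool:
    """Decide whether to ask the API again with a bigger time_range."""
    if len(listen_timestamps) >= requested_count:
        # A full page came back, so there is nothing to top up.
        return False
    if user_oldest_ts is None:
        # Assumption: None means the user has no listens at all.
        return False
    if not listen_timestamps:
        # Nothing in the searched range, but the user has older listens.
        return True
    # Fewer listens than requested: if the oldest one we got is still newer than
    # the user's oldest listen, older listens exist outside the searched range.
    return min(listen_timestamps) > user_oldest_ts
```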
12:21 PM
jmp_music_

I thought it could be any script that can read the MBIDs from a dataset csv file
12:21 PM
https://www.irccloud.com/pastebin/uIiA4IOq/data...
12:21 PM
like this one
12:21 PM
ruaok

Mr_Monkey: ok, I think I understand.
12:21 PM
Mr_Monkey

Basically the same check we were doing in python (/me looks for the line)
12:21 PM
alastairp

and then download the data from acousticbrainz and then submit to a local server?
12:21 PM
jmp_music_

yeah
12:21 PM
alastairp

that should be a very small modification to the above file
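A rough sketch of that modification, assuming each row of the dataset CSV is `<recording MBID>,<class name>` (rows whose first column isn't a UUID, such as a header, are skipped) and using the public `GET /api/v1/<mbid>/low-level` endpoint. Files are written as `<mbid>.json`, so the output directory can then be submitted to the local server like any other directory of `[uuid].json` files. `mbids_from_dataset_csv` and `download_lowlevel` are hypothetical helper names.

```python
import csv
import io
import json
import urllib.request
import uuid
from pathlib import Path
from typing import List

AB_API = "https://acousticbrainz.org/api/v1"


def mbids_from_dataset_csv(text: str) -> List[str]:
    # Assumption: the first column holds the recording MBID; duplicates are
    # dropped while keeping the original order.
    mbids: List[str] = []
    seen = set()
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        try:
            mbid = str(uuid.UUID(row[0].strip()))
        except ValueError:
            continue  # header line or malformed row
        if mbid not in seen:
            seen.add(mbid)
            mbids.append(mbid)
    return mbids


def lowlevel_url(mbid: str) -> str:
    return f"{AB_API}/{mbid}/low-level"


def download_lowlevel(mbids: List[str], out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for mbid in mbids:
        with urllib.request.urlopen(lowlevel_url(mbid)) as response:
            document = json.load(response)
        (out / f"{mbid}.json").write_text(json.dumps(document))
```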
12:21 PM
jmp_music_

ok, I'll check it
12:22 PM
my final question is related to the sklearn and gaia processing steps
12:22 PM
I see that the mfcc preprocessing step is really slow during grid-search training
12:22 PM
ruaok

Mr_Monkey: so, we will no longer be passing in the time_range function to the listenstore, yes?
12:23 PM
jmp_music_

and this step was also excluded from the PR you sent me above
12:23 PM
Mr_Monkey

ruaok: https://github.com/metabrainz/listenbrainz-serv...
12:23 PM
Although it doesn't look like we were comparing timestamps at all
12:23 PM
ruaok

makes sense to
12:24 PM
Mr_Monkey

As for time_range and listenstore, I'm not sure. Not in user.py I don't think
12:24 PM
jmp_music_

@alastairp: could we remove it from the preprocessing steps? I started training on a really large dataset on a computer that has 48 cores and a $7,000 CUDA GPU, and it has been running for two days straight 😂
12:24 PM
Mr_Monkey

But we'll want to implement it in the API instead
12:24 PM
alastairp

jmp_music_: sure, for now if it takes too long with sklearn then we should remove it to make the training process faster
12:24 PM
Mr_Monkey

as an argument for the API endpoint*
12:24 PM
jmp_music_

also the mfcc step never returns good results
12:25 PM
alastairp

if we have time, we should look into it in more detail, because obviously gaia is doing something with that data
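To put the saving in numbers, here is a toy grid in the style of a gaia-like SVM search. The preprocessing names and the C and gamma ranges are illustrative assumptions, not the project's actual configuration; the point is only that every preprocessing option multiplies the number of parameter combinations that get trained.

```python
from itertools import product
from typing import Iterable, List, Tuple

# Illustrative values only; not the real training configuration.
PREPROCESSING = ["basic", "lowlevel", "nobands", "normalized", "gaussianized", "mfcc"]
C_VALUES = [2.0 ** k for k in range(-5, 12, 2)]       # 9 values: 2^-5 .. 2^11
GAMMA_VALUES = [2.0 ** k for k in range(3, -16, -2)]  # 10 values: 2^3 .. 2^-15


def build_grid(
    preprocessing: Iterable[str], exclude: Iterable[str] = ()
) -> List[Tuple[str, float, float]]:
    # One entry per (preprocessing, C, gamma) combination that would be trained.
    excluded = set(exclude)
    steps = [p for p in preprocessing if p not in excluded]
    return list(product(steps, C_VALUES, GAMMA_VALUES))


full = build_grid(PREPROCESSING)
without_mfcc = build_grid(PREPROCESSING, exclude={"mfcc"})
print(len(full), len(without_mfcc))  # 540 450
```

Dropping mfcc removes one sixth of the combinations in this toy grid; the wall-clock saving can be larger if, as observed above, that step is itself the slow one.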
12:25 PM
ruaok

Mr_Monkey: but the idea is that the concept of looking for more listens completely disappears from the javascript/html side of things, right?
12:25 PM
jmp_music_

cool
12:26 PM
Mr_Monkey

No. On the contrary, we let the front-end decide when it should request a bigger time_range from the API.
12:26 PM
ruaok

ok, now I am really confused.
12:26 PM
jmp_music_

@alastairp: thanks a lot! I'll start working on the evaluate scripts
12:26 PM
Mr_Monkey

That way, we don't make the initial user page load slower by searching a bigger time range, and only do so when the user decides to (by clicking a button to load more)
12:27 PM
alastairp

jmp_music_: great, I'll try and get these interface changes made soon so that you can extend this work. let me know if you have any questions or need me to do something
12:28 PM
jmp_music_

of course! thanks again. Oh, one final question. Do I have to make a readme file inside the folder of the sklearn tool for the purposes of GSoC? I just saw an email they sent to me earlier toaday
12:28 PM
today*
12:29 PM
ruaok

Mr_Monkey: which branch should I be looking at? I think I need to see what the current code is that we're going for.
12:29 PM
Mr_Monkey

So we previously used to load the default time range, and passed a flag to react if there were more listens to fetch. User could then click a button to reload the page with a flag to load a bigger time_range. All we want to do now is move that mechanism to the front-end only, but instead of reloading the page we call the API (which is where a time_range arg is currently missing)
12:30 PM
master
12:32 PM
ruaok

ok, so the task list is: remove the time_range arguments from the profile view and instead add a time_range argument to the API. do I have that right then?
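A minimal sketch of validating the new API argument. The parameter name `time_range` comes from the discussion, but the default, the upper bound and the helper name `parse_time_range` are hypothetical; the argument's unit is whatever the listenstore already uses. `args` can be any mapping with `.get`, such as Flask's `request.args`.

```python
from typing import Mapping, Optional

DEFAULT_TIME_RANGE = 3  # hypothetical default
MAX_TIME_RANGE = 50     # hypothetical cap, so one request can't search indefinitely


def parse_time_range(args: Mapping[str, str]) -> int:
    raw: Optional[str] = args.get("time_range")
    if raw is None:
        return DEFAULT_TIME_RANGE
    try:
        value = int(raw)
    except (TypeError, ValueError):
        raise ValueError("time_range must be an integer") from None
    if not 1 <= value <= MAX_TIME_RANGE:
        raise ValueError(f"time_range must be between 1 and {MAX_TIME_RANGE}")
    return value
```

In the view, the ValueError would presumably be turned into a 400 response, so the front end can keep asking for a bigger range without the server accepting unbounded searches.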
12:32 PM
Mr_Monkey

Yes
12:33 PM
ruaok

ok, thanks for clarifying.
12:34 PM
alastairp

jmp_music_: yes, that was in fact one of the things that I was going to add to the review of your PR
12:35 PM
it would be good to have a readme in that folder that briefly explains what the module does, along with some examples of how to use each part
12:35 PM
Mr_Monkey

In the same way we decided to fetch the listen count after page load so as not to make the page load longer than necessary, we figured it would be better to let the user decide to load more data if they wish. The alternative is potentially longer page loads for profiles you might be visiting when you only care about the latest listens or stats and don't necessarily need a full page of listens.
12:36 PM
ruaok

ok, totally makes sense. I had understood something completely different when it was explained at first, hence my confusion.
12:37 PM
but with the dynamic loading it makes sense.
12:38 PM
heh, the train crew is rather sassy. making all sorts of comments about how not to wear a mask. lol
12:38 PM
jmp_music_

@alastairp: OK, I'll be waiting for the review
12:38 PM
Mr_Monkey

👍
12:38 PM
jmp_music_

thanks again :)
12:39 PM
Mr_Monkey

All aboard the sassy train!
12:40 PM
ruaok

Mr_Monkey: the prop for the user view will no longer receive search_larger_time_range, can you confirm that?
12:40 PM
Mr_Monkey

Confirmed
12:41 PM
ruaok

k. what is the name of the argument to the API that specifies range?
12:41 PM
Mr_Monkey

In fact that's already been removed on the React side
12:42 PM
There is currently no argument for range on the API
12:42 PM
ruaok

and also, search_larger_time_range will no longer be a valid option for the profile view, right?
12:42 PM
Mr_Monkey

Correct
1:00 PM
ruaok

gah! we're currently passing through the 100km section where it's nothing but tunnel-bridge-tunnel-bridge, so connectivity sucks!
1:00 PM
try the move-range-to-api branch -- does that look like what you are expecting?
1:01 PM
BrainzGit

[listenbrainz-server] mayhem opened pull request #1015 (master…move-range-to-api): Move range to api https://github.com/metabrainz/listenbrainz-serv...
1:29 PM
shivam-kapila

Mr_Monkey: hey
1:29 PM
Mr_Monkey

Halo
1:29 PM
shivam-kapila

I wanted to ask something for search_larger_time_range
1:29 PM
Mr_Monkey

Shoot
1:30 PM
shivam-kapila

So the user feedback we've got so far is to remove the hassle of clicking a button
1:31 PM
Rather the users wanted to wait for 2 sec more
1:31 PM
Mr_Monkey

ruaok: looks pretty good to me. I'll check
1:31 PM
shivam-kapila

Rather than having to click that button
1:31 PM
Mr_Monkey

Ah?
1:32 PM
I think that could work. Maybe show a special "loading more…" text
1:32 PM
shivam-kapila

I was also thinking to add a text below the loader
1:32 PM
I was working on it in PR #1000
1:33 PM
ruaok

Mr_Monkey: I'm slowly fixing tests...
1:33 PM
shivam-kapila

Something user-friendly and interesting like... Please wait... Digging more into your listens
1:33 PM
So shall I extend the loader to support that?
1:33 PM
Mr_Monkey

I'd say we put that "loading more…" text below the listens (considering we have fewer listens than usual in that case)