#metabrainz

      • jmp_music_
        cool!
      • alastairp
        so we can provide users with an option to choose which method they use, we can save that to options, and then in the method that you linked to, we can look at the option and choose the correct training process
      • jmp_music_
        do you think it is proper to create a new function that works with sklearn, like evaluate_dataset_sklearn()?
      • alastairp
        some of the code here would be shared, though
      • jmp_music_
        ok
      • alastairp
        I think that we should move this into a separate method for gaia: https://github.com/metabrainz/acousticbrainz-se...
      • and make another similar one for sklearn
      • jmp_music_
        that was what i was thinking too.
      • let's say evaluate_sklearn.py? a separate Python file?
      • alastairp
        mmm, so gaia_wrapper.train_model is in a separate file
      • I'm just thinking
      • what's the main entrypoint to your train method for sklearn?
      • jmp_music_
        exactly. We could make a sklearn_wrapper
      • alastairp
        keep in mind that the gaia wrapper only exists because the interface to gaia was a little bit verbose
      • jmp_music_
        it's the file called `create_classification_project.py`
      • we can load it directly
      • alastairp
        because we control all of the code for the sklearn model training process, we should be able to just include a single method there that we can call
      • in fact, all we need is a single method like `train_model` in the code that you've already added, and a new method like `save_history_file` which saves the sklearn model
      • train_model can be in create_classification_project or similar
      • jmp_music_
        great. So a separate method inside evaluate.py would be preferable?
      • alastairp
        save_sklearn_model (for example) could be in evaluate.py
      • for now I'm not too worried if we put the method in evaluate.py or create_classification_project.py, we can change it if we think it should be moved
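The split being discussed could look roughly like this. This is a sketch with invented names: `evaluate_dataset`, the `training_tool` option, and the two backend stubs are assumptions for illustration, not the actual AcousticBrainz code.

```python
# Hypothetical sketch: save a "training tool" option with the evaluation
# job, then dispatch to the matching training backend.

GAIA = "gaia"
SKLEARN = "sklearn"

def evaluate_dataset(eval_job, dataset_dir, storage_dir):
    """Pick the training backend from the job's saved options."""
    tool = eval_job.get("options", {}).get("training_tool", GAIA)
    if tool == SKLEARN:
        return train_model_sklearn(eval_job, dataset_dir, storage_dir)
    if tool == GAIA:
        return train_model_gaia(eval_job, dataset_dir, storage_dir)
    raise ValueError("unknown training tool: %s" % tool)

def train_model_gaia(eval_job, dataset_dir, storage_dir):
    # In the real code this role is played by gaia_wrapper.train_model.
    return "gaia model for %s" % eval_job["id"]

def train_model_sklearn(eval_job, dataset_dir, storage_dir):
    # A single sklearn entry point (e.g. in create_classification_project.py),
    # paired with a save_sklearn_model() counterpart to save_history_file().
    return "sklearn model for %s" % eval_job["id"]
```

The point of the dispatch is that the shared evaluation code stays in one place and only the backend-specific training and saving differ.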
      • jmp_music_
        great. That was what I was thinking too
      • Mr_Monkey
        Hi ruaok! Were you able to have a look at adding a time_range parameter to the user/XXX/listens API endpoint?
      • jmp_music_
        Two more questions
      • alastairp
        this should be a good start. once this is done, we will be able to build a dataset and train its model with sklearn! that's the second part of closing the loop
      • ruaok
        Mr_Monkey: hi! sorry, that hasn't made it to the top of the list yet.
      • Mr_Monkey
        Some of the functionality I'll want to implement on the React side (if there are fewer listens than the expected count and my oldest ts is older than that, fetch listens again with a bigger time_range)
      • jmp_music_
        I saw in the AB create-dataset API that I can download a CSV file which contains the MBIDs from a dataset, for evaluation
      • ruaok
        but, I have 4 hours left, and not that much work I can do with this shit connection.
      • Mr_Monkey
        No problem. I don't think this is a huge rush, but I do wonder what it's looking like for users at the moment.
      • ruaok
        I might just be able to knock that out.
      • jmp_music_
        how could I load the relevant low-level data to localhost AB, in order to experiment later on with the evaluation?
      • ruaok
        Mr_Monkey: I'll do it now. should I make a new branch or add to an existing branch?
      • alastairp
        ahhh
      • do you need a bunch of files to load to localhost for testing?
      • ruaok
        Mr_Monkey: > my oldest ts is older than that,
      • can you expand on that, please?
      • alastairp
        https://zenodo.org/record/2553414 you could download one of these archives, they have a lot of files in them that you could then upload to the server
      • jmp_music_
        @alastairp: exactly
      • alastairp
      • I keep around this basic submit script that will take a directory of files named [uuid].json and submit them to a local server
      • jmp_music_
        great! This would be really helpful
      • Mr_Monkey
        What I mean is that the front end needs to determine whether it should fetch more listens or not. If I get fewer than the 25 listens I fetched for, I'll compare the ts of the oldest of those with the oldest ts for that user, which I have access to. If it's higher than the oldest ts, that means there are more listens to fetch.
      • jmp_music_
        I thought it could be any script that can read the MBIDs from a dataset CSV file
      • like this one
      • ruaok
        Mr_Monkey: ok, I think I understand.
      • Mr_Monkey
        Basically the same check we were doing in python (/me looks for the line)
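The check Mr_Monkey describes could be sketched in Python like this. The field name `listened_at` and the parameter names are assumptions, not the actual ListenBrainz code.

```python
# Decide whether the front end should request a bigger time_range:
# we got fewer listens than expected, and our oldest fetched listen is
# still newer than the user's oldest listen overall.

def should_fetch_more(listens, expected_count, user_oldest_ts):
    """Return True if older listens likely exist beyond what we fetched."""
    if len(listens) >= expected_count:
        return False  # full page; no need to widen the range right now
    if not listens:
        # Nothing came back; if the user has any listens at all, widen.
        return user_oldest_ts is not None
    oldest_fetched_ts = min(l["listened_at"] for l in listens)
    # If our oldest fetched listen is newer than the user's oldest listen,
    # a bigger time_range would find more.
    return oldest_fetched_ts > user_oldest_ts
```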
      • alastairp
        and then download the data from acousticbrainz and then submit to a local server?
      • jmp_music_
        yeah
      • alastairp
        that should be a very small modification to the above file
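A sketch of that modification: read recording MBIDs from a dataset CSV, download each low-level document from the public AcousticBrainz API, and re-submit it to a local server. The endpoint URLs and the CSV layout (MBID in the first column) are assumptions; the actual submit script may differ.

```python
import csv
import json
import urllib.request

AB_API = "https://acousticbrainz.org/api/v1/%s/low-level"
LOCAL_API = "http://localhost:8080/api/v1/%s/low-level"

def read_mbids(csv_path):
    """First column of each row is assumed to hold the recording MBID."""
    with open(csv_path, newline="") as f:
        return [row[0] for row in csv.reader(f) if row]

def mirror_lowlevel(mbid):
    """Download one low-level document and re-submit it to localhost."""
    with urllib.request.urlopen(AB_API % mbid) as resp:
        doc = json.load(resp)
    req = urllib.request.Request(
        LOCAL_API % mbid,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Usage would be a simple loop: `for mbid in read_mbids("dataset.csv"): mirror_lowlevel(mbid)`.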
      • jmp_music_
        ok, I'll check it
      • my final question is related to the sklearn and gaia processing steps
      • I see that the mfcc preprocessing step is really slow during the training of the gridsearch model
      • ruaok
        Mr_Monkey: so, we will no longer be passing in the time_range function to the listenstore, yes?
      • jmp_music_
        and this step was also excluded from the PR you sent me above
      • Mr_Monkey
      • Although it doesn't look like we were comparing timestamps at all
      • ruaok
        makes sense to me
      • Mr_Monkey
        As for time_range and listenstore, I'm not sure. Not in user.py I don't think
      • jmp_music_
        @alastairp: could we remove it from the pre-processing steps? I started training on a really large dataset on a computer that has 48 cores and a $7000 CUDA GPU, and it has been running for two days in a row 😂
      • Mr_Monkey
        But we'll want to implement it in the API instead
      • alastairp
        jmp_music_: sure, for now if it takes too long with sklearn then we should remove it to make the training process faster
      • Mr_Monkey
        as an argument for the API endpoint*
      • jmp_music_
        also the mfcc step never returns good results
      • alastairp
        if we have time, we should look into it in more detail, because obviously gaia is doing something with that data
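Illustratively, if the MFCC preprocessing variant dominates grid-search time, one option is simply to filter the list of preprocessing configs before building the parameter grid. The config names below are invented for the sketch.

```python
# Hypothetical list of preprocessing variants tried during grid search;
# the slow ones can be skipped until there is time to investigate why
# gaia benefits from them.

PREPROCESSING_STEPS = ["basic", "lowlevel", "nobands", "normalized", "mfcc"]

def active_preprocessing(skip=("mfcc",)):
    """Return preprocessing configs with the slow ones removed."""
    return [s for s in PREPROCESSING_STEPS if s not in skip]
```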
      • ruaok
        Mr_Monkey: but the idea is that the concept of looking for more listens completely disappears from the javascript/html side of things, right?
      • jmp_music_
        cool
      • Mr_Monkey
        No. On the contrary, we let the front-end decide when it should request a bigger time_range from the API.
      • ruaok
        ok, now I am really confused.
      • jmp_music_
        @alastairp: thanks a lot! I 'll start working on the evaluate scripts
      • Mr_Monkey
        That way, we don't make the initial call to load the user page longer by looking for a bigger time range, and only do so when the user decides to (clicks a button to load more)
      • alastairp
        jmp_music_: great, I'll try and get these interface changes made soon so that you can extend this work. let me know if you have any questions or need me to do something
      • jmp_music_
        of course! thanks again. Oh, one final question. Do I have to make a readme file inside the folder of the sklearn tool, for the purposes of GSoC? I just saw an email they sent to me earlier today
      • ruaok
        Mr_Monkey: which branch should I be looking at? I think I need to see what the current code is that we're going for.
      • Mr_Monkey
        So we previously used to load the default time range, and passed a flag to React if there were more listens to fetch. The user could then click a button to reload the page with a flag to load a bigger time_range. All we want to do now is move that mechanism to the front end only, but instead of reloading the page we call the API (which is where a time_range arg is currently missing)
      • master
      • ruaok
        ok, so the task list is: remove the time_range arguments from the profile view and instead add a time_range argument to the API. do I have that right then?
      • Mr_Monkey
        Yes
      • ruaok
        ok, thanks for clarifying.
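Validation for such a `time_range` argument on the listens endpoint might look like this. The parameter name comes from the discussion, but the default, maximum, and exception class are invented for illustration; the real values live in listenbrainz-server.

```python
# Hypothetical sketch: parse and validate a time_range query parameter
# for the user/<name>/listens API endpoint.

DEFAULT_TIME_RANGE = 3
MAX_TIME_RANGE = 73

class APIBadRequest(ValueError):
    """Stand-in for the API's 400 error."""

def parse_time_range(args):
    """Read and validate time_range from a dict of query parameters."""
    raw = args.get("time_range", DEFAULT_TIME_RANGE)
    try:
        time_range = int(raw)
    except (TypeError, ValueError):
        raise APIBadRequest("time_range must be an integer")
    if not 1 <= time_range <= MAX_TIME_RANGE:
        raise APIBadRequest("time_range must be between 1 and %d" % MAX_TIME_RANGE)
    return time_range
```

Keeping the parsing in a small helper like this makes it easy for the view to call it once and pass the validated value down to the listenstore query.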
      • alastairp
        jmp_music_: yes, that was in fact one of the things that I was going to add to the review of your PR
      • it would be good to have a readme in that folder that briefly explains what the module does, along with some examples of how to use each part
      • Mr_Monkey
        In the same way we decided to fetch the listens count after page load so as not to make the page load longer than necessary, we figured it would be better to let the user decide to load more data if they wish. The alternative is making page loads potentially longer for users' profiles you might be visiting when you only care to see the latest listens or stats, and don't necessarily care about having a full page of listens.
      • ruaok
        ok totally makes sense. I had understood something completely differently when it was explained at first. hence my confusion.
      • but with the dynamic loading it makes sense.
      • heh, the train crew is rather sassy. making all sorts of comments about how not to wear a mask. lol
      • jmp_music_
        @alastairp: OK, I'll be waiting for the review
      • Mr_Monkey
        👍
      • jmp_music_
        thanks again :)
      • Mr_Monkey
        All aboard the sassy train!
      • ruaok
        Mr_Monkey: the prop for the user view will no longer receive search_larger_time_range, can you confirm that?
      • Mr_Monkey
        Confirmed
      • ruaok
        k. what is the name of the argument to the API that specifies range?
      • Mr_Monkey
        In fact that's already been removed on the React side
      • There is currently no argument for range on the API
      • ruaok
        and also, search_larger_time_range will no longer be a valid option for the profile view, right?
      • Mr_Monkey
        Correct
      • ruaok
        gah! we're currently passing through the 100km section where it's nothing but tunnel-bridge-tunnel-bridge; connectivity sucks!
      • try the move-range-to-api branch -- does that look like what you are expecting?
      • BrainzGit
        [listenbrainz-server] mayhem opened pull request #1015 (master…move-range-to-api): Move range to api https://github.com/metabrainz/listenbrainz-serv...
      • shivam-kapila
        Mr_Monkey: hey
      • Mr_Monkey
        Halo
      • shivam-kapila
        I wanted to ask something for search_larger_time_range
      • Mr_Monkey
        Shoot
      • shivam-kapila
        So the user feedback we got was to remove the trouble of clicking a button
      • The users would rather wait for 2 sec more
      • Mr_Monkey
        ruaok: looks pretty good to me. I'll check
      • shivam-kapila
        rather than having to click that button
      • Mr_Monkey
        Ah?
      • I think that could work. Maybe show a special "loading more…" text
      • shivam-kapila
        I was also thinking to add a text below the loader
      • I was working on it in PR 1000
      • ruaok
        Mr_Monkey: I'm slowly fixing tests...
      • shivam-kapila
        Something user-friendly and interesting like... Please wait... Digging more into your listens
      • So shall I extend the loader support?
      • Mr_Monkey
        I'd say we put that "loading more" text below the listens (considering we have fewer listens than usual in that case)