#metabrainz

      • jmp_music_
        cool!
      • alastairp
        so we can provide users with an option to choose which method they use, we can save that to options, and then in the method that you linked to, we can look at the option and choose the correct training process
      • jmp_music_
        do you think it is proper to create a new function that works with sklearn, like evaluate_dataset_sklearn()?
      • alastairp
        some of the code here would be shared, though
      • jmp_music_
        ok
      • alastairp
        I think that we should move this into a separate method for gaia: https://github.com/metabrainz/acousticbrainz-se...
      • and make another similar one for sklearn
      • jmp_music_
        that was what i was thinking too.
      • let's say evaluate_sklearn.py? a separate Python file?
      • alastairp
        mmm, so gaia_wrapper.train_model is in a separate file
      • I'm just thinking
      • what's the main entrypoint to your train method for sklearn?
      • jmp_music_
        exactly. We could make a sklearn_wrapper
      • alastairp
        keep in mind that the gaia wrapper only exists because the interface to gaia was a little bit verbose
      • jmp_music_
        it's the file called `create_classification_project.py`
      • we can load it directly
      • alastairp
        because we control all of the code for the sklearn model training process, we should be able to just include a single method there that we can call
      • in fact, all we need is a single method like `train_model` in the code that you've already added, and a new method like `save_history_file` which saves the sklearn model
      • train_model can be in create_classification_project or similar
      • jmp_music_
        great. So a separate method inside evaluate.py would be preferable?
      • alastairp
        save_sklearn_model (for example) could be in evaluate.py
      • for now I'm not too worried if we put the method in evaluate.py or create_classification_project.py, we can change it if we think it should be moved
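The split being discussed could look roughly like this. This is a sketch with invented names: `evaluate_dataset`, the `training_tool` option, and the two backend stubs are assumptions for illustration, not the actual AcousticBrainz code.

```python
# Hypothetical sketch: save a "training tool" option with the evaluation
# job, then dispatch to the matching training backend.

GAIA = "gaia"
SKLEARN = "sklearn"

def evaluate_dataset(eval_job, dataset_dir, storage_dir):
    """Pick the training backend from the job's saved options."""
    tool = eval_job.get("options", {}).get("training_tool", GAIA)
    if tool == SKLEARN:
        return train_model_sklearn(eval_job, dataset_dir, storage_dir)
    if tool == GAIA:
        return train_model_gaia(eval_job, dataset_dir, storage_dir)
    raise ValueError("unknown training tool: %s" % tool)

def train_model_gaia(eval_job, dataset_dir, storage_dir):
    # In the real code this role is played by gaia_wrapper.train_model.
    return "gaia model for %s" % eval_job["id"]

def train_model_sklearn(eval_job, dataset_dir, storage_dir):
    # A single sklearn entry point (e.g. in create_classification_project.py),
    # paired with a save_sklearn_model() counterpart to save_history_file().
    return "sklearn model for %s" % eval_job["id"]
```

The point of the dispatch is that the shared evaluation code stays in one place and only the backend-specific training and saving differ.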
      • jmp_music_
        great. That was what I was thinking too
      • Mr_Monkey
        Hi ruaok! Were you able to have a look at adding a time_range parameter to the user/XXX/listens API endpoint?
      • jmp_music_
        Two more questions
      • alastairp
        this should be a good start. once this is done, we will be able to build a dataset and train its model with sklearn! that's the second part of closing the loop
      • ruaok
        Mr_Monkey: hi! sorry, that hasn't made it to the top of the list yet.
      • Mr_Monkey
        Some of the functionality I'll want to implement on the React side (if there are fewer listens than the expected count and my oldest ts is older than that, fetch listens again with a bigger time_range)
      • jmp_music_
        I saw in the AB create-dataset API that I can download a CSV file which contains the MBIDs from a dataset, for evaluation
      • ruaok
        but, I have 4 hours left, and not that much work I can do with this shit connection.
      • Mr_Monkey
        No problem. I don't think this is a huge rush, but I do wonder what it's looking like for users at the moment.
      • ruaok
        I might just be able to knock that out.
      • jmp_music_
        how could I load the relevant low-level data to localhost AB, in order to experiment later on with the evaluation?
      • ruaok
        Mr_Monkey: I'll do it now. should I make a new branch or add to an existing branch?
      • alastairp
        ahhh
      • do you need a bunch of files to load to localhost for testing?
      • ruaok
        Mr_Monkey: > my oldest ts is older than that,
      • can you expand on that, please?
      • alastairp
        https://zenodo.org/record/2553414 you could download one of these archives, they have a lot of files in them that you could then upload to the server
      • jmp_music_
        @alastairp: exactly
      • alastairp
      • I keep around this basic submit script that will take a directory of files named [uuid].json and submit them to a local server
      • jmp_music_
        great! This would be really helpful
      • Mr_Monkey
        What I mean is that the front end needs to determine whether it should fetch more listens or not. If I get fewer than the 25 listens I fetched for, I'll compare the ts of the oldest of those with the oldest ts for that user, which I have access to. If it's higher than the oldest ts, that means there are more listens to fetch.
      • jmp_music_
        I thought it could be any script that can read the MBIDs from a dataset CSV file
      • like this one
      • ruaok
        Mr_Monkey: ok, I think I understand.
      • Mr_Monkey
        Basically the same check we were doing in python (/me looks for the line)
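The check Mr_Monkey describes could be sketched in Python like this. The field name `listened_at` and the parameter names are assumptions, not the actual ListenBrainz code.

```python
# Decide whether the front end should request a bigger time_range:
# we got fewer listens than expected, and our oldest fetched listen is
# still newer than the user's oldest listen overall.

def should_fetch_more(listens, expected_count, user_oldest_ts):
    """Return True if older listens likely exist beyond what we fetched."""
    if len(listens) >= expected_count:
        return False  # full page; no need to widen the range right now
    if not listens:
        # Nothing came back; if the user has any listens at all, widen.
        return user_oldest_ts is not None
    oldest_fetched_ts = min(l["listened_at"] for l in listens)
    # If our oldest fetched listen is newer than the user's oldest listen,
    # a bigger time_range would find more.
    return oldest_fetched_ts > user_oldest_ts
```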
      • alastairp
        and then download the data from acousticbrainz and then submit to a local server?
      • jmp_music_
        yeah
      • alastairp
        that should be a very small modification to the above file
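A sketch of that modification: read recording MBIDs from a dataset CSV, download each low-level document from the public AcousticBrainz API, and re-submit it to a local server. The endpoint URLs and the CSV layout (MBID in the first column) are assumptions; the actual submit script may differ.

```python
import csv
import json
import urllib.request

AB_API = "https://acousticbrainz.org/api/v1/%s/low-level"
LOCAL_API = "http://localhost:8080/api/v1/%s/low-level"

def read_mbids(csv_path):
    """First column of each row is assumed to hold the recording MBID."""
    with open(csv_path, newline="") as f:
        return [row[0] for row in csv.reader(f) if row]

def mirror_lowlevel(mbid):
    """Download one low-level document and re-submit it to localhost."""
    with urllib.request.urlopen(AB_API % mbid) as resp:
        doc = json.load(resp)
    req = urllib.request.Request(
        LOCAL_API % mbid,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Usage would be a simple loop: `for mbid in read_mbids("dataset.csv"): mirror_lowlevel(mbid)`.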
      • jmp_music_
        ok, I'll check it
      • my final question is related to the sklearn and gaia processing steps
      • I see that the mfcc preprocessing step is really slow during the training of the gridsearch model
      • ruaok
        Mr_Monkey: so, we will no longer be passing in the time_range function to the listenstore, yes?
      • jmp_music_
        and this step was also excluded from the PR you sent me above
      • Mr_Monkey
      • Although it doesn't look like we were comparing timestamps at all
      • ruaok
        makes sense to me
      • Mr_Monkey
        As for time_range and listenstore, I'm not sure. Not in user.py I don't think
      • jmp_music_
        @alastairp: could we remove it from the pre-processing steps? I started training on a really large dataset on a computer that has 48 cores and a $7000 CUDA GPU, and it has been running for two days in a row 😂
      • Mr_Monkey
        But we'll want to implement it in the API instead
      • alastairp
        jmp_music_: sure, for now if it takes too long with sklearn then we should remove it to make the training process faster
      • Mr_Monkey
        as an argument for the API endpoint*
      • jmp_music_
        also the mfcc step never returns good results
      • alastairp
        if we have time, we should look into it in more detail, because obviously gaia is doing something with that data
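Illustratively, if the MFCC preprocessing variant dominates grid-search time, one option is simply to filter the list of preprocessing configs before building the parameter grid. The config names below are invented for the sketch.

```python
# Hypothetical list of preprocessing variants tried during grid search;
# the slow ones can be skipped until there is time to investigate why
# gaia benefits from them.

PREPROCESSING_STEPS = ["basic", "lowlevel", "nobands", "normalized", "mfcc"]

def active_preprocessing(skip=("mfcc",)):
    """Return preprocessing configs with the slow ones removed."""
    return [s for s in PREPROCESSING_STEPS if s not in skip]
```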
      • ruaok
        Mr_Monkey: but the idea is that the concept of looking for more listens completely disappears from the javascript/html side of things, right?
      • jmp_music_
        cool
      • Mr_Monkey
        No. On the contrary, we let the front-end decide when it should request a bigger time_range from the API.
      • ruaok
        ok, now I am really confused.
      • jmp_music_
        @alastairp: thanks a lot! I 'll start working on the evaluate scripts
      • Mr_Monkey
        That way, we don't make the initial call to load the user page longer by looking for a bigger time range, and only do so when the user decides to (clicks a button to load more)
      • alastairp
        jmp_music_: great, I'll try and get these interface changes made soon so that you can extend this work. let me know if you have any questions or need me to do something
      • jmp_music_
        of course! thanks again. Oh, one final question. Do I have to make a readme file inside the folder of the sklearn tool, for the purposes of GSoC? I just saw an email they sent to me earlier today
      • ruaok
        Mr_Monkey: which branch should I be looking at? I think I need to see what the current code is that we're going for.
      • Mr_Monkey
        So we previously used to load the default time range, and passed a flag to React if there were more listens to fetch. The user could then click a button to reload the page with a flag to load a bigger time_range. All we want to do now is move that mechanism to the front end only, but instead of reloading the page we call the API (which is where a time_range arg is currently missing)
      • master
      • ruaok
        ok, so the task list is: remove the time_range arguments from the profile view and instead add a time_range argument to the API. do I have that right then?
      • Mr_Monkey
        Yes
      • ruaok
        ok, thanks for clarifying.
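Validation for such a `time_range` argument on the listens endpoint might look like this. The parameter name comes from the discussion, but the default, maximum, and exception class are invented for illustration; the real values live in listenbrainz-server.

```python
# Hypothetical sketch: parse and validate a time_range query parameter
# for the user/<name>/listens API endpoint.

DEFAULT_TIME_RANGE = 3
MAX_TIME_RANGE = 73

class APIBadRequest(ValueError):
    """Stand-in for the API's 400 error."""

def parse_time_range(args):
    """Read and validate time_range from a dict of query parameters."""
    raw = args.get("time_range", DEFAULT_TIME_RANGE)
    try:
        time_range = int(raw)
    except (TypeError, ValueError):
        raise APIBadRequest("time_range must be an integer")
    if not 1 <= time_range <= MAX_TIME_RANGE:
        raise APIBadRequest("time_range must be between 1 and %d" % MAX_TIME_RANGE)
    return time_range
```

Keeping the parsing in a small helper like this makes it easy for the view to call it once and pass the validated value down to the listenstore query.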
      • alastairp
        jmp_music_: yes, that was in fact one of the things that I was going to add to the review of your PR
      • it would be good to have a readme in that folder that briefly explains what the module does, along with some examples of how to use each part
      • Mr_Monkey
        In the same way we decided to fetch the listens count after page load so as not to make the page load longer than necessary, we figured it would be better to let the user decide to load more data if they wish. The alternative is making page loads potentially longer for users' profiles you might be visiting when you only care to see the latest listens or stats, and don't necessarily care about having a full page of listens.
      • ruaok
        ok totally makes sense. I had understood something completely differently when it was explained at first. hence my confusion.
      • but with the dynamic loading it makes sense.
      • heh, the train crew is rather sassy. making all sorts of comments about how not to wear a mask. lol
      • jmp_music_
        @alastairp: OK, I'll be waiting for the review
      • Mr_Monkey
        👍
      • jmp_music_
        thanks again :)
      • Mr_Monkey
        All aboard the sassy train!
      • ruaok
        Mr_Monkey: the prop for the user view will no longer receive search_larger_time_range, can you confirm that?
      • Mr_Monkey
        Confirmed
      • ruaok
        k. what is the name of the argument to the API that specifies range?
      • Mr_Monkey
        In fact that's already been removed on the React side
      • There is currently no argument for range on the API
      • ruaok
        and also, search_larger_time_range will no longer be a valid option for the profile view, right?
      • Mr_Monkey
        Correct
      • ruaok
        gah! we're currently passing through the 100km section where it's nothing but tunnel-bridge-tunnel-bridge; connectivity sucks!
      • try the move-range-to-api branch -- does that look like what you are expecting?
      • BrainzGit
        [listenbrainz-server] mayhem opened pull request #1015 (master…move-range-to-api): Move range to api https://github.com/metabrainz/listenbrainz-serv...
      • shivam-kapila
        Mr_Monkey: hey
      • Mr_Monkey
        Halo
      • shivam-kapila
        I wanted to ask something for search_larger_time_range
      • Mr_Monkey
        Shoot
      • shivam-kapila
        So the user feedback we got was to remove the trouble of clicking a button
      • The users would rather wait for 2 sec more
      • Mr_Monkey
        ruaok: looks pretty good to me. I'll check
      • shivam-kapila
        rather than having to click that button
      • Mr_Monkey
        Ah?
      • I think that could work. Maybe show a special "loading more…" text
      • shivam-kapila
        I was also thinking to add a text below the loader
      • I was working on it in PR 1000
      • ruaok
        Mr_Monkey: I'm slowly fixing tests...
      • shivam-kapila
        Something user-friendly and interesting like... Please wait... Digging more into your listens
      • So shall I extend the loader support?
      • Mr_Monkey
        I'd say we put that "loading more" text below the listens (considering we have fewer listens than usual in that case)