I'm remembering some of it. By default we search for listens across 3 time ranges (15 days total), with no assurance that there aren't any older ones. So basically, if I request 25 listens but there is a month-long gap between two listens, I will not get the whole lot returned.
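A minimal sketch of that range-walking behaviour, assuming 5-day windows and a hypothetical `fetch_listens_in_range` helper standing in for the timescale query (the real implementation differs):

```python
from datetime import datetime, timedelta

RANGE_DAYS = 5   # one search window (assumed: 15 days / 3 ranges)
MAX_RANGES = 3   # default number of windows searched

def fetch_listens(user, count, fetch_listens_in_range):
    """Collect up to `count` listens, walking back one window at a time.

    Because the walk stops after MAX_RANGES windows, a gap longer than
    15 days between listens means fewer than `count` listens come back,
    even if older ones exist.
    """
    listens = []
    end = datetime.utcnow()
    for _ in range(MAX_RANGES):
        start = end - timedelta(days=RANGE_DAYS)
        listens.extend(fetch_listens_in_range(user, start, end))
        if len(listens) >= count:
            break
        end = start  # walk further back in time
    return listens[:count]
```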
2020-07-30 21234
shivam-kapila
yep
2020-07-30 21214
Mr_Monkey
Yeah, I followed the mechanism up to timescale, which is the part I don't understand much. But I guess it makes sense. So as discussed, there's probably an argument missing for the API endpoint.
2020-07-30 21222
Mr_Monkey
On the front-end side, I'll need to do the listens count comparison (provided I'm not on the last page [which might have fewer than $count listens]) and call the API again (automatically, ideally) with the extra arg.
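For illustration, here is that retry logic as a Python sketch against the public API (the actual front-end code would not be Python); the `time_range` argument name and value are assumptions for the missing extra arg:

```python
import requests

API_ROOT = "https://api.listenbrainz.org/1"

def get_full_page(user, count=25, on_last_page=False):
    """Fetch a page of listens; if it comes back short and we are not on
    the last page, retry with a hypothetical wider search window."""
    url = f"{API_ROOT}/user/{user}/listens"
    payload = requests.get(url, params={"count": count}).json()["payload"]
    listens = payload["listens"]
    if len(listens) < count and not on_last_page:
        # hypothetical extra argument widening the time-range search
        params = {"count": count, "time_range": 73}
        payload = requests.get(url, params=params).json()["payload"]
        listens = payload["listens"]
    return listens
```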
2020-07-30 21238
shivam-kapila
yes
2020-07-30 21241
shivam-kapila
that would do
2020-07-30 21258
Mr_Monkey
OK, makes sense. Thanks for your help refreshing my memory :)
2020-07-30 21202
shivam-kapila
we may also add the count check in the API
2020-07-30 21208
shivam-kapila
one more thing
2020-07-30 21220
Mr_Monkey
Hm. That would probably be better on the API directly
2020-07-30 21202
shivam-kapila
Can we make user/<user-name> not fetch the listens even the first time, and make the call from the frontend itself?
2020-07-30 21207
Mr_Monkey
Why so? Best practice would be to serve the results with the page and save an extra call from the frontend.
2020-07-30 21207
shivam-kapila
Also we do limit the results
2020-07-30 21242
shivam-kapila
I just see that most of the newer services serve a template and then make the calls from the frontend
So if you want all pages to have almost 25 listens in each case for consistency, we may do so
2020-07-30 21200
ruaok
Danke!
2020-07-30 21217
shivam-kapila
lol I am searching for the meanings
2020-07-30 21225
alastairp
ishaanshah: do you have some time to talk about a few things in the hdfs uploader?
2020-07-30 21231
alastairp
jmp_music_: hi, I'm here. how are you?
2020-07-30 21200
alastairp
ruaok: you do know that Mr_Monkey and I make a beer with a ship on the label too? You could have just got some from here
2020-07-30 21214
ishaanshah
alastairp: sure, give me 5 mins
2020-07-30 21246
ruaok
You mean my whole escape was for naught??
2020-07-30 21204
alastairp
if your whole escape was to find beer with a ship on the label, then yes
2020-07-30 21226
ruaok
Crap!
2020-07-30 21252
jmp_music_
@alastairp: Hey! I'm fine! I made some changes over the last few days
2020-07-30 21256
ishaanshah
alastairp: Hey I am up
2020-07-30 21210
jmp_music_
Finally everything works fine
2020-07-30 21213
ishaanshah
maybe after your meeting with jmp_music_?
2020-07-30 21222
shivam-kapila
busy alastairp
2020-07-30 21251
jmp_music_
@alastairp: Do you want to make a short meeting later today to inform you about the updates?
2020-07-30 21244
alastairp
jmp_music_: let's do it now
2020-07-30 21200
jmp_music_
great
2020-07-30 21232
jmp_music_
well, I finally made every transformation use Pipelines, and the prediction issues are solved
2020-07-30 21251
jmp_music_
thus the code is shortened a lot
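A minimal sketch of the Pipeline shape being described, with placeholder data and steps rather than the real feature transformations:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# One Pipeline object owns both the transformations and the classifier,
# so fit/predict is a single call and no transform step can be forgotten
# (or applied in a different order) at prediction time.
model = Pipeline([
    ("scaler", StandardScaler()),  # stand-in for the real transformations
    ("clf", SVC(gamma="auto")),
])
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```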
2020-07-30 21225
alastairp
that's great. so we're probably in a position where the new models are basically a drop-in replacement for the existing ones?
2020-07-30 21233
jmp_music_
exactly
2020-07-30 21234
alastairp
do you know what the issue was with the prediction?
2020-07-30 21258
jmp_music_
the `random` library again. There were two separate shuffles before: one for the tracks (which were in a list), and one for the labels, which were in a pandas Series
2020-07-30 21224
jmp_music_
now everything works properly because I do the whole shuffling at the start
2020-07-30 21231
jmp_music_
and then I split the labels from the tracks
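A sketch of the fix as described: one shuffle applied to the (track, label) pairs, then a split, so the two can never drift out of alignment:

```python
import random

def shuffle_and_split(tracks, labels, seed=42):
    """Shuffle (track, label) pairs together, then split them apart.

    Shuffling the track list and the label Series independently (the old
    bug) reorders them differently, so a prediction could be compared
    against the wrong ground-truth label. One shared shuffle makes that
    impossible.
    """
    pairs = list(zip(tracks, labels))
    random.Random(seed).shuffle(pairs)  # one shuffle for both
    shuffled_tracks, shuffled_labels = zip(*pairs)
    return list(shuffled_tracks), list(shuffled_labels)

tracks, labels = shuffle_and_split(["t1", "t2", "t3"], ["rock", "jazz", "rock"])
```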
2020-07-30 21232
alastairp
cool
2020-07-30 21249
alastairp
so it was actually returning results for a different item?
2020-07-30 21200
jmp_music_
yeap
2020-07-30 21220
alastairp
whoops. good thing that we caught that
2020-07-30 21230
jmp_music_
I think so :)
2020-07-30 21221
jmp_music_
Furthermore, now there is a project template, and for each classification problem a different classification config YAML is created
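A hypothetical example of such a per-problem config file and how the template might read it; the keys and filename are illustrative, not the real project's schema:

```python
import yaml  # PyYAML

# Hypothetical per-problem file, e.g. config_genre.yaml:
#
#   dataset: genre_dortmund
#   target: genre
#   model:
#     type: svm
#     c_values: [0.1, 1, 10]
#
with open("config_genre.yaml") as f:  # filename is illustrative
    config = yaml.safe_load(f)
print(config["model"]["type"])  # -> "svm"
```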
2020-07-30 21247
alastairp
what do you think is the next step in the project, then?
2020-07-30 21225
jmp_music_
I just want to finish some logging now, and then proceed to the integration with AB
2020-07-30 21225
alastairp
looking at your proposal, we had the integration of the new models into the rest of acousticbrainz?
2020-07-30 21240
alastairp
great. once you've finished with the logging can you make a pull request on the acousticbrainz-server repository to add the code?
2020-07-30 21201
jmp_music_
yes of course
2020-07-30 21206
alastairp
let's make a new package for it. Perhaps `acousticbrainz.models`
2020-07-30 21214
jmp_music_
ok!
2020-07-30 21234
alastairp
we don't have an `acousticbrainz` package at the moment, but we want to move stuff into it eventually, so we could make this as the first thing that uses it
2020-07-30 21207
alastairp
thinking ahead, let's add the sklearn stuff into a `sklearn` submodule, so that if we have other libraries (tensorflow, etc.), we can put them in there as well
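One possible layout for that, sketched from the discussion (names beyond `acousticbrainz.models.sklearn` are assumptions):

```
acousticbrainz/
    __init__.py
    models/
        __init__.py
        sklearn/        # the new model training/prediction code
        # later: tensorflow/, etc., as sibling submodules
```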
2020-07-30 21215
jmp_music_
Do you think that we could make it an `acousticbrainz` library?
2020-07-30 21228
alastairp
as something that is installable with pip?
2020-07-30 21235
jmp_music_
yes
2020-07-30 21242
alastairp
I don't think that's important at the moment
2020-07-30 21224
jmp_music_
so I have to transfer the whole code into the AB repository?
2020-07-30 21229
jmp_music_
am i right?
2020-07-30 21231
alastairp
yes.
2020-07-30 21241
alastairp
we normally keep all code for each project in the same repository
we have a script that runs and looks at the `lowlevel` database table and the `highlevel` database table
2020-07-30 21258
jmp_music_
ok! So I have to replace gaia over there
2020-07-30 21222
alastairp
if there is no item in the highlevel table for a specific row in the lowlevel table, we get the lowlevel data from the database, perform the prediction, and then write the highlevel data
when you have built a dataset, we have a button called "Evaluate", which submits it to have a model trained with gaia
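A sketch of the fill-in-missing-highlevel loop just described; the `db` helper names are hypothetical stand-ins for the real acousticbrainz-server database layer:

```python
def compute_missing_highlevel(db, model):
    """Fill in highlevel rows for lowlevel rows that don't have one yet.

    `db.get_lowlevel_ids_without_highlevel`, `db.load_lowlevel` and
    `db.write_highlevel` are hypothetical names standing in for the real
    database layer.
    """
    for ll_id in db.get_lowlevel_ids_without_highlevel():
        lowlevel_data = db.load_lowlevel(ll_id)
        prediction = model.predict(lowlevel_data)
        db.write_highlevel(ll_id, prediction)
```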
2020-07-30 21204
alastairp
I would like to set up a complete end-to-end pipeline that allows us to build a dataset, construct a model with sklearn, perform an evaluation with a separate subset of the acousticbrainz database, and then finally promote a model as live if we decide that it works well, so that it shows on the website and is available in the API
2020-07-30 21240
jmp_music_
however, the models from the dataset evaluations are not the models that are used for the high-level predictions, right?
2020-07-30 21257
alastairp
it would be great to be able to do this completely through the website. almost all of these components exist as individual parts, I think that now would be a great time to integrate them together
2020-07-30 21214
jmp_music_
I understand
2020-07-30 21228
alastairp
I sent you our paper about cross-collection evaluation, right?
2020-07-30 21204
jmp_music_
yes yes
2020-07-30 21233
alastairp
at the moment we have an accuracy for the model made with sklearn, using cross-validation train/test splits
2020-07-30 21253
jmp_music_
right
2020-07-30 21205
alastairp
however we would like to also calculate a second accuracy, using a second dataset
2020-07-30 21218
alastairp
for example, you and I both make a dataset for electronic/not electronic
2020-07-30 21238
alastairp
you make a model with your dataset and you get 89% accuracy
2020-07-30 21229
alastairp
then you use your model to compute predictions on the items in my dataset, and see how many of the predictions match my ground-truth
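In code, that second number is just the trained model scored against the other dataset's ground truth; a sketch assuming sklearn-style objects:

```python
def cross_collection_accuracy(model, other_dataset):
    """Score a fitted model against someone else's ground truth.

    `model` is any fitted sklearn-style estimator; `other_dataset` is
    assumed to be an iterable of (features, label) pairs from the second
    dataset for the same problem (e.g. electronic/not electronic).
    """
    hits = total = 0
    for features, true_label in other_dataset:
        hits += int(model.predict([features])[0] == true_label)
        total += 1
    # e.g. 0.89 on your own train/test split, typically lower here
    return hits / total
```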
2020-07-30 21248
jmp_music_
aha, i understand
2020-07-30 21211
alastairp
we actually have this functionality. it's called dataset contests, however it's not fully merged
2020-07-30 21219
alastairp
I will work on merging it in the next few weeks
2020-07-30 21235
alastairp
but the idea would be to modify this existing code so that it works with either gaia or sklearn
see the highlevel_model table. This is the prediction for a single model. So for 1 lowlevel item, we will have 1 highlevel item, and 18 highlevel_model items (one for each model)
2020-07-30 21232
jmp_music_
aha ok!
2020-07-30 21203
jmp_music_
I think I understand
2020-07-30 21216
alastairp
so when we add a new row to the model table, we should have a script which can find all of the lowlevel items that don't have a prediction for that model, then compute the prediction and add a row to the highlevel_model table
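A sketch of such a script; the SQL column names and the `db`/`predict` helpers are assumptions based on this discussion, not the actual acousticbrainz-server schema:

```python
# Column and table names below are assumptions; check the real schema
# in acousticbrainz-server before using.
MISSING_FOR_MODEL = """
    SELECT ll.id
      FROM lowlevel ll
     WHERE NOT EXISTS (
           SELECT 1
             FROM highlevel_model hlm
            WHERE hlm.highlevel = ll.id
              AND hlm.model = %(model_id)s)
"""

def backfill_model(db, model_id, predict):
    """Compute predictions for every lowlevel item missing this model."""
    for (ll_id,) in db.query(MISSING_FOR_MODEL, model_id=model_id):
        data = db.load_lowlevel(ll_id)
        db.insert_highlevel_model(ll_id, model_id, predict(data))
```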
2020-07-30 21226
alastairp
then this data will appear on the API
2020-07-30 21228
jmp_music_
ok
2020-07-30 21242
jmp_music_
Can I ask something?
2020-07-30 21248
alastairp
OK, that's the whole overview. I'm not sure if we will have enough time to finish it this summer, but I wanted you to know the full cycle
2020-07-30 21251
alastairp
absolutely
2020-07-30 21221
jmp_music_
Where should I save the .pkl models of the transformation pipelines?