is this an idea for a new "welcome to musicbrainz" page?
2020-07-13 19520, 2020
CatQuest
for the wiki or the main site?
2020-07-13 19553, 2020
MFCR_ColbyRay
i was bored when i made this
2020-07-13 19559, 2020
CatQuest
it's very well done :D
2020-07-13 19501, 2020
CatQuest
cool!
2020-07-13 19515, 2020
CatQuest
i liek the "who uses musicrainz" section
2020-07-13 19533, 2020
CatQuest
damn, forgive my spellings
2020-07-13 19537, 2020
MFCR_ColbyRay
i simply combined the About page and MusicBrainz in a Nutshell
2020-07-13 19551, 2020
CatQuest
yea! it's pretty good :O
2020-07-13 19532, 2020
MFCR_ColbyRay has quit
2020-07-13 19538, 2020
CatQuest
if.. oh
2020-07-13 19530, 2020
CatQuest
I wanted to say that, if he wanted feedback, an idea I had was that in the "who uses" section, the part saying "and more" could link to the appropriate segments on metabrainz
since this page is essentially a "promotion of mb"
2020-07-13 19503, 2020
CatQuest
which is ok, i think?
2020-07-13 19519, 2020
CatQuest
freso, reosarevok, rdswift ?
2020-07-13 19535, 2020
MFCR_ColbyRay has quit
2020-07-13 19546, 2020
CatQuest
eugh. maybe this is one of those things where i don't know shit :(
2020-07-13 19554, 2020
CatQuest
🤷
2020-07-13 19538, 2020
sumedh joined the channel
2020-07-13 19559, 2020
jmp_music
hey alastairp. I have figured out what was going on with the data that was being loaded from the ground truth file, and I fixed that issue. I also started following gaia's structure of development. I have some questions to ask when you have time.
2020-07-13 19505, 2020
alastairp
hi jmp_music
2020-07-13 19517, 2020
alastairp
sorry, I completely missed responding to you about Friday
2020-07-13 19525, 2020
jmp_music
no worries!
2020-07-13 19538, 2020
alastairp
if that happens again, just mention me again - I was around, but had just forgotten
2020-07-13 19545, 2020
alastairp
what are your questions?
2020-07-13 19555, 2020
alastairp
I saw your equal accuracies, that's great!
2020-07-13 19516, 2020
jmp_music
no worries, I understand you are too busy with various things
2020-07-13 19546, 2020
jmp_music
Well, sometimes the accuracies were equal to gaia's, but other times they were not
2020-07-13 19505, 2020
alastairp
that could very well be because of some unexpected randomness
2020-07-13 19512, 2020
alastairp
how many different datasets have you tried?
2020-07-13 19512, 2020
jmp_music
I figured out that the problem was in random
2020-07-13 19519, 2020
jmp_music
exactly!
2020-07-13 19552, 2020
jmp_music
When I passed a numpy array to random.shuffle, it duplicated some of the data
2020-07-13 19520, 2020
jmp_music
and some other items were disappearing from the dataset
2020-07-13 19549, 2020
jmp_music
but when I loaded the data from the gt as a simple list, the shuffling was done successfully
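A minimal sketch of what was probably happening here, assuming the ground truth had been loaded into a 2-D numpy array. CPython's `random.shuffle` swaps items with `x[i], x[j] = x[j], x[i]`; on a 2-D array both `x[i]` and `x[j]` are views into the same buffer, so the second assignment copies the already-overwritten row. The helper names `tuple_swap_rows` and `shuffle_rows` are hypothetical, used only for illustration.

```python
import random

import numpy as np


def tuple_swap_rows(arr, i, j):
    """Swap two rows with tuple assignment, as random.shuffle does for each pair."""
    arr[i], arr[j] = arr[j], arr[i]
    return arr


def shuffle_rows(arr, seed=None):
    """Return a row-shuffled copy using NumPy's own generator (safe for arrays)."""
    rng = np.random.default_rng(seed)
    return rng.permutation(arr)


data = np.arange(6).reshape(3, 2)  # rows: [0 1], [2 3], [4 5]

# Rows 0 and 2 both end up as [4 5]; the original row 0 is lost.
broken = tuple_swap_rows(data.copy(), 0, 2)

# Every original row is still present exactly once.
shuffled = shuffle_rows(data, seed=0)

# Shuffling a plain list is also safe: its elements are independent objects.
as_list = data.tolist()
random.shuffle(as_list)
```

Either converting to a list first or using NumPy's own shuffling avoids the duplicated and missing rows.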
2020-07-13 19529, 2020
alastairp
that sounds like a good thing to write a comment about, to make sure it doesn't happen again
2020-07-13 19551, 2020
jmp_music
of course. I'll include it in my comments
2020-07-13 19558, 2020
jmp_music
inside the code
2020-07-13 19517, 2020
jmp_music
I don't know why the random library had that issue. I started refactoring some things in my code, and I decided to follow gaia's structure
2020-07-13 19553, 2020
alastairp
what structure? The code, or the format of the data files?
2020-07-13 19547, 2020
jmp_music
something like a hybrid of the two. I want to borrow some parts of the structure (not the code itself), just the way the methods are called
2020-07-13 19502, 2020
alastairp
ok, cool
2020-07-13 19509, 2020
alastairp
how many datasets have you evaluated?
2020-07-13 19533, 2020
jmp_music
3 for now, but since yesterday I started the refactoring.
2020-07-13 19542, 2020
jmp_music
because of the issue of random library
2020-07-13 19501, 2020
jmp_music
another thing that I discovered is the grid search
2020-07-13 19504, 2020
alastairp
great, and after your refactoring, are all of the accuracies approximately the same?
2020-07-13 19519, 2020
jmp_music
I hope so
2020-07-13 19526, 2020
jmp_music
I haven't finished yet
2020-07-13 19549, 2020
jmp_music
about the grid search, gaia actually does a combination of the parameters
2020-07-13 19522, 2020
jmp_music
that's how it ends up with 1728 training processes
2020-07-13 19525, 2020
jmp_music
am I right
2020-07-13 19526, 2020
jmp_music
?
2020-07-13 19557, 2020
alastairp
correct
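As a rough illustration of how combining every parameter value multiplies up the number of training runs, here is a sketch using only the standard library. The parameter names and values below are made up for the example and are not gaia's actual grid, so the count here is 24 rather than 1728.

```python
from itertools import product

# Hypothetical grid: names and values are assumptions for illustration only.
grid = {
    "preprocessing": ["raw", "normalized"],
    "kernel": ["rbf", "poly"],
    "C": [2 ** e for e in (-1, 1, 3)],
    "gamma": [2 ** e for e in (-5, -3)],
}

names = list(grid)
# One dict per training run: the Cartesian product of all value lists.
combinations = [dict(zip(names, values)) for values in product(*grid.values())]

print(len(combinations))  # 2 * 2 * 3 * 2 = 24
```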
2020-07-13 19509, 2020
jmp_music
grid search in sklearn does a slightly different kind of process based on the input parameters
2020-07-13 19528, 2020
jmp_music
I'll start imitating gaia to follow the 1728 processes
2020-07-13 19533, 2020
jmp_music
for now
2020-07-13 19541, 2020
alastairp
how does sklearn do it?
2020-07-13 19501, 2020
jmp_music
I think (based on the documentation) it takes a range of the parameters that are passed into the class method, but based on the best model that is exported, we can't see which of gaia's parameters worked best
gaia exports the `.param` files that contain the exact parameters. it will not be a problem for me to work that way
2020-07-13 19517, 2020
alastairp
in the example with the documentation, does `clf.cv_results_` contain all of the results for all items in the grid search?
2020-07-13 19519, 2020
alastairp
or just the winner?
2020-07-13 19535, 2020
alastairp
it seems like it contains all of them, right?
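A small sketch to check this, assuming scikit-learn is installed and using the bundled iris data as a stand-in dataset. The grid values are hypothetical, not gaia's. After fitting, `cv_results_["params"]` and `cv_results_["mean_test_score"]` hold one entry per combination, while `best_params_` holds only the winner.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # stand-in data, not the project's datasets

# Hypothetical grid for illustration.
param_grid = {
    "C": [2.0 ** e for e in (-1, 1, 3)],
    "gamma": [2.0 ** e for e in (-5, -3)],
    "kernel": ["rbf", "poly"],
}

clf = GridSearchCV(SVC(), param_grid, cv=5)
clf.fit(X, y)

results = clf.cv_results_
n_combinations = len(results["params"])  # 3 * 2 * 2 combinations

# Every combination is listed with its mean cross-validated accuracy.
for params, score in zip(results["params"], results["mean_test_score"]):
    print(params, round(float(score), 3))

print("best:", clf.best_params_)
```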
2020-07-13 19534, 2020
alastairp
it seems like we could use this for the combination of C/gamma/kernel functions. can we also use it to perform the selection of the preprocessing steps, or would we have to do that ourselves?
hmm, let me check. As I remember, it exports all the results, but not the exact values we input as hyperparameters
2020-07-13 19508, 2020
alastairp
yes, I was just reading this comment on the link I posted: "In this answer, GridSearchCV will tune the hyperparameters on the data already preprocessed by StandardScaler, which is not correct. "
2020-07-13 19510, 2020
alastairp
is that what you mean?
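One way the preprocessing choice could be folded into the same search is to wrap the scaler and the SVM in a `Pipeline` and list alternative scalers as grid values. This is only a sketch: the step names `scale` and `svm`, the candidate scalers, and the grid values are assumptions. Because the scaler sits inside the pipeline, it is refitted on each training fold, which avoids the problem described in the quoted comment.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # stand-in data

pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

param_grid = {
    # Whole steps can be swapped; "passthrough" skips preprocessing.
    "scale": [StandardScaler(), MinMaxScaler(), "passthrough"],
    "svm__C": [1.0, 8.0],
    "svm__gamma": [2.0 ** -5, 2.0 ** -3],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # includes the winning "scale" step
```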
2020-07-13 19519, 2020
jmp_music
I mean that if we set the gamma to 7, for example, we could not tell afterwards that it was 7
2020-07-13 19530, 2020
alastairp
I don't know if gaia actually tunes the hyperparameters like this too
for example here - instead of 1, 10 it is 2**1, 2**10?
2020-07-13 19505, 2020
jmp_music
but grid search cv will show us the 2**7 as the best score, not the 7 we set as the hyperparameter input
2020-07-13 19521, 2020
alastairp
ah, I see
2020-07-13 19522, 2020
jmp_music
hmmm ok!
2020-07-13 19530, 2020
alastairp
I don't think that's a problem
2020-07-13 19540, 2020
alastairp
we can just take log2 to get the original value back if we want
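For instance, assuming the grid values were generated as powers of two, the exponents can be recovered from the values that sklearn reports. The exponent list here is made up for the example.

```python
import math

exponents = [-5, -3, 7]                  # hypothetical inputs, gaia-style
gammas = [2.0 ** e for e in exponents]   # the values actually passed to sklearn

# log2 of an exact power of two is exact, so rounding is only a safeguard.
recovered = [round(math.log2(g)) for g in gammas]
print(recovered)
```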
2020-07-13 19535, 2020
jmp_music
ok! nice!
2020-07-13 19503, 2020
jmp_music
I'll continue the refactoring and I'll keep you updated with the results
2020-07-13 19522, 2020
alastairp
I think we should try whenever possible to always use sklearn library classes/functions instead of writing it ourselves
2020-07-13 19536, 2020
alastairp
even if it means that the results might be a little bit different
2020-07-13 19532, 2020
alastairp
because I suspect that the amount of code that we have to write to do the training process is actually quite short
2020-07-13 19503, 2020
alastairp
because all of the things that we need to do (normalisation, preprocessing, grid search, evaluation, etc) all exist in the sklearn api
2020-07-13 19502, 2020
MFCR_ColbyRay joined the channel
2020-07-13 19554, 2020
jmp_music
you are right! If we do not care about the exported reports being exactly the same as gaia's, we could of course proceed that way.
2020-07-13 19545, 2020
jmp_music
I mean, for example, the std outputs being in the same format
2020-07-13 19512, 2020
jmp_music
stdout --> terminal, not the standard deviation haha
2020-07-13 19520, 2020
alastairp
right. there's no requirement for that to be the same
2020-07-13 19541, 2020
alastairp
and even the output files don't have to be the same. as long as we can get more or less the same information out of it
2020-07-13 19514, 2020
MFCR_ColbyRay has quit
2020-07-13 19509, 2020
alastairp
the most important information is 1) accuracy, 2) confusion matrix, 3) list of parameters [c, gamma, kernel, preprocessing], 4) report showing all of the parameters and the accuracy for each one
2020-07-13 19521, 2020
alastairp
let's focus on the first 2 for now, the last 2 are less important
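A sketch of how the first two items could be obtained with sklearn alone, assuming cross-validated predictions are an acceptable basis for both. The dataset and the SVC settings are placeholders.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # stand-in data

# One out-of-fold prediction per sample, using 5-fold cross-validation.
pred = cross_val_predict(SVC(C=1.0, gamma="scale", kernel="rbf"), X, y, cv=5)

acc = accuracy_score(y, pred)
cm = confusion_matrix(y, pred)  # rows: true class, columns: predicted class

print(acc)
print(cm)
```

The diagonal of the confusion matrix counts the correct predictions, so its trace divided by the total equals the accuracy.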
2020-07-13 19506, 2020
jmp_music
cool!!
2020-07-13 19522, 2020
jmp_music
thanks for the help!
2020-07-13 19518, 2020
MFCR_ColbyRay joined the channel
2020-07-13 19526, 2020
MFCR_ColbyRay has quit
2020-07-13 19540, 2020
BrainzGit
[musicbrainz-server] mwiencek opened pull request #1595 (master…find-by-artist-perf): MBS-10939, MBS-10940: Speed up Data::Release::find_by_artist for VA, allow filtering by date/country https://github.com/metabrainz/musicbrainz-server/…