#metabrainz

      • ruaok
        shivam-kapila: of course, that is fine. let's talk later about adjusting your schedule accordingly.
      • 2020-07-13 19543, 2020

      • Sophist-UK has quit
      • 2020-07-13 19543, 2020

      • gr0uch0mars has quit
      • 2020-07-13 19544, 2020

      • nawcom has quit
      • 2020-07-13 19507, 2020

      • Sophist-UK joined the channel
      • 2020-07-13 19507, 2020

      • gr0uch0mars joined the channel
      • 2020-07-13 19507, 2020

      • nawcom joined the channel
      • 2020-07-13 19532, 2020

      • shivam-kapila
        sure thing
      • 2020-07-13 19537, 2020

      • shivam-kapila
        thanks
      • 2020-07-13 19544, 2020

      • prabal
        CatQuest: thanks for detailed feedback
      • 2020-07-13 19559, 2020

      • CatQuest
        prabal: sorry for the wall of text 😹
      • 2020-07-13 19501, 2020

      • prabal
        I'll read it properly later. Busy with something rn. :)
      • 2020-07-13 19508, 2020

      • CatQuest
        sure don't worry!
      • 2020-07-13 19524, 2020

      • MFCR_ColbyRay joined the channel
      • 2020-07-13 19538, 2020

      • MFCR_ColbyRay
      • 2020-07-13 19548, 2020

      • CatQuest
        hi colbyray!
      • 2020-07-13 19552, 2020

      • MFCR_ColbyRay
        hi
      • 2020-07-13 19515, 2020

      • CatQuest
        is this an idea for a new welcome to musicbrainz page?
      • 2020-07-13 19520, 2020

      • CatQuest
        for the wiki or the main site?
      • 2020-07-13 19553, 2020

      • MFCR_ColbyRay
        i was bored when i made this
      • 2020-07-13 19559, 2020

      • CatQuest
        it's very well done :D
      • 2020-07-13 19501, 2020

      • CatQuest
        cool!
      • 2020-07-13 19515, 2020

      • CatQuest
        i liek the "who uses musicrainz" section
      • 2020-07-13 19533, 2020

      • CatQuest
        damn, forgive my spellings
      • 2020-07-13 19537, 2020

      • MFCR_ColbyRay
        i simply combined the About page and MusicBrainz in a Nutshell
      • 2020-07-13 19551, 2020

      • CatQuest
        yea! it's pretty good :O
      • 2020-07-13 19532, 2020

      • MFCR_ColbyRay has quit
      • 2020-07-13 19538, 2020

      • CatQuest
        if.. oh
      • 2020-07-13 19530, 2020

      • CatQuest
        I wanted to say that, if he wanted feedback, an idea I had was that in the "who uses" section, the part saying "and more" could link to the appropriate segments on metabrainz
      • 2020-07-13 19540, 2020

      • CatQuest
        there's one for researchers too
      • 2020-07-13 19548, 2020

      • CatQuest
        well :shrug:
      • 2020-07-13 19523, 2020

      • MFCR_ColbyRay joined the channel
      • 2020-07-13 19531, 2020

      • CatQuest
        there you are again!
      • 2020-07-13 19536, 2020

      • CatQuest
        hi did you read my idea?
      • 2020-07-13 19553, 2020

      • MFCR_ColbyRay
        yes
      • 2020-07-13 19501, 2020

      • MFCR_ColbyRay
      • 2020-07-13 19536, 2020

      • CatQuest
        hmm but i think it's ok to link to metabrainz
      • 2020-07-13 19557, 2020

      • CatQuest
        since this page is essentially a "promotion of mb"
      • 2020-07-13 19503, 2020

      • CatQuest
        which is ok, i think?
      • 2020-07-13 19519, 2020

      • CatQuest
        freso, reosarevok, rdswift ?
      • 2020-07-13 19535, 2020

      • MFCR_ColbyRay has quit
      • 2020-07-13 19546, 2020

      • CatQuest
        eugh. maybe this is one of those things where i don't know shit :(
      • 2020-07-13 19554, 2020

      • CatQuest
        🤷
      • 2020-07-13 19538, 2020

      • sumedh joined the channel
      • 2020-07-13 19559, 2020

      • jmp_music
        hey alastairp. I have figured out what was going on with the data being loaded from the ground truth file, and I fixed that issue. I have also started following gaia's development structure. I have some questions to ask when you have time.
      • 2020-07-13 19505, 2020

      • alastairp
        hi jmp_music
      • 2020-07-13 19517, 2020

      • alastairp
        sorry, I completely missed responding to you about Friday
      • 2020-07-13 19525, 2020

      • jmp_music
        no worries!
      • 2020-07-13 19538, 2020

      • alastairp
        if that happens again, just mention me again - I was around, but had just forgotten
      • 2020-07-13 19545, 2020

      • alastairp
        what are your questions?
      • 2020-07-13 19555, 2020

      • alastairp
        I saw your equal accuracies, that's great!
      • 2020-07-13 19516, 2020

      • jmp_music
        no worries, I understand you are too busy with various things
      • 2020-07-13 19546, 2020

      • jmp_music
        Well, sometimes the accuracies were equal to gaia's, but other times they were not
      • 2020-07-13 19505, 2020

      • alastairp
        that could very well be because of some unexpected randomness
      • 2020-07-13 19512, 2020

      • alastairp
        how many different datasets have you tried?
      • 2020-07-13 19512, 2020

      • jmp_music
        I figured out that the problem was in random
      • 2020-07-13 19519, 2020

      • jmp_music
        exactly!
      • 2020-07-13 19552, 2020

      • jmp_music
        When I passed a numpy array to random.shuffle, it duplicated some of the data
      • 2020-07-13 19520, 2020

      • jmp_music
        and some other items were disappearing from the dataset
      • 2020-07-13 19549, 2020

      • jmp_music
        but when I loaded the data from the gt as a simple list, the shuffling was done successfully
      • 2020-07-13 19529, 2020
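
Note: the duplication described above is consistent with how `random.shuffle` swaps elements in place (`x[i], x[j] = x[j], x[i]`). On a two-dimensional numpy array each `x[i]` is a view of a row rather than a copy, so a swap can overwrite one row with another. Below is a minimal sketch of one way to avoid this using only the standard library: shuffle a list of indices, build a reordered plain list from it, and check that nothing was lost or duplicated. The names `shuffled_copy` and `ground_truth` are hypothetical and not taken from the project.

```python
import random
from collections import Counter


def shuffled_copy(items, seed=None):
    """Return a new list with the items in random order (the input is left untouched)."""
    rng = random.Random(seed)
    order = list(range(len(items)))
    rng.shuffle(order)  # shuffling a list of plain ints cannot alias rows
    return [items[i] for i in order]


# Hypothetical ground-truth entries: (track id, class label)
ground_truth = [("track-%d" % n, "class-%d" % (n % 3)) for n in range(10)]
shuffled = shuffled_copy(ground_truth, seed=42)

# Same items with the same counts: nothing duplicated, nothing missing
assert Counter(shuffled) == Counter(ground_truth)
```

The final assertion is the kind of check that could sit next to the shuffling code, alongside the comment alastairp asks for.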

      • alastairp
        that sounds like a good thing to write a comment about, to make sure it doesn't happen again
      • 2020-07-13 19551, 2020

      • jmp_music
        of course. I'll include it in my comments
      • 2020-07-13 19558, 2020

      • jmp_music
        inside the code
      • 2020-07-13 19517, 2020

      • jmp_music
        I don't know why the random library had that issue. I started refactoring some things in my code, and I decided to follow gaia's structure
      • 2020-07-13 19553, 2020

      • alastairp
        what structure? The code, or the format of the data files?
      • 2020-07-13 19547, 2020

      • jmp_music
        something like a hybrid solution of these two. I have to borrow some parts of the structure (not the code itself), just the way the methods are called
      • 2020-07-13 19502, 2020

      • alastairp
        ok, cool
      • 2020-07-13 19509, 2020

      • alastairp
        how many datasets have you evaluated?
      • 2020-07-13 19533, 2020

      • jmp_music
        3 for now, but since yesterday I started the refactoring.
      • 2020-07-13 19542, 2020

      • jmp_music
        because of the issue of random library
      • 2020-07-13 19501, 2020

      • jmp_music
        another thing that I discovered is the grid search
      • 2020-07-13 19504, 2020

      • alastairp
        great, and after your refactoring, are all of the accuracies approximately the same?
      • 2020-07-13 19519, 2020

      • jmp_music
        I hope so
      • 2020-07-13 19526, 2020

      • jmp_music
        I haven't finished yet
      • 2020-07-13 19549, 2020

      • jmp_music
        about the grid search, gaia actually does a combination of the parameters
      • 2020-07-13 19522, 2020

      • jmp_music
        that's how the 1728 training processes come about
      • 2020-07-13 19525, 2020

      • jmp_music
        am I right
      • 2020-07-13 19526, 2020

      • jmp_music
        ?
      • 2020-07-13 19557, 2020

      • alastairp
        correct
      • 2020-07-13 19509, 2020
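
Note: a grid of this kind is the Cartesian product of the candidate values for each parameter, so the number of training runs is the product of the list lengths. The sketch below shows this with `itertools.product`. The parameter names and values are made up for illustration and are not gaia's actual grid.

```python
from itertools import product

# Hypothetical grid, for illustration only
grid = {
    "kernel": ["linear", "rbf"],
    "C": [1, 10, 100],
    "gamma": [0.01, 0.1],
    "preprocessing": ["raw", "normalized"],
}

names = list(grid)
combinations = [
    dict(zip(names, values))
    for values in product(*(grid[name] for name in names))
]

# One training run per combination: 2 * 3 * 2 * 2 = 24
print(len(combinations))
print(combinations[0])
```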

      • jmp_music
        grid search in sklearn does a little different kind of process based on the input parameters
      • 2020-07-13 19528, 2020

      • jmp_music
        I'll start imitating gaia to follow the 1728 processes
      • 2020-07-13 19533, 2020

      • jmp_music
        for now
      • 2020-07-13 19541, 2020

      • alastairp
        how does sklearn do it?
      • 2020-07-13 19501, 2020

      • jmp_music
        I think (based on the documentation) it takes a range of values for each parameter passed to the class method, but from the best model that is exported, we can't see which of gaia's parameters worked best
      • 2020-07-13 19518, 2020

      • alastairp
        what do you mean class method?
      • 2020-07-13 19524, 2020

      • alastairp
        can you give me a link to the documentation?
      • 2020-07-13 19544, 2020

      • jmp_music
        sorry, I meant to write `method`
      • 2020-07-13 19554, 2020

      • jmp_music
      • 2020-07-13 19518, 2020

      • jmp_music
        gaia exports the `.param` files that contain the exact parameters. it will not be a problem for me to work that way
      • 2020-07-13 19517, 2020

      • alastairp
        in the example with the documentation, does `clf.cv_results_` contain all of the results for all items in the grid search?
      • 2020-07-13 19519, 2020

      • alastairp
        or just the winner?
      • 2020-07-13 19535, 2020

      • alastairp
        it seems like it contains all of them, right?
      • 2020-07-13 19534, 2020

      • alastairp
        it seems like we could use this for the combination of C/gamma/kernel functions. can we also use it to perform the selection of the preprocessing steps, or would we have to do that ourselves?
      • 2020-07-13 19516, 2020

      • alastairp
        I'm just looking at the documentation for pipelines: https://stackoverflow.com/questions/43366561/use-…
      • 2020-07-13 19542, 2020
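
Note: one way to check both questions is sketched below, assuming a reasonably recent scikit-learn. The scaler is a step of a `Pipeline`, so `GridSearchCV` can treat the choice of preprocessing as one more grid dimension, and `cv_results_` can be inspected to see whether every combination is recorded. The iris dataset, the step names `scale` and `svc`, and all parameter values are placeholders, not the project's setup.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The scaler is part of the pipeline, so it is fitted on the training
# portion of each cross-validation fold rather than on all of the data.
pipeline = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

param_grid = {
    # "passthrough" skips the step, i.e. no preprocessing
    "scale": [StandardScaler(), MinMaxScaler(), "passthrough"],
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [1, 10],
    "svc__gamma": [0.01, 0.1],
}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

results = search.cv_results_
# One entry per combination: 3 * 2 * 2 * 2 = 24
print(len(results["params"]))
print(search.best_params_)
```

In this sketch `results["params"]` and `results["mean_test_score"]` hold one entry per combination, not only the winner.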

      • jmp_music
        hmm, let me check. As I remember, it exports all the results, but not the exact values we input as hyperparameters
      • 2020-07-13 19508, 2020

      • alastairp
        yes, I was just reading this comment on the link I posted: "In this answer, GridSearchCV will tune the hyperparameters on the data already preprocessed by StandardScaler, which is not correct. "
      • 2020-07-13 19510, 2020

      • alastairp
        is that what you mean?
      • 2020-07-13 19519, 2020

      • jmp_music
        I mean that if we set gamma to 7, for example, we could not tell that it was 7
      • 2020-07-13 19530, 2020

      • alastairp
        I don't know if gaia actually tunes the hyperparameters like this too
      • 2020-07-13 19548, 2020

      • jmp_music
        it tunes it by 2**x
      • 2020-07-13 19505, 2020

      • jmp_music
        where x is the gamma or C accordingly
      • 2020-07-13 19552, 2020

      • alastairp
        >>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
      • 2020-07-13 19553, 2020

      • alastairp
        >>> clf = GridSearchCV(svc, parameters)
      • 2020-07-13 19502, 2020

      • alastairp
        for example here - instead of 1, 10 it is 2**1, 2**10?
      • 2020-07-13 19505, 2020

      • jmp_music
        but grid search cv will show us the 2**7 as the best score, not the 7 we set as the hyperparameter input
      • 2020-07-13 19521, 2020

      • alastairp
        ah, I see
      • 2020-07-13 19522, 2020

      • jmp_music
        hmmm ok!
      • 2020-07-13 19530, 2020

      • alastairp
        I don't think that's a problem
      • 2020-07-13 19540, 2020

      • alastairp
        we can just take log2 to get the original value back if we want
      • 2020-07-13 19535, 2020
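
Note: the round trip is simple arithmetic. The sketch below assumes, as described above, that the grid is specified as exponents `x` and the classifier receives `2**x`. The exponent values are made up.

```python
import math

exponents = [-5, -3, 1, 7]              # hypothetical inputs to the grid
values = [2.0 ** x for x in exponents]  # what the classifier actually receives

# Suppose the search reports 2**7 as the best value
best_value = 128.0
best_exponent = math.log2(best_value)
print(best_exponent)  # 7.0

# log2 recovers every original exponent from the transformed values
recovered = [round(math.log2(v)) for v in values]
```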

      • jmp_music
        ok! nice!
      • 2020-07-13 19503, 2020

      • jmp_music
        I'll continue the refactoring and I'll keep you updated with the results
      • 2020-07-13 19522, 2020

      • alastairp
        I think we should try whenever possible to always use sklearn library classes/functions instead of writing it ourselves
      • 2020-07-13 19536, 2020

      • alastairp
        even if it means that the results might be a little bit different
      • 2020-07-13 19532, 2020

      • alastairp
        because I suspect that the amount of code that we have to write to do the training process is actually quite short
      • 2020-07-13 19503, 2020

      • alastairp
        because all of the things that we need to do (normalisation, preprocessing, grid search, evaluation, etc) all exist in the sklearn api
      • 2020-07-13 19502, 2020

      • MFCR_ColbyRay joined the channel
      • 2020-07-13 19554, 2020

      • jmp_music
        you are right! If we do not need the exported reports to be exactly the same as gaia's, we could of course proceed that way.
      • 2020-07-13 19545, 2020

      • jmp_music
        I mean for example the std outputs to be in the same format
      • 2020-07-13 19512, 2020

      • jmp_music
        stdout --> terminal, not the standard deviation haha
      • 2020-07-13 19520, 2020

      • alastairp
        right. there's no requirement for that to be the same
      • 2020-07-13 19541, 2020

      • alastairp
        and even the output files don't have to be the same. as long as we can get more or less the same information out of it
      • 2020-07-13 19514, 2020

      • MFCR_ColbyRay has quit
      • 2020-07-13 19509, 2020

      • alastairp
        the most important information is 1) accuracy, 2) confusion matrix, 3) list of parameters [c, gamma, kernel, preprocessing], 4) report showing all of the parameters and the accuracy for each one
      • 2020-07-13 19521, 2020

      • alastairp
        let's focus on the first 2 for now, the last 2 are less important
      • 2020-07-13 19506, 2020
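
Note: the first two items map onto existing scikit-learn functions, in line with the suggestion to use the library where possible. This is a small sketch with made-up labels. Only `accuracy_score` and `confusion_matrix` are real API names; everything else is illustrative.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels, for illustration only
y_true = ["rock", "rock", "jazz", "jazz", "pop", "pop"]
y_pred = ["rock", "jazz", "jazz", "jazz", "pop", "rock"]
labels = ["jazz", "pop", "rock"]  # fixes the row/column order of the matrix

accuracy = accuracy_score(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred, labels=labels)

print(accuracy)
print(matrix)  # rows: true class, columns: predicted class
```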

      • jmp_music
        cool!!
      • 2020-07-13 19522, 2020

      • jmp_music
        thanks for the help!
      • 2020-07-13 19518, 2020

      • MFCR_ColbyRay joined the channel
      • 2020-07-13 19526, 2020

      • MFCR_ColbyRay has quit
      • 2020-07-13 19540, 2020

      • BrainzGit
        [musicbrainz-server] mwiencek opened pull request #1595 (master…find-by-artist-perf): MBS-10939, MBS-10940: Speed up Data::Release::find_by_artist for VA, allow filtering by date/country https://github.com/metabrainz/musicbrainz-server/…
      • 2020-07-13 19542, 2020

      • BrainzBot
        MBS-10939: Data::Release::find_by_artist should be improved to not cause database load issues in production https://tickets.metabrainz.org/browse/MBS-10939
      • 2020-07-13 19542, 2020

      • BrainzBot
        MBS-10940: Allow filtering the artist Releases tab by date and country https://tickets.metabrainz.org/browse/MBS-10940
      • 2020-07-13 19553, 2020

      • iliekcomputers
        ishaanshah: hey
      • 2020-07-13 19509, 2020

      • ishaanshah
        iliekcomputers: Hi!
      • 2020-07-13 19514, 2020

      • iliekcomputers
        how are you?
      • 2020-07-13 19531, 2020

      • ishaanshah
        Doing well
      • 2020-07-13 19537, 2020

      • ishaanshah
        How about you?
      • 2020-07-13 19541, 2020

      • iliekcomputers
        great!
      • 2020-07-13 19544, 2020

      • iliekcomputers
        i saw your PR
      • 2020-07-13 19548, 2020

      • iliekcomputers
        for the import.
      • 2020-07-13 19500, 2020

      • iliekcomputers
        I'm not sure if extracting the entire dump is a good idea.
      • 2020-07-13 19507, 2020

      • iliekcomputers
        it would take up too much space.
      • 2020-07-13 19543, 2020

      • iliekcomputers
        and it doesn't solve the problem of an exception during the hdfs upload itself.
      • 2020-07-13 19553, 2020

      • iliekcomputers
        here's what I was thinking.
      • 2020-07-13 19557, 2020

      • ishaanshah
        Hmm, so do you suggest we copy to a temp dir in hdfs
      • 2020-07-13 19516, 2020

      • ishaanshah
        and then move to the dest_path?
      • 2020-07-13 19517, 2020

      • iliekcomputers
        create a "temp" folder in hdfs, import the data in there, if all goes well, copy that to the main listenbrainz dir
      • 2020-07-13 19521, 2020

      • iliekcomputers
        yes
      • 2020-07-13 19533, 2020

      • iliekcomputers
        if there's an exception, log it and exit
      • 2020-07-13 19553, 2020
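
Note: the pattern described here (import into a temporary location, publish only on success, log and stop on failure) can be sketched as follows. The local filesystem stands in for HDFS. `staged_import`, `good_import` and `bad_import` are hypothetical names, and a real implementation would use whatever HDFS client the project already has.

```python
import logging
import shutil
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def staged_import(import_fn, dest_path):
    """Run import_fn into a temp dir; move the result to dest_path only on success."""
    dest_path = Path(dest_path)
    temp_dir = Path(tempfile.mkdtemp(prefix="import-"))
    try:
        import_fn(temp_dir)
    except Exception:
        # Log and stop; the existing destination is never touched
        logger.exception("import failed, leaving %s untouched", dest_path)
        shutil.rmtree(temp_dir, ignore_errors=True)
        return False
    if dest_path.exists():
        shutil.rmtree(dest_path)
    shutil.move(str(temp_dir), str(dest_path))
    return True


def good_import(target):
    (target / "listens.txt").write_text("ok")


def bad_import(target):
    (target / "partial.txt").write_text("half")
    raise IOError("upload interrupted")


work_dir = Path(tempfile.mkdtemp(prefix="demo-"))
dest = work_dir / "data"
first = staged_import(good_import, dest)   # publishes listens.txt
second = staged_import(bad_import, dest)   # fails, dest keeps the first import
```

The partially written data only ever exists in the temporary directory, so a failure during the upload cannot leave the main directory half-populated.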

      • ishaanshah
        Hmm, that will work too
      • 2020-07-13 19506, 2020

      • iliekcomputers
        great!