#metabrainz

      • ruaok
        jmp_music: send me a private message with your email and I'll invite you.
      • 2020-05-08 12940, 2020

      • alastairp
        a great first step would be to train a model with gaia to ensure that you know how the process works, and then to reproduce that process in scikit learn
      • 2020-05-08 12927, 2020

      • jmp_music
        alastairp: Ok thanks! I downloaded also the datasets via the link you provided to me a few days ago.
      • 2020-05-08 12956, 2020

      • alastairp
        in gaia we do a grid search with about 700 C/gamma values and feature permutations. There's a configuration file which lists these parameters (https://github.com/MTG/gaia/blob/master/src/bindi…) It would be good to have something similar in scikit learn, but it doesn't have to use this configuration file
      • 2020-05-08 12926, 2020

      • alastairp
        I understand that sklearn has a number of helper tools for grid search, so it seems like it would be a good idea to use that as much as possible
      • 2020-05-08 12921, 2020

      • jmp_music
        I could test sklearn's GridSearchCV as well as RandomizedSearchCV
      • 2020-05-08 12934, 2020

      • jmp_music
        to see which one could provide better results
      • 2020-05-08 12921, 2020

      • alastairp
        yeah, those were the ones that I was thinking of
      • 2020-05-08 12950, 2020
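The two sklearn search helpers discussed here can be compared roughly as follows. This is a minimal sketch, not gaia's procedure: the data is synthetic, and the C/gamma ranges, the pipeline step names and the number of folds are illustrative assumptions rather than values taken from the gaia configuration file.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a feature matrix and single-label classes.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=8, n_classes=3, random_state=0
)

# Scale the features, then fit an RBF-kernel SVM.
pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC(kernel="rbf"))])

# Exhaustive search: every C/gamma combination is tried (4 x 4 = 16 here).
# These exponents are assumptions chosen for illustration only.
param_grid = {
    "svm__C": [2.0 ** e for e in range(-5, 11, 4)],
    "svm__gamma": [2.0 ** e for e in range(-11, 4, 4)],
}
grid = GridSearchCV(pipeline, param_grid, cv=5)
grid.fit(X, y)

# Randomized search: the same budget of 16 candidates, sampled from
# log-uniform distributions over the same ranges.
param_distributions = {
    "svm__C": loguniform(2 ** -5, 2 ** 10),
    "svm__gamma": loguniform(2 ** -11, 2 ** 3),
}
search = RandomizedSearchCV(
    pipeline, param_distributions, n_iter=16, cv=5, random_state=0
)
search.fit(X, y)

print("grid search:      ", grid.best_params_, grid.best_score_)
print("randomized search:", search.best_params_, search.best_score_)
```

Both objects expose `best_params_`, `best_score_` and `cv_results_`, so the two strategies can be compared on the same cross-validation splits before deciding which one to keep.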

      • jmp_music
        I agree with you to start by training a similar model to gaia and compare its results. As I saw from the datasets you provided to me, the problem is a multilabel classification and now a multiclass one
      • 2020-05-08 12954, 2020

      • jmp_music
        not*
      • 2020-05-08 12939, 2020

      • jmp_music
        Each row of the dataset includes its MBID and some genre labels for the track
      • 2020-05-08 12950, 2020

      • jmp_music
        the genres are from 1 to 30
      • 2020-05-08 12908, 2020
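The difference between the multilabel and multiclass settings shows up in how the targets are encoded. A minimal sketch, assuming rows of the form (MBID, genre ids) with ids from 1 to 30; the MBID placeholders and the genre ids below are made up for illustration:

```python
NUM_GENRES = 30  # genre ids run from 1 to 30 in the datasets discussed here


def to_multi_hot(genre_ids, num_classes=NUM_GENRES):
    """Encode a collection of 1-based genre ids as a 0/1 vector."""
    vector = [0] * num_classes
    for genre_id in genre_ids:
        if not 1 <= genre_id <= num_classes:
            raise ValueError(f"genre id out of range: {genre_id}")
        vector[genre_id - 1] = 1
    return vector


# Multilabel: a track may carry several genres, so each target is a vector.
# Hypothetical rows; real rows would carry real MBIDs.
multilabel_rows = [("mbid-a", [3, 17]), ("mbid-b", [5]), ("mbid-c", [1, 5, 30])]
multilabel_targets = {mbid: to_multi_hot(ids) for mbid, ids in multilabel_rows}

# Multiclass: exactly one class per track, so each target is a single index.
multiclass_rows = [("mbid-a", 3), ("mbid-b", 5)]
multiclass_targets = {mbid: genre_id - 1 for mbid, genre_id in multiclass_rows}

print(multilabel_targets["mbid-c"])
print(multiclass_targets)
```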

      • alastairp
        for those datasets, yes - but this isn't our only classification task. The main reason that I sent you those datasets was so that you could download them and have a local copy of a few thousand .json files
      • 2020-05-08 12918, 2020

      • alastairp
        in case you need more data
      • 2020-05-08 12936, 2020

      • alastairp
        here are our current datasets: https://acousticbrainz.org/datasets/accuracy
      • 2020-05-08 12929, 2020

      • alastairp
        these are much more "traditional" single-label datasets
      • 2020-05-08 12939, 2020

      • alastairp
        these are the ones that we should focus on first
      • 2020-05-08 12912, 2020

      • jmp_music
        Ok! By this link I understand that it refers to a multiclass problem
      • 2020-05-08 12959, 2020

      • jmp_music
        The outcomes are from the SVM's decision function probabilities. Am I right?
      • 2020-05-08 12904, 2020

      • alastairp
        yes
      • 2020-05-08 12934, 2020

      • alastairp
        see the output of a high-level model: https://acousticbrainz.org/4792f85c-ba03-43db-af4…
      • 2020-05-08 12901, 2020

      • alastairp
        in acousticbrainz we use "low-level" to mean features - these are extracted from audio files
      • 2020-05-08 12913, 2020

      • alastairp
        and "high-level" means the results of an ML model
      • 2020-05-08 12935, 2020
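A "high-level" result can be pictured as a set of per-class probabilities reduced to one predicted value. The sketch below assumes a result layout with `all`, `value` and `probability` fields; that layout, the class names and the numbers are assumptions for illustration and should be checked against a real high-level document.

```python
def summarise_prediction(class_probabilities):
    """Reduce per-class probabilities to the most likely class.

    Returns the full distribution under "all", the winning class under
    "value" and its probability under "probability".
    """
    if not class_probabilities:
        raise ValueError("at least one class probability is required")
    value = max(class_probabilities, key=class_probabilities.get)
    return {
        "all": dict(class_probabilities),
        "value": value,
        "probability": class_probabilities[value],
    }


# Hypothetical output of one binary model.
example = summarise_prediction({"danceable": 0.27, "not_danceable": 0.73})
print(example)
```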

      • jmp_music
        I have figured out how this works. However, there are some questions that I would like to ask you.
      • 2020-05-08 12914, 2020

      • jmp_music
        I checked the low-level data, and I saw that some of the features are lists (arrays)
      • 2020-05-08 12919, 2020

      • jmp_music
        these lists have a standard length of values, except the "rhythm_beats_position"
      • 2020-05-08 12942, 2020

      • alastairp
        right, because songs are different lengths 😅
      • 2020-05-08 12956, 2020

      • jmp_music
        ahahaha
      • 2020-05-08 12944, 2020

      • jmp_music
        that's right. And my question is whether there is a post-processing step that derives a feature from these values
      • 2020-05-08 12955, 2020

      • jmp_music
        e.g. taking the mean, etc.
      • 2020-05-08 12926, 2020

      • jmp_music
        or the length of the list as a value (with Python's len() function)
      • 2020-05-08 12954, 2020

      • jmp_music
        I should start checking gaia to see if this process takes place there
      • 2020-05-08 12907, 2020

      • alastairp
        for that specific value, I don't think so
      • 2020-05-08 12922, 2020

      • alastairp
        yes, good idea. I was just looking at https://github.com/MTG/gaia/blob/master/src/bindi…
      • 2020-05-08 12945, 2020

      • alastairp
        and I don't see any specific behaviour for rhythm_beats_position
      • 2020-05-08 12918, 2020

      • alastairp
        that's a good question though, I wonder if we should remove this from the data before building the model... it seems like it has the potential to introduce bad training data
      • 2020-05-08 12903, 2020
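Both options raised here, dropping a variable-length feature or replacing it with a fixed-size summary such as its length, can be sketched as follows. This is not what gaia does; the flat document layout, the `tonal_hpcp_mean` name and all values are assumptions made for the example.

```python
def variable_length_features(documents):
    """Return names of list features whose length differs between documents."""
    lengths = {}
    for doc in documents:
        for name, value in doc.items():
            if isinstance(value, list):
                lengths.setdefault(name, set()).add(len(value))
    return sorted(name for name, seen in lengths.items() if len(seen) > 1)


def drop_features(doc, names):
    """Option 1: remove the variable-length features entirely."""
    return {key: value for key, value in doc.items() if key not in names}


def replace_with_length(doc, names):
    """Option 2: keep only the list length as a single numeric feature."""
    result = {}
    for key, value in doc.items():
        if key in names:
            result[key + "_count"] = len(value)
        else:
            result[key] = value
    return result


# Hypothetical, flattened low-level documents for two tracks.
documents = [
    {"rhythm_beats_position": [0.5, 1.0, 1.5, 2.0], "tonal_hpcp_mean": [0.1, 0.2, 0.3]},
    {"rhythm_beats_position": [0.4, 0.9], "tonal_hpcp_mean": [0.3, 0.1, 0.2]},
]

varying = variable_length_features(documents)
dropped = [drop_features(doc, varying) for doc in documents]
counted = [replace_with_length(doc, varying) for doc in documents]
print(varying)
print(counted)
```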

      • jmp_music
        Let me check about it and the other features too. Maybe some of them could be dropped before the training process and thus speed up the training time
      • 2020-05-08 12916, 2020

      • alastairp
        good idea, but let's focus on that after we've reproduced the existing models
      • 2020-05-08 12937, 2020

      • jmp_music
        yes of course.
      • 2020-05-08 12941, 2020

      • jmp_music
        should all these classes be included in the training process and during the labeling of the data?
      • 2020-05-08 12942, 2020

      • jmp_music
        ['danceability', 'gender', 'genre_dortmund', 'genre_electronic', 'genre_rosamerica', 'genre_tzanetakis', 'ismir04_rhythm', 'mood_acoustic', 'mood_aggressive', 'mood_electronic', 'mood_happy', 'mood_party', 'mood_relaxed', 'mood_sad', 'moods_mirex', 'timbre', 'tonal_atonal', 'voice_instrumental']
      • 2020-05-08 12951, 2020

      • alastairp
        those are all individual models
      • 2020-05-08 12926, 2020

      • alastairp
        but yes, we should build new models for all of these datasets
      • 2020-05-08 12912, 2020

      • yvanzo
        bitmap, reosarevok: Is v-2020-05-18 supposed to be pg12 only? If so, should we freeze master until then?
      • 2020-05-08 12939, 2020

      • reosarevok
        Most of our schema changes have included unrelated code too
      • 2020-05-08 12957, 2020

      • reosarevok
        So I'd expect no, but if bitmap thinks it's important to make it PG12 only, then we can
      • 2020-05-08 12955, 2020

      • jmp_music
        alastairp: I'll be waiting for the dataset link and I'll come back with updates.
      • 2020-05-08 12911, 2020

      • jmp_music
        Is the dataset already labeled?
      • 2020-05-08 12915, 2020

      • alastairp
        yes
      • 2020-05-08 12927, 2020

      • jmp_music
        Alastair, thank you very much for your introduction to the project and its needs
      • 2020-05-08 12900, 2020

      • alastairp
        no problem. we're looking forward to your work
      • 2020-05-08 12952, 2020

      • BrainzGit
        [musicbrainz-server] reosarevok opened pull request #1503 (master…MBS-9340): MBS-9340: Only allow mul and zxx as the only work language https://github.com/metabrainz/musicbrainz-server/…
      • 2020-05-08 12953, 2020

      • BrainzBot
        MBS-9340: Don't allow more languages if [No lyrics] is selected https://tickets.metabrainz.org/browse/MBS-9340
      • 2020-05-08 12906, 2020

      • yvanzo
        reosarevok: sure, it's just that we won't have two weeks as usual.
      • 2020-05-08 12916, 2020

      • reosarevok
        Oh, I see
      • 2020-05-08 12932, 2020

      • reosarevok
        Well, then we could do it so that we only merge bugfixes or something?
      • 2020-05-08 12934, 2020

      • reosarevok
        Dunno
      • 2020-05-08 12934, 2020

      • yvanzo
        We could also merge pg12 instead of master into beta/production.
      • 2020-05-08 12959, 2020

      • yvanzo
        Since tags are on production, that would not require freezing master at all.
      • 2020-05-08 12947, 2020

      • Cyna[m]
        reosarevok:
      • 2020-05-08 12957, 2020

      • Cyna[m]
        made changes and pushed :)
      • 2020-05-08 12957, 2020

      • reosarevok
        I saw, I'm going to test
      • 2020-05-08 12940, 2020

      • jmp_music has quit
      • 2020-05-08 12951, 2020

      • ishaanshah[m]
        iliekcomputers: Hi, can we do our meeting a bit earlier today?
      • 2020-05-08 12913, 2020

      • iliekcomputers
        sure
      • 2020-05-08 12930, 2020

      • iliekcomputers
        i haven't been able to look at the PR again
      • 2020-05-08 12941, 2020

      • ishaanshah[m]
        I added another parameter to the api endpoint today, "offset"
      • 2020-05-08 12941, 2020

      • iliekcomputers
        i'll try to do that tomorrow
      • 2020-05-08 12900, 2020

      • ishaanshah[m]
        I figured we would need it for pagination
      • 2020-05-08 12917, 2020

      • iliekcomputers
        true, that makes sense
      • 2020-05-08 12920, 2020
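How an "offset" parameter supports pagination can be sketched in a few lines. This is not the ListenBrainz endpoint: the `count` parameter, the response keys and the artist names are hypothetical and only illustrate the idea.

```python
def paginate(items, offset=0, count=25):
    """Return one page of items, starting at offset and holding up to count entries."""
    if offset < 0 or count < 0:
        raise ValueError("offset and count must be non-negative")
    page = items[offset:offset + count]
    return {"offset": offset, "count": len(page), "total": len(items), "items": page}


# Hypothetical list of 60 artists, fetched in pages of 25.
artists = [f"artist-{i}" for i in range(60)]
first = paginate(artists, offset=0, count=25)
last = paginate(artists, offset=50, count=25)
print(first["count"], last["count"], last["total"])
```

Returning the total alongside the page lets a client work out whether another request with a larger offset is needed.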

      • ishaanshah[m]
        Zastai proposed that we should use timestamp instead of all_time...
      • 2020-05-08 12932, 2020

      • iliekcomputers
        right, i saw that
      • 2020-05-08 12942, 2020

      • iliekcomputers
        i think the proposal makes sense
      • 2020-05-08 12955, 2020

      • ishaanshah[m]
        Although, I am not sure we can do that because we calculate stats in batch
      • 2020-05-08 12907, 2020

      • iliekcomputers
        eventually; however, right now it isn't really feasible
      • 2020-05-08 12941, 2020

      • ishaanshah[m]
        Yes, maybe later we can make spark work for on demand queries
      • 2020-05-08 12955, 2020

      • ishaanshah[m]
        I will open a ticket for that then
      • 2020-05-08 12958, 2020

      • iliekcomputers
        i'm happy to open a ticket and think more about it
      • 2020-05-08 12905, 2020

      • iliekcomputers
        sounds good
      • 2020-05-08 12909, 2020

      • iliekcomputers
        one small thing
      • 2020-05-08 12920, 2020

      • iliekcomputers
        the endpoint is `artist` rn
      • 2020-05-08 12928, 2020

      • iliekcomputers
        `artists`
      • 2020-05-08 12939, 2020

      • iliekcomputers
        would be better
      • 2020-05-08 12903, 2020

      • ishaanshah[m]
        Ya sure I will change that
      • 2020-05-08 12940, 2020

      • ishaanshah[m]
        Other than that I am done with the rendering and processing part for the artist graph
      • 2020-05-08 12905, 2020

      • ishaanshah[m]
        I have fixed LB-570 too
      • 2020-05-08 12905, 2020

      • BrainzBot
        LB-570: Artist graph: long artist names should wrap https://tickets.metabrainz.org/browse/LB-570
      • 2020-05-08 12914, 2020

      • iliekcomputers
        oh awesome
      • 2020-05-08 12940, 2020

      • Mr_Monkey
        👍
      • 2020-05-08 12944, 2020

      • iliekcomputers
        i think getting a review of the design from Mr_Monkey would be helpful, if he gets the time
      • 2020-05-08 12947, 2020

      • ishaanshah[m]
        I figured I will open another PR for LB-547
      • 2020-05-08 12948, 2020

      • BrainzBot
      • 2020-05-08 12919, 2020

      • ishaanshah[m]
        Because the current PR has already become large
      • 2020-05-08 12927, 2020

      • Mr_Monkey
        I'm going to be jumping back into LB more in the coming months, so I'll definitely be able to find some time, looking at it a lot :)
      • 2020-05-08 12901, 2020

      • ishaanshah[m]
        Mr_Monkey thanks I will post a screenshot
      • 2020-05-08 12958, 2020

      • ishaanshah[m]
        iliekcomputers The only part remaining in the graph PR is fetching the stats from the backend
      • 2020-05-08 12915, 2020

      • ishaanshah[m]
        I will do that when the endpoint PR is merged
      • 2020-05-08 12934, 2020

      • iliekcomputers
        ok. i'll try to merge over the weekend
      • 2020-05-08 12903, 2020

      • ishaanshah[m]
        Cool, thanks a lot :)
      • 2020-05-08 12908, 2020

      • sumedh has quit
      • 2020-05-08 12958, 2020

      • Zastai has quit
      • 2020-05-08 12903, 2020

      • sumedh joined the channel
      • 2020-05-08 12904, 2020

      • ishaanshah[m]
        Also a small bug
      • 2020-05-08 12949, 2020

      • ishaanshah[m]
      • 2020-05-08 12955, 2020

      • ishaanshah[m]
        This page shows 404
      • 2020-05-08 12911, 2020

      • shivam-kapila
        Yeah that /
      • 2020-05-08 12917, 2020

      • ishaanshah[m]
        Because of the extra slash
      • 2020-05-08 12951, 2020

      • iliekcomputers
        open a ticket
      • 2020-05-08 12923, 2020

      • ishaanshah[m]
        Ya, sure
      • 2020-05-08 12931, 2020

      • CatQuest joined the channel
      • 2020-05-08 12931, 2020

      • CatQuest has quit
      • 2020-05-08 12931, 2020

      • CatQuest joined the channel
      • 2020-05-08 12916, 2020

      • reosarevok
        Cyna[m]: there's some issues with those historical edits, but I can work on that, since I've been dealing with historical edits anyway
      • 2020-05-08 12945, 2020

      • Cyna[m]
        Once the two open PRs are merged... I'll continue with the next entity
      • 2020-05-08 12919, 2020

      • Freso
        ruaok: Looking at https://test.listenbrainz.org/user/Freso vs. https://listenbrainz.org/user/Freso they both report the same Listen count, but the most recent listens are not the same. Is the listen count read from the same source for both sites?
      • 2020-05-08 12900, 2020

      • shivam-kapila
        Freso: IIRC test.lb.org uses same redis as prod.
      • 2020-05-08 12916, 2020

      • Freso
        shivam-kapila: Right.
      • 2020-05-08 12957, 2020

      • sumedh has quit
      • 2020-05-08 12918, 2020

      • BrainzGit
        [bookbrainz-site] prabalsingh24 opened pull request #423 (master…add-user-in-search): search: add 'editor' type in the search result https://github.com/bookbrainz/bookbrainz-site/pul…
      • 2020-05-08 12951, 2020

      • supersandro20005 has quit
      • 2020-05-08 12904, 2020

      • supersandro2000 joined the channel
      • 2020-05-08 12918, 2020

      • lazka joined the channel
      • 2020-05-08 12925, 2020

      • lazka
        Does anyone know who manages the MB Ubuntu PPA?
      • 2020-05-08 12958, 2020

      • supersandro2000
      • 2020-05-08 12915, 2020

      • reosarevok
        zas: that you? ^
      • 2020-05-08 12930, 2020

      • reosarevok
        outsidecontext: ? ^ :)
      • 2020-05-08 12955, 2020

      • reosarevok
        (assuming you mean the Picard one)
      • 2020-05-08 12911, 2020

      • lazka
      • 2020-05-08 12908, 2020

      • reosarevok
        Yeah, I suspect zas and outsidecontext are the most likely to be involved
      • 2020-05-08 12935, 2020

      • lazka
        ah, there is a history with user names for builds, so phillipp wolfer I guess
      • 2020-05-08 12916, 2020

      • reosarevok
        That'd be outsidecontext then
      • 2020-05-08 12920, 2020

      • reosarevok
        What did you need? :)
      • 2020-05-08 12900, 2020

      • lazka
        outsidecontext, you copied some of my mutagen packages into the PPA but because I moved the mutagen tools from the python2 to python3 package you also need to copy the python2-mutagen variants
      • 2020-05-08 12931, 2020

      • lazka
        (some user emailed me about it)
      • 2020-05-08 12959, 2020

      • lazka
        I should have added a version conflict I guess, but didn't think of the copying to other PPAs case
      • 2020-05-08 12932, 2020

      • adhawkins has quit
      • 2020-05-08 12946, 2020

      • adhawkins joined the channel
      • 2020-05-08 12946, 2020

      • reosarevok
        I'll send him an email about it too, in case he misses this :)
      • 2020-05-08 12910, 2020

      • BrainzGit
        [musicbrainz-docker] yvanzo merged pull request #145 (mbvm-38-dev…recv-keys): Try reaching different PGP servers/pools if needed https://github.com/metabrainz/musicbrainz-docker/…
      • 2020-05-08 12921, 2020

      • outsidecontext
        lazka: thanks for the info, I'll look into it.
      • 2020-05-08 12949, 2020

      • outsidecontext
        lazka: do I get this right: the issue is a file conflict, if one has the python2 package installed?
      • 2020-05-08 12931, 2020

      • lazka
        outsidecontext, if you install the py3 one it tries to install tools owned by the py2 package. To work around this I added a py2 variant which doesn't include the tools
      • 2020-05-08 12916, 2020

      • lazka
        so, yes
      • 2020-05-08 12920, 2020

      • shivam-kapila
        ruaok: Even though the query is heavy, we also tend to do too much in the /user/<user_name> route.
      • 2020-05-08 12942, 2020

      • shivam-kapila
        We even call this heavy query twice. And the second time it's totally unbounded
      • 2020-05-08 12956, 2020

      • shivam-kapila
      • 2020-05-08 12940, 2020

      • ruaok
        yeah, we need to rethink that. :)
      • 2020-05-08 12900, 2020

      • shivam-kapila
        We also fetch min/max timestamps for the user
      • 2020-05-08 12921, 2020

      • shivam-kapila
        Won't the latest_listen_ts be mostly equal to max_ts?
      • 2020-05-08 12937, 2020

      • shivam-kapila
        If so, then we can totally nuke this query