#metabrainz

      • mayhem
        but, given the concept of matrix factorization and how it is applied to collaborative filtering, we could apply those concepts here.
      • 2022-11-01 30529, 2022

      • mayhem
        I would start with a subset of the MB data -- how to define that is a challenge in itself, but something other than the full set will make life easier.
      • 2022-11-01 30527, 2022

      • mayhem
        then massage the genre/tag data that musicbrainz has in order to express it as a matrix factorization problem.
      • 2022-11-01 30517, 2022

      • mayhem
        these tracks by these artists have these genres. these tracks by these (same) artists have no genre data. can we matrix factor those and guess at what the missing genres are?
      • 2022-11-01 30558, 2022
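
A toy sketch of the problem statement above, in plain Python rather than Spark. It is not the ListenBrainz code. One possible encoding (an assumption, not something stated in the channel) is to put tracks in rows and both genres and artists in columns, so that an untagged track still has observed artist cells linking it to tagged tracks by the same artist. All names and data here are made up for illustration.

```python
import random

GENRES = ["rock", "jazz"]
ARTISTS = ["artist_a", "artist_b"]
COLUMNS = GENRES + ARTISTS  # genre columns first, then artist columns

# (artist, genres) per track; None means the track has no genre data at all.
TRACKS = [
    ("artist_a", {"rock"}),
    ("artist_a", {"rock"}),
    ("artist_b", {"jazz"}),
    ("artist_b", {"jazz"}),
    ("artist_a", None),
    ("artist_b", None),
]


def observed_cells(tracks):
    """Build the sparse matrix: artist cells always, genre cells only if tagged."""
    cells = {}
    for row, (artist, genres) in enumerate(tracks):
        for col, name in enumerate(COLUMNS):
            if name in ARTISTS:
                cells[(row, col)] = 1.0 if name == artist else 0.0
            elif genres is not None:
                # Assumption: a tagged track lacking a genre counts as a 0.
                cells[(row, col)] = 1.0 if name in genres else 0.0
    return cells


def factorize(cells, n_rows, n_cols, rank=2, epochs=1000, lr=0.05, reg=0.01, seed=0):
    """Fit row and column factors to the observed cells with plain SGD."""
    rng = random.Random(seed)
    row_f = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    col_f = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    items = list(cells.items())
    for _ in range(epochs):
        rng.shuffle(items)
        for (r, c), value in items:
            err = value - sum(a * b for a, b in zip(row_f[r], col_f[c]))
            for k in range(rank):
                a, b = row_f[r][k], col_f[c][k]
                row_f[r][k] += lr * (err * b - reg * a)
                col_f[c][k] += lr * (err * a - reg * b)
    return row_f, col_f


def guess_genres(tracks):
    """Return predicted genre scores for every track that has no genre data."""
    row_f, col_f = factorize(observed_cells(tracks), len(tracks), len(COLUMNS))
    guesses = {}
    for row, (_, genres) in enumerate(tracks):
        if genres is None:
            guesses[row] = {
                genre: sum(a * b for a, b in zip(row_f[row], col_f[col]))
                for col, genre in enumerate(GENRES)
            }
    return guesses
```

Calling `guess_genres(TRACKS)` gives a score per genre for the two untagged tracks; the higher score is the guess. On real MusicBrainz data the matrix would be far larger and sparser, which is where a distributed implementation such as the one in Spark would come in.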

      • mayhem
        does that problem statement make sense to you, galactic0205?
      • 2022-11-01 30511, 2022

      • galactic0205
        yes it does!
      • 2022-11-01 30542, 2022

      • mayhem
        is that of interest to you?
      • 2022-11-01 30512, 2022

      • mayhem
        do you know apache spark?
      • 2022-11-01 30536, 2022

      • galactic0205
        yes, I would definitely give it a try.
      • 2022-11-01 30537, 2022

      • galactic0205
        I have a brief knowledge of how matrix factorization techniques work.
      • 2022-11-01 30537, 2022

      • galactic0205
        I can look more into the same with the specific problem statement.
      • 2022-11-01 30517, 2022

      • mayhem
        well, the good news is that Apache Spark implements it and that we use it for collaborative filtering for recommendations. all that stuff is set up and working in production.
      • 2022-11-01 30553, 2022

      • mayhem
        the top level source tree for all the machine learning stuff is pretty much here: https://github.com/metabrainz/listenbrainz-server…
      • 2022-11-01 30532, 2022

      • mayhem
        in particular the CF for recordings code is here: https://github.com/metabrainz/listenbrainz-server…
      • 2022-11-01 30553, 2022

      • galactic0205
        I am not that fluent with apache spark but I can spend a few days (2-3 days) learning about it
      • 2022-11-01 30522, 2022

      • mayhem
        yes, start with reading the code for our CF recordings -- the last link above.
      • 2022-11-01 30548, 2022

      • galactic0205
        alright, will do so !
      • 2022-11-01 30509, 2022

      • mayhem
        our main machine learning people are lucifer and myself, though we are not "machine learning" experts. We prefer to kick ass with SQL. :)
      • 2022-11-01 30525, 2022

      • mayhem
        good luck and ping us if you have questions.
      • 2022-11-01 30505, 2022

      • galactic0205
        alright
      • 2022-11-01 30505, 2022

      • galactic0205
        thank you so much
      • 2022-11-01 30506, 2022

      • galactic0205
        will get back to you guys.
      • 2022-11-01 30513, 2022

      • mayhem
        np. we'll be here.
      • 2022-11-01 30521, 2022

      • Pratha-Fish
        alastairp: Just stepped into college and they just announced that the exams that were to be held from today are postponed :moyai:
      • 2022-11-01 30521, 2022

      • Pratha-Fish
        Soo let's get the dataset published ig
      • 2022-11-01 30532, 2022

      • Pratha-Fish
        alastairp: I've also received my GSoC final evaluation! Thanks for the in-depth review.
      • 2022-11-01 30532, 2022

      • Pratha-Fish
        I'll also make sure to not make the same mistakes again :)
      • 2022-11-01 30553, 2022

      • mayhem
        Pratha-Fish: not sure if alastairp will be around today. it's a holiday here in catalonia.
      • 2022-11-01 30532, 2022

      • Pratha-Fish
        mayhem: Ooh thanks for informing. I'll catch up with him tomorrow 👍
      • 2022-11-01 30536, 2022

      • saturday79 joined the channel
      • 2022-11-01 30538, 2022

      • saturday7 has quit
      • 2022-11-01 30538, 2022

      • saturday79 is now known as saturday7
      • 2022-11-01 30513, 2022

      • mayhem
      • 2022-11-01 30520, 2022

      • mayhem
        see track 3. lol.
      • 2022-11-01 30526, 2022

      • lucifer
        hehe lol
      • 2022-11-01 30520, 2022

      • galactic0205 has quit
      • 2022-11-01 30533, 2022

      • yvanzo
        lucifer: ~1h each to reindex URLs and RGs
      • 2022-11-01 30551, 2022

      • yvanzo
        lucifer: the same regression happened with 'release' core too.
      • 2022-11-01 30524, 2022

      • BrainzGit
        [sir] amCap1712 opened pull request #146 (master…url-perf): Fix performance regression in indexing url https://github.com/metabrainz/sir/pull/146
      • 2022-11-01 30549, 2022

      • BrainzGit
        [sir] amCap1712 opened pull request #147 (url-perf…rg-perg): Eagerly load artist_alias.gid in release group indexing https://github.com/metabrainz/sir/pull/147
      • 2022-11-01 30502, 2022

      • lucifer
        yvanzo: had tested this earlier today on wolf and both completed in 25-35 mins iirc.
      • 2022-11-01 30515, 2022

      • lucifer
        *25-35 mins each
      • 2022-11-01 30548, 2022

      • yvanzo
        👍 it depends on the setup too
      • 2022-11-01 30535, 2022

      • lucifer
        mayhem: should user max contribution be configurable?
      • 2022-11-01 30509, 2022

      • mayhem
        yes. plz.
      • 2022-11-01 30514, 2022

      • lucifer
        👍
      • 2022-11-01 30523, 2022

      • mayhem
        lucifer: do you know of a place in our docs where we refer to a JSON file as an example of what data should be POSTed to an endpoint?
      • 2022-11-01 30552, 2022

      • mayhem
        e.g. I have listenbrainz/art/misc/sample_cover_art_grid_post_request.json that I would like to reference in the docstring for an endpoint. if you have an example I could copy from, that would be great.
      • 2022-11-01 30521, 2022

      • lucifer
      • 2022-11-01 30535, 2022

      • mayhem
        thanks!!
      • 2022-11-01 30528, 2022

      • galactic0205 joined the channel
      • 2022-11-01 30502, 2022

      • galactic0205 has quit
      • 2022-11-01 30540, 2022

      • lucifer
        another music service claiming 100 M songs in catalog, https://twitter.com/ajassy/status/158744967427719…
      • 2022-11-01 30557, 2022

      • mayhem
        wow, they have music now!
      • 2022-11-01 30509, 2022

      • mayhem
        I guess we have even more work to do now. :)
      • 2022-11-01 30535, 2022

      • mayhem
        lucifer: are there known problems installing the documentation libs for lb-server?
      • 2022-11-01 30513, 2022

      • lucifer
        amazon music doesn't have an api afaik though.
      • 2022-11-01 30520, 2022

      • lucifer
        nope, what error do you get?
      • 2022-11-01 30551, 2022

      • mayhem
      • 2022-11-01 30546, 2022

      • lucifer
        mayhem: uwsgi is not needed for docs so not sure what's going on.
      • 2022-11-01 30511, 2022

      • lucifer
        also, added user contribution stuff? what dataset params do you want?
      • 2022-11-01 30538, 2022

      • mayhem
        you mean new dataset parameters?
      • 2022-11-01 30528, 2022

      • lucifer
        yes
      • 2022-11-01 30515, 2022

      • mayhem
        days: 1095, 730, 365. all session 300, threshold 5, count 200, user_contrib 3
      • 2022-11-01 30537, 2022

      • mayhem
        lucifer: for creating tests for the art PR, I would like to mock this function: https://github.com/metabrainz/listenbrainz-server…
      • 2022-11-01 30558, 2022

      • mayhem
        but I don't know how to do that. do you know of an example in the code?
      • 2022-11-01 30505, 2022

      • mayhem tries what should be obvious from the docs, but won't hold his breath
      • 2022-11-01 30557, 2022

      • lucifer
        mayhem: don't think we have it in LB but try https://stackoverflow.com/a/34534635
      • 2022-11-01 30513, 2022

      • mayhem
        thx
      • 2022-11-01 30510, 2022
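
A minimal sketch of mocking a function in a test with the standard library's `unittest.mock`. The class and method names are hypothetical stand-ins, not the actual ListenBrainz art code, and this is not necessarily the approach from the linked answer. It assumes the function to be mocked is a method that would normally hit a database or web service.

```python
from unittest.mock import patch


class CoverArtGenerator:
    """Hypothetical stand-in for the code under test."""

    def load_release_mbids(self, user_name):
        # Pretend this talks to a database, which a unit test should avoid.
        raise RuntimeError("no database available in unit tests")

    def build_grid(self, user_name, dimension):
        # Take just enough releases to fill a dimension x dimension grid.
        mbids = self.load_release_mbids(user_name)
        return mbids[: dimension * dimension]


def build_grid_with_mocked_loader():
    fake_mbids = ["mbid-%d" % i for i in range(10)]
    # Replace the method on the class only for the duration of the block.
    with patch.object(
        CoverArtGenerator, "load_release_mbids", return_value=fake_mbids
    ) as mocked:
        grid = CoverArtGenerator().build_grid("example_user", 3)
    # The mock records how it was called, so the test can check that too.
    mocked.assert_called_once_with("example_user")
    return grid
```

For a module-level function, `patch("package.module.function_name")` works the same way, with the caveat that the target string has to name the module where the function is looked up, not where it is defined.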

      • lucifer
      • 2022-11-01 30530, 2022

      • lucifer
        cs.github.com is nice to use if you have access.
      • 2022-11-01 30552, 2022

      • lucifer
        if it stays free once it is out of preview, we could retire our own livegrep instance
      • 2022-11-01 30529, 2022

      • mayhem
        ah, that looks perfect!
      • 2022-11-01 30525, 2022

      • mayhem
        lucifer: getting back to the docs building, the docs requirements.txt contains `-r ../requirements.txt` which is why it picks up uwsgi
      • 2022-11-01 30545, 2022

      • CatQuest
        [13:18] <mayhem> MusicBrainz has a few million genres.
      • 2022-11-01 30545, 2022

      • CatQuest
        wow I knew reo was doing a lot of work adding genres, but several million? sounds crazy!
      • 2022-11-01 30511, 2022

      • lucifer
        mayhem: i see. uwsgi shouldn't be needed so probably remove it from requirements temporarily and try again.
      • 2022-11-01 30530, 2022

      • mayhem
        yeah, that works.
      • 2022-11-01 30541, 2022

      • mayhem
        now sorting out some other docs weirdnesses
      • 2022-11-01 30517, 2022

      • mayhem
      • 2022-11-01 30529, 2022

      • mayhem
        for instance, the above gets rendered for:
      • 2022-11-01 30542, 2022

      • mayhem
      • 2022-11-01 30503, 2022

      • mayhem
        wtf does the text below the constant appear from??
      • 2022-11-01 30535, 2022

      • CatQuest
        huh so if you guys are interested. i was logged out and got a quick error "playback issue: username unknown"
      • 2022-11-01 30505, 2022

      • CatQuest
        probably better to just not except playback if not logged in?
      • 2022-11-01 30520, 2022

      • CatQuest
        expect*
      • 2022-11-01 30552, 2022

      • mayhem
        oh, I think I get it.
      • 2022-11-01 30544, 2022

      • atj
      • 2022-11-01 30527, 2022

      • atj
        some pretty depressing comments in that
      • 2022-11-01 30535, 2022

      • the4oo4 joined the channel
      • 2022-11-01 30547, 2022

      • mayhem
        "I want to get into that, because it seems like the music industry, the artists and labels, always want more of what is effectively a fixed amount of money. Everyone is only paying so much, and the only way you can make more money for the actual musicians is by raising prices. It seems very difficult to make more money in any other way. You can make more money for the companies — for Amazons and Spotifys — by layering other
      • 2022-11-01 30547, 2022

      • mayhem
        kinds of content like podcasts and audiobooks, but it seems like there is only one lever to actually generate more money for the music industry."
      • 2022-11-01 30526, 2022

      • mayhem
        that about sums up the current state of the industry. it's a feeding frenzy for the cents sloshing around and the artists don't have enough cents to put in to collect the cents they are due.
      • 2022-11-01 30553, 2022

      • atj
        "So yes, there is a fixed pool of money in recorded music, but the pool keeps growing, right? It’s fixed per customer, but when you get into areas like merch, there are unlimited amounts that people are willing to spend to connect with their favorite artist and to represent their fandom."
      • 2022-11-01 30527, 2022

      • atj
        translated: you're never going to make enough money from streaming so you need to sell merchandise (preferably via Amazon)
      • 2022-11-01 30553, 2022

      • mayhem
        that is an old trope.
      • 2022-11-01 30506, 2022

      • mayhem
        the other one is "touring, go touring to make money."
      • 2022-11-01 30524, 2022

      • atj
        "At the end of the day, you want to use a music service because it feels like it’s part of music culture." - lol
      • 2022-11-01 30525, 2022

      • mayhem
        except that most artists wear themselves out and don't actually earn all that much.
      • 2022-11-01 30535, 2022

      • mayhem
        pathetic, no?
      • 2022-11-01 30552, 2022

      • atj
        I find it deeply cynical to be honest
      • 2022-11-01 30510, 2022

      • mayhem
        welcome to the music industry.
      • 2022-11-01 30529, 2022

      • elomatreb[m]
        The middle reads like manufacturing consent for locking artists into exclusivity deals with streaming platforms
      • 2022-11-01 30556, 2022

      • atj
        tech industry + music industry = <endless screaming>
      • 2022-11-01 30554, 2022

      • mayhem
        elomatreb[m]: yerp, that is pretty much an industry wet dream.
      • 2022-11-01 30504, 2022

      • mayhem
        everyone must have 100% ownership of everything!
      • 2022-11-01 30511, 2022

      • mayhem
        🙄
      • 2022-11-01 30525, 2022

      • elomatreb[m]
        There were some well-known German artists (e.g. Die Ärzte) that refused to sign up for Spotify and other streaming platforms for the longest time, but even they gave in now
      • 2022-11-01 30532, 2022

      • mayhem
        yeah, see pink floyd and the beatles too.
      • 2022-11-01 30521, 2022

      • lucifer
        mayhem: did your issue get resolved?
      • 2022-11-01 30529, 2022

      • mayhem
        yes, thanks.
      • 2022-11-01 30535, 2022

      • lucifer
        👍
      • 2022-11-01 30547, 2022

      • mayhem
        we'll need to find a better way to deal with the doc requirements.txt, but the art PR is now ready.
      • 2022-11-01 30517, 2022

      • lucifer
      • 2022-11-01 30531, 2022

      • mayhem
        that looks promising!
      • 2022-11-01 30540, 2022

      • BrainzGit
        [sir] amCap1712 merged pull request #146 (master…url-perf): Fix performance regression in indexing url https://github.com/metabrainz/sir/pull/146
      • 2022-11-01 30503, 2022

      • BrainzGit
        [sir] amCap1712 merged pull request #147 (url-perf…rg-perg): Fix performance regression in indexing release group https://github.com/metabrainz/sir/pull/147
      • 2022-11-01 30532, 2022

      • mayhem
        yes, I think the contribution thing helped quite a lot.
      • 2022-11-01 30542, 2022

      • mayhem
        are all the datasets done?
      • 2022-11-01 30554, 2022

      • lucifer
        yes
      • 2022-11-01 30540, 2022

      • mayhem
        this is a very promising development.
      • 2022-11-01 30551, 2022

      • mayhem
        I'll dig into automating the testing of this tomorrow.
      • 2022-11-01 30553, 2022

      • mayhem
        thanks!
      • 2022-11-01 30546, 2022

      • mayhem
        thanks past mayhem
      • 2022-11-01 30549, 2022

      • mayhem
        "IF YOU ARE WONDERING WHY THIS TAKES SO MUCH DISKSPACE, BUT HAVE NO IDEA WHAT IT IS, NOW IS THE TIME TO GET RID OF IT."
      • 2022-11-01 30551, 2022

      • mayhem
        lucifer: alastairp you have large files in your home dir on kiss and kiss was nearly out of diskspace.
      • 2022-11-01 30555, 2022

      • mayhem
        can you please have a look?
      • 2022-11-01 30541, 2022

      • alastairp
        mayhem: cleaned
      • 2022-11-01 30518, 2022

      • lucifer
        cleaned
      • 2022-11-01 30535, 2022

      • lucifer
        can prune old docker images to clean more disk space methinks
      • 2022-11-01 30533, 2022

      • lucifer
        mayhem: should the artist credit filtering apply if any artist mbid matches, or only if the entire artist credit matches?
      • 2022-11-01 30500, 2022

      • RetroPunk has quit
      • 2022-11-01 30541, 2022

      • RetroPunk joined the channel
      • 2022-11-01 30539, 2022

      • mayhem
        lucifer: I already pruned.
      • 2022-11-01 30548, 2022

      • mayhem
        lucifer: let's start with any.
      • 2022-11-01 30559, 2022

      • lucifer
        ah cool but i was adding it at spark level.
      • 2022-11-01 30507, 2022

      • lucifer
        👍
      • 2022-11-01 30515, 2022

      • mayhem
        yeah, spark is perfect.
      • 2022-11-01 30519, 2022

      • mayhem
        less data to drag along.
      • 2022-11-01 30544, 2022
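
The "any" versus "entire" choice discussed above could be sketched like this in plain Python. This is only an illustration of the two matching rules, not the Spark-level filter lucifer mentions. It assumes the filter compares the artist MBIDs of two artist credits, and that "entire" means both credits contain exactly the same artists. The function name and MBID placeholders are made up.

```python
def credit_matches(credit_mbids, other_mbids, mode="any"):
    """Compare two artist credits, each given as a list of artist MBIDs."""
    credit = set(credit_mbids)
    other = set(other_mbids)
    if mode == "any":
        # Match as soon as the two credits share at least one artist.
        return not credit.isdisjoint(other)
    if mode == "entire":
        # Match only when both credits list the same set of artists.
        return credit == other
    raise ValueError("mode must be 'any' or 'entire'")


def drop_same_artist(seed_credit, candidates, mode="any"):
    """Keep only candidates whose artist credit does not match the seed's."""
    return [
        (recording, credit)
        for recording, credit in candidates
        if not credit_matches(seed_credit, credit, mode)
    ]
```

With `mode="any"`, a collaboration that shares a single artist with the seed is filtered out; with `mode="entire"`, it is kept unless the whole credit is identical.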

      • darkstardevx joined the channel
      • 2022-11-01 30556, 2022

      • darkstardevx has quit
      • 2022-11-01 30520, 2022

      • darkstardevx joined the channel
      • 2022-11-01 30542, 2022

      • chinmay
        lucifer: monkey: I have put LB#2181 for review
      • 2022-11-01 30543, 2022

      • BrainzBot
      • 2022-11-01 30551, 2022

      • lucifer
        chinmay: great thanks. the PR description probably needs to be updated, as the timeline has been added now
      • 2022-11-01 30511, 2022

      • chinmay
        Right.. will do ti
      • 2022-11-01 30513, 2022

      • chinmay
        it
      • 2022-11-01 30513, 2022

      • lucifer
        filters and responsiveness also done
      • 2022-11-01 30518, 2022

      • chinmay
        yeah
      • 2022-11-01 30548, 2022

      • lucifer