#metabrainz

      • tn5421 joined the channel
      • supersandro2000 has quit
      • supersandro2000 joined the channel
      • ishaanshah
        Morning!
      • rdswift_ joined the channel
      • rdswift has quit
      • rdswift_ is now known as rdswift
      • thomasross has quit
      • BrainzGit
        [musicbrainz-server] reosarevok opened pull request #1706 (master…eslint-max-len): Eslint fixes: max-len https://github.com/metabrainz/musicbrainz-serve...
      • leonardo has quit
      • leonardo joined the channel
      • diru1100
        Morning 🌄🌞
      • shivam-kapila
        morning
      • d4rkie joined the channel
      • Nyanko-sensei has quit
      • supersandro2000 has quit
      • supersandro2000 joined the channel
      • BrainzGit
        [listenbrainz-server] shivam-kapila opened pull request #1105 (master…patch-1): Fix a typo in recommendation docs https://github.com/metabrainz/listenbrainz-serv...
      • [critiquebrainz] amCap1712 opened pull request #303 (master…recording-entity-support): CB-270: Implemented the reviewal of recordings https://github.com/metabrainz/critiquebrainz/pu...
      • BrainzBot
        CB-270: Add support for reviewing (or atleast rating) more entities https://tickets.metabrainz.org/browse/CB-270
      • BrainzGit
        [listenbrainz-server] vansika merged pull request #1105 (master…patch-1): Fix a typo in recommendation docs https://github.com/metabrainz/listenbrainz-serv...
      • d4rkie has quit
      • Nyanko-sensei joined the channel
      • _lucifer
        yvanzo: do you know who wrote the MOXy adapters for the search server? I had some questions about them.
      • I was thinking of changing the current adapters for status and primary type to the way I implemented it for packaging. That would fix SEARCH-608, SEARCH-611 and some other inconsistent JSON issues. But I wanted to know if this could have side effects I am unaware of.
      • BrainzBot
        SEARCH-608: Inconsistent JSON serialization (lookup vs search): track count on CD stubs https://tickets.metabrainz.org/browse/SEARCH-608
      • SEARCH-611: Incorrect content in JSON version of release group search result https://tickets.metabrainz.org/browse/SEARCH-611
      • yvanzo
        _lucifer: no, git blame?
      • _lucifer: about side effects: yes, it can possibly break MusicBrainz Server and API clients.
      • BrainzGit
        [listenbrainz-server] vansika opened pull request #1106 (master…community=post-link): add community post link to recommendation info page https://github.com/metabrainz/listenbrainz-serv...
      • yvanzo
        _lucifer: for example of compatibility issue with MBS, see gh:MBS#1231
      • BrainzBot
        MBS-10421: Update schema_fixup of ReleasePackaging: https://github.com/metabrainz/musicbrainz-serve...
      • yvanzo
        _lucifer: By the way, your packaging patch is scheduled to be deployed on October 19th, see https://blog.metabrainz.org/2020/09/21/musicbra...
      • _lucifer
        yvanzo: yeah, forgot about git blame. It's Mineo.
      • Mineo: ping
      • yvanzo: regarding the update, nice :)
      • are similar changes in place for the tickets I referred to above?
      • because I can probably fix the search server side issues, but I am not familiar with the MusicBrainz server
      • d4rkie joined the channel
      • Nyanko-sensei has quit
      • ruaok
        pristine___: I spent quite a lot of time thinking about our next steps. do you have a minute?
      • pristine___
        ruaok: hey
      • ruaok
        first point is kinda easy: candidate sets are nothing more than lists of recordings right? do they need anything else before we run them through the model?
      • pristine___
        Not really.
      • > first point is kinda easy: candidate sets are nothing more than lists of recordings right?
      • Yes
      • >do they need anything else before we run them through the model?
      • No, not really :)
      • ruaok
        ok, because in the community post we're a bit too focused on the two candidate sets (top, similar). we should open it up and let others make candidate sets.
      • pristine___
        Right.
      • ruaok
        how much work would it be for you to read a candidate set from an API call?
      • because we could... take troi and slap an API on it.
      • pristine___
        (But the recordings in the candidate set should be in training data, otherwise model will discard them)
      • ruaok
        and then a troi patch becomes a candidate set.
      • > (But the recordings in the candidate set should be in training data, otherwise model will discard them)
      • pristine___
        > how much work would it be for you to read a candidate set from an API call?
      • ruaok
        ah! that is the critical step I needed to know.
      • pristine___
        Like a list of [user id/name, recording_id]
      • Is that what you mean?
      • ruaok
        no, simpler.
      • Zastai joined the channel
      • v6lur_ joined the channel
      • I want to make "all recordings in MB tagged with 'punk' " into a candidate set.
      • and then feed it into CF, so we can make a "recommend me the tracks I would like out of all tracks tagged with 'punk'" alg
      • Zastai
        heads up: someone has attached a .mp3 file to https://tickets.metabrainz.org/browse/LB-383 for no discernable reason, and I don't have the privileges required to remove it
      • BrainzBot
        LB-383: Allow updating usernames when they're changed in MusicBrainz
      • pristine___
        Oh shit. I read something else. I have to read the Candidate sets, cool. I will need the recordings and the user they are associated with.
      • > and then feed it into CF, so we can make a "recommend me the tracks I would like out of all tracks tagged with 'punk'" alg
      • Sounds wow!
      • ruaok
        exactly. how can we make that work?
      • Zastai
        the user that did it seems to have done other bogus stuff too (like creating a "Boogie" instrument request and then adding it to existing Epics)
      • _lucifer
        that sounds more like content based filtering instead of collaborative filtering
      • pristine___
        Umm... But there should be some criteria for these Candidate sets, then we can write an api using troi. For example, user x listened to lot of sufi genre, so make a candidate set of all sufi tracks in MB
      • Something like this?
      • ruaok
        Zastai: deleted.
      • pristine___
        Maybe
      • ruaok
        pristine___: let me read the docs you wrote on the CF spark stuff and refresh my mind
      • pristine___
        Okay. Ping me.
      • yvanzo
        _lucifer: I don’t think similar changes to MBS have been made for other tickets, but I can handle them.
      • _lucifer
        yvanzo: great! I'll make the changes, submit a PR soon, and try to discuss it with Mineo if possible
      • yvanzo: on a side note, are there any long term plans to move the server from perl to some other language?
      • Zastai
        ruaok: there seems to be no option in Jira to report a user for bad behaviour; might be useful for cases like this
      • Gore joined the channel
      • yvanzo
        _lucifer: long long term, too long term to make it a solid plan.
      • ruaok
        pristine___: I think I will try and make a data flow graph of our system soon. a high level view is really needed for someone to come up to speed.
      • _lucifer
        oh ok, 👍
      • ruaok
        pristine___: at the most fundamental level a candidate set is: (recording, listen_count, user) yes?
      • Gazooo794 has quit
      • Gazooo794 joined the channel
      • pristine___
        ruaok: we need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user)
      • Mineo
        _lucifer: it's easier to answer your questions if you not only ping me, but also ask the questions :-) I might not be able to answer them immediately, but I'll try to answer them when I find the time
      • ruaok
        > we need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user)
      • _lucifer
        Mineo: sure, will keep it in mind :). I wanted to know if we could do something like https://github.com/metabrainz/mb-solr/pull/37/f... instead of the Status adapter
      • ruaok
        this comment right here needs to be at the very top level of the docs someplace. this one encapsulates everything we need.
      • but there is something missing, no?
      • pristine___
        ?
      • ruaok
        all tracks in the candidate set must be in the data set that trains the model, no?
      • pristine___
        Yeah
      • ruaok
        ok, can you re-write the comment to include this restriction?
      • pristine___
        Which comment?
      • ruaok
        > we need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user)
      • Mineo
        _lucifer: by status adapter, do you mean https://github.com/metabrainz/mb-solr/blob/mast... ? if so, the answer's a solid maybe! I don't know anything about that class
      • pristine___
        We need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user). Note that all tracks and users in the candidate set must be in the training set otherwise they will be discarded by the model.
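A minimal Python sketch of the candidate-set rule restated above, assuming string IDs for recordings and users: training rows are (recording, listen_count, user), candidate rows are (recording, user), and any pair whose recording or user never appears in the training data is dropped, just as the model would discard it. The function name `usable_candidates` is hypothetical; this illustrates the stated restriction, not the actual ListenBrainz Spark code.

```python
from typing import List, Tuple

# Training row: (recording, listen_count, user), used to train the model.
TrainingRow = Tuple[str, int, str]
# Candidate row: (recording, user), the pair the model assigns a score to.
CandidateRow = Tuple[str, str]


def usable_candidates(
    training: List[TrainingRow], candidates: List[CandidateRow]
) -> List[CandidateRow]:
    """Keep only (recording, user) pairs the trained model can score.

    Assumption (from the chat): the model only knows recordings and users
    seen during training, so pairs with an unseen recording or user are
    discarded.
    """
    known_recordings = {rec for rec, _count, _user in training}
    known_users = {user for _rec, _count, user in training}
    return [
        (rec, user)
        for rec, user in candidates
        if rec in known_recordings and user in known_users
    ]


if __name__ == "__main__":
    training = [("rec-a", 12, "alice"), ("rec-b", 3, "bob")]
    candidates = [("rec-a", "bob"), ("rec-c", "alice"), ("rec-b", "carol")]
    # "rec-c" was never listened to and "carol" never listened at all,
    # so only ("rec-a", "bob") survives.
    print(usable_candidates(training, candidates))
```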
      • _lucifer
        yeah Mineo i mean that
      • pristine___
        ruaok: ^
      • _lucifer
        Mineo: sorry, my bad, I did git blame on the wrong file :( I ran it on https://github.com/metabrainz/mb-solr/blame/mas... .
      • ruaok
        pristine___: ok, thanks. that needs to go into the docs someplace.
      • Mineo
        _lucifer: no problem at all :)
      • _lucifer
        Mineo: but otherwise, do you have any suggestions on how to approach this? My main issue with doing it using an adapter was how to add two fields
      • ruaok
        pristine___: so we could build this: Given a giant list of recordings that are tagged with "punk", we can iterate over all the listens in LB and discard recordings not tagged with punk. we then build a training set and a candidate set from this set of punk recordings. and then we have a punk music collaborative filter.
      • is that right?
      • pristine___
        Yeah, sgtm.
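A minimal plain-Python sketch of the extra filtering step ruaok describes, assuming listens arrive as (user, recording) pairs (one per listen) and the tag lookup is already a set of recording IDs. `build_tag_sets` is a hypothetical name, and pairing every kept recording with every kept user as the candidate set is an assumption, not something settled in the discussion; a real pipeline would do this in Spark over the full listen data.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def build_tag_sets(
    listens: Iterable[Tuple[str, str]], tagged_recordings: Set[str]
) -> Tuple[List[Tuple[str, int, str]], List[Tuple[str, str]]]:
    """Build a tag-specific training set and candidate set.

    listens: (user, recording) pairs, one entry per listen.
    tagged_recordings: recording IDs carrying the tag (e.g. "punk").
    """
    # Step 1: discard listens whose recording is not tagged.
    kept = [(user, rec) for user, rec in listens if rec in tagged_recordings]

    # Step 2: aggregate into (recording, listen_count, user) training rows.
    counts = Counter(kept)
    training = sorted((rec, n, user) for (user, rec), n in counts.items())

    # Step 3 (assumption): every tagged recording that appears in the
    # training data, paired with every user in it. Both sides come from
    # the training data, so the model will not discard any pair.
    recordings = sorted({rec for _user, rec in kept})
    users = sorted({user for user, _rec in kept})
    candidates = [(rec, user) for user in users for rec in recordings]
    return training, candidates


if __name__ == "__main__":
    listens = [("alice", "r1"), ("alice", "r1"), ("alice", "r2"),
               ("bob", "r3"), ("bob", "r1")]
    training, candidates = build_tag_sets(listens, {"r1", "r3"})
    print(training)    # r2 is untagged, so its listen is dropped
    print(candidates)
```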
      • ruaok
        because we can crowd source these kinds of data sets pretty easily.
      • we just need to build this extra filtering step.
      • pristine___
        Hmm
      • > because we can crowd source these kinds of data sets pretty easily.
      • Mineo
        _lucifer: unfortunately not. you'll need to read the eclipselink moxy documentation for that
      • pristine___
        ruaok: this is kinda a different project from *tracks you might like*?
      • _lucifer
        Mineo: ok np. one last question, was there any particular reason for using jaxb over something jackson?
      • *like jackson
      • ruaok
        pristine___: its more of the same, no?
      • right now I feel that the candidate sets that we are creating are not allowing for enough diversity.
      • Mineo
        _lucifer: other than "I could copy the adapters from https://github.com/metabrainz/search-server/tre...", no
      • _lucifer
        ah, ok :D thanks for the help
      • pristine___
        ruaok: that's because we don't have enough data/recordings in LB for top/similar artists
      • For example
      • ruaok
        exactly. the weakness is the similar/top artists data, not the CF or listens
      • we should work to improve those, for sure, but at the same time being able to easily create more models from crowd sourced data sets would be very very useful.
      • could this filtering of listens be done easily in spark?
      • pristine___
        But we have to have some artists or some metric to choose the recordings, that was the idea behind candidate set back in 2019.
      • > we should work to improve those, for sure, but at the same time being able to easily create more models from crowd sourced data sets would be very very useful.
      • Right, that's why I said they are kinda two different projects/things
      • Improved mapping and a big data dump of listens will really improve the playlists
      • > could this filtering of listens be done easily in spark?
      • I think so
      • ruaok
        lets talk about the mapping next.
      • pristine___
        And then I want to talk about some similar comments on the post
      • Cool
      • ruaok
        I think I had a import realization yesterday.
      • important.
      • pristine___
        what you are talking about is like giving users playlist of specific genre or something like that, no?
      • ruaok
        pristine___: yes, exactly.
      • pristine___
        I understand that!
      • I think
      • ruaok
        it may be a different project, but its using the same back-end.
      • ok, to the mapping now.
      • pristine___
        It should be a separate feature altogether, like a separate tab next to top artists and similar artists. I mean, I don't want it to be an alternative to top/similar artists
      • ruaok
        yes, sure.
      • pristine___
        The playlist looks not so good because of two reasons, one is mapping, the other is, ummm.... You remember how we train data?