#metabrainz

      • ruaok
        that's a really weird way of thinking about that.
      • for me, I'd prefer just some parametric score of sorts.
      • now, what part of that do we need to work on, iliekcomputers ?
      • yvanzo
        iliekcomputers: great, no need to know about SIR, at least you are used to Python :)
      • ruaok
        but that score is literally the point of the collaborative filtering algorithm, no?
      • iliekcomputers
        i am not sure yet, but a better metric is probably used in reality. i'll have to take a look.
      • ferbncode
        Iliekcomputers: sure \o/
      • ruaok
        ok, for now I'm just going to work with the idea that we have some score where higher is better.
      • iliekcomputers
        ruaok: I was thinking that making it predict numbers from 1 to 1000s is harder than making it predict some other metric of likeability.
      • ruaok
        which is not to say that we should make that into a playlist directly. I doubt that will turn out well.
      • ah, ok, now I understand.
      • ok, we clearly need to research what metrics are doable.
      • but this is where aidanlw17's work comes in.
      • Slurpee joined the channel
      • Slurpee has quit
      • Slurpee joined the channel
      • if we say, pick the most played track of the last week, and then find CF recommended tracks that are similar, we can start constructing a playlist.
      • chaining along from track to track that is similar.
      • pristine__
        What other metric can we have apart from listen counts? The more I play a song, the more I like it.
      • aidanlw17
        ruaok what is CF?
      • ruaok
        pristine__: I am not sure. this is precisely what we need to learn.
      • iliekcomputers
        aidanlw17: collaborative filtering.
      • aidanlw17
        oh thank you!
      • iliekcomputers
        pristine__: everything will need to be based on listen counts.
      • ruaok
        and also, I want to reiterate the ONE GOAL I had that caused me to start MusicBrainz.
      • I wanted to pick a starting track and an ending track and give a duration.
      • iliekcomputers
        the thing is that predicting listen counts (which have a large range 1 to tens of thousands) is a harder problem than we need to solve probably.
      • ruaok
        start with enter sandman from metallica and end up with orinoco flow from enya in 2 hours. GO.
      • Mr_Monkey
        The longest i've listened to a track for probably also indicates my tastes, those that are cemented
      • pristine__
        Exciting
      • ruaok
        so, finding a line of similar tracks that go from one track/artist to another track/artist.
      • pristine__
        Mr_Monkey: yup
      • ruaok
        THIS, believe it or not, is why I started MusicBrainz. without MB, this is impossible.
      • iliekcomputers: what does the CF algorithm spit out currently as its ranking?
      • or is that a black box, based on the fact that we're shoving in listen counts?
      • iliekcomputers
        ruaok: we give it listen counts and it tries to predict listen counts as a result.
      • ruaok
        where is the problem in that?
      • is it not doing a good job?
      • iliekcomputers
        maybe i'm not able to explain my thoughts on this correctly.
      • let me do some research and come back with a good paragraph or two.
      • ruaok
        ok, likely I am being dense too.
      • pristine__
        It is doing a good job, maybe iliekcomputers wants a diff metric
      • Diff from listen count.
      • ruaok
        but what you are saying is exactly what I've been understanding, so I am not understanding the crux of the problem you're raising.
      • well, if it ain't broke, don't fix it.
      • perhaps it is suitable for the first round.
      • iliekcomputers
        hmm, yep.
      • ruaok
        In reality I think we're going to do this challenge in the autumn and then realize "oh crap, we need this data set, that data set, this, that".
      • learning is the key goal of the challenge.
      • and then we start the cycle again.
      • and perhaps at the end of the second cycle we'll have something to be proud of.
      • ruaok is managing expectations
      • aidanlw17: any thoughts from you?
      • have you thought about how to extend your resultant data to artists?
      • pristine__
        "managing expectations"....awww
      • ruaok
        pristine__: yes, we're all working hard to get things done, but the reality is that the first pass is not going to be glorious.
      • if it teaches us how to do better, then I am 100% satisfied.
      • pristine__
        Well said. Learning is the key :)
      • ruaok
        ding.
      • pristine__
        Dong
      • ruaok
        iliekcomputers: the work we've done for shuffling user stats data back to hetzner.... can we use that to shove the recommendations from CF back to hetzner too?
      • aidanlw17
        I'd like to really review the files from pristine__ and the CF project as a whole to get a better understanding of this recommendation work. One thought of mine is that alastairp and I currently will be using 12 separate metrics for track-track similarity, then near the end of the summer a goal is to bring these together into one track-track metric for overall similarity. I think in the end, a combination of this metric and pristine__'s results would give a good dataset for recommendation.
      • iliekcomputers
        ruaok: yes.
      • shouldn't be much work.
      • ruaok
        that then begs the question: how do we handle new runs of the CF data?
      • do we keep X data sets and run a new one once a week?
      • iliekcomputers
        that is what i was expecting.
      • ruaok
        iliekcomputers: great. that will clearly be the next step for pristine__
      • iliekcomputers: <3
      • pristine__
        hetzner?
      • ruaok
        and then we can make playable lists on lb.org -- once we have that, then we're at a point when we can realistically see how the CF alg is performing.
      • pristine__
        Lb-server?
      • ruaok
        pristine__: yes.
      • pristine__
        Oh. Okay.
      • iliekcomputers
        hetzner == leader.listenbrainz
      • pristine__
        I like the next step 😆
      • ruaok
        and I guess there we ought to post process it into, recommendations of things that people have played and recommendations for things that are new to users.
      • iliekcomputers: actually in this case I mean hetzner = lemmy
      • iliekcomputers
        ooh
      • ambiguous. :P
      • aidanlw17
        ruaok: In terms of artist-artist similarity, I think we need these two projects in combination - given that artists may also diverge greatly in the types of music they create, I don't anticipate that only track-track similarity would provide a strong recommendation artist-artist. When bringing in the listen counts from pristine__, I would be interested in seeing how artist-artist recommendation could change.
      • ruaok
        aidanlw17: yes, and I think part of our challenge might be to pick different better metrics that feed your algorithm.
      • perhaps we should make samples of track-track similarities available for public inspection asap too.
      • aidanlw17: I think that is spot on.
      • aidanlw17
        Yeah. alastairp and I also were planning to make a public evaluation available for track-track similarity as soon as we have a working pipeline
      • ruaok
        I'd like all of us to start thinking about how to accomplish the artist-artist data set from the LB and AB datasets.
      • aidanlw17: superb
      • ok, I think we all have a better understanding of next steps and more of the roadmap now, yes?
      • if something is unclear, ask now.
      • pristine__
        Yes yes.
      • alastairp
        iliekcomputers: thanks for starting the script. how's it going?
      • ruaok
        iliekcomputers: I'd love to hear more about your reservations about the metric/ranking for CF when you come by them.
      • ruaok waves at alastairp
      • alastairp
        hi. I'm just reading backlog, and cooking too
      • aidanlw17
        I'll keep that in mind. Additionally, if you guys produce a metric from the collaborative filtering it might be possible to index that with annoy as we will do with the other metrics for track-track. Is that something you want to consider?
      • iliekcomputers
        ruaok: let me try to rephrase what i was saying.
      • ruaok
        aidanlw17: that does sound interesting yes.
      • iliekcomputers
        right now, we're trying to predict exactly how many times you would / should have listened to a particular song (say the strokes' last nite)
      • this value can range from one to tens of thousands.
      • ruaok
        I am super eager to learn what comes from your project. pristine__ has done an excellent job doing that for me on the CF front.
      • iliekcomputers
        so it is hard to predict.
      • ruaok
        too granular?
      • iliekcomputers
        when in reality, we probably do not need that number to that degree of accuracy.
      • pristine__
        ruaok: thanks. Means a lot :)
      • ruaok
        :)
      • iliekcomputers
        a lesser range would probably work out as well (intuition, not sure)
      • ruaok
        iliekcomputers: and the scale of the CF ranking? is that linear or non-linear?
      • aidanlw17
        ruaok: I appreciate the excitement - I feel it too.
      • ruaok
        well, mapping the giant range into something smaller is easy.
      • premature quantization might become a problem.
      • pristine__
        Yes. We can probably normalize.
      • ruaok
        normalizing makes sense to me. quantizing gives me hesitation.
      • iliekcomputers
      • alastairp
        cool! that's really fast
      • ruaok
        I see how quantizing the data might be useful for other algs down the line, but for starters we may not want to do that.
      • alastairp, iliekcomputers : what script is that?
      • alastairp
        I'm not surprised... the original method took about 10 minutes for me to do it on a slow machine with only 4m tracks, but that blocked the whole table. this one is better
      • ruaok: writing submission offsets to the ll table
      • ruaok
        ah, yes.
      • alastairp
        tomorrow we can deploy write offset on submit
      • ruaok
        are submission offsets monotonically increasing numbers?
      • alastairp
        yes
      • pristine__
        I guess we should continue with the road map and pick on normalization sometime later.
      • ruaok
        makes sense.
      • pristine__: yes.
      • alastairp
        it's the same as we're currently using in the GET endpoint
      • ruaok
        once we see the scores in the report (soon, I hope!) we can get our heads around this more.
      • alastairp
        uuid/low-level?n=[offset]
      • iliekcomputers
        hmm.
      • pristine__
        By tomorrow ruaok :)
      • iliekcomputers
        we should start merging PRs soon too.
      • ruaok
        wooo
      • alastairp
        iliekcomputers: when are you next available?
      • pristine__
        iliekcomputers: could you look at 21
      • ruaok
        the stats PRs should be merged asap, IMHO.
      • iliekcomputers
        alastairp: tomorrow works for me.
      • pristine__
        Can*
      • PR#21
      • ruaok
        my goal for today is to look at pristine's latest PR
      • (aside from boring nonprofit work)
      • pristine__
        I will send you link, ruaok
      • alastairp
        ok, good. perhaps then we can do the next PR on this offset stuff (if we do it early in the morning perhaps we can do the last part in the evening)
      • ruaok
        #26 is on my list.
      • alastairp
        and also we could take a look at the docker stuff that you were finishing up
      • pristine__ telling her laptop to wake up.
      • iliekcomputers
        alastairp: ok.
      • AfroThundr|main has quit
      • ruaok: do we wanna talk some about azure?
      • ruaok
        sure.
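
The normalizing-versus-quantizing distinction discussed earlier in the log can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the log scaling, and the five-bucket rating are hypothetical choices, not what the CF pipeline actually does. Listen counts are assumed to be positive and to span one to tens of thousands.

```python
import math

def normalize_log(counts):
    """Map raw listen counts onto (0, 1] with log scaling (hypothetical choice).

    The ordering of the counts is preserved, so no information about
    relative preference is lost.
    """
    max_log = math.log1p(max(counts))
    if max_log == 0:
        # All counts are zero; nothing to scale.
        return [0.0 for _ in counts]
    return [math.log1p(c) / max_log for c in counts]

def quantize(scores, buckets=5):
    """Collapse normalized scores into integer ratings 1..buckets.

    Distinct scores can land in the same bucket, which is the
    information loss that quantizing early would introduce.
    """
    return [min(buckets, int(s * buckets) + 1) for s in scores]

# Made-up listen counts for one user, for illustration only.
counts = [1, 3, 40, 45, 12000]
scores = normalize_log(counts)
ratings = quantize(scores)

for count, score, rating in zip(counts, scores, ratings):
    print(f"{count:>6} listens  score={score:.3f}  rating={rating}")
```

Normalizing keeps every count distinguishable while shrinking the range; quantizing afterwards merges some of them (here the tracks with 1 and 3 listens end up with the same rating), which is why it may be better left to a later step.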