#metabrainz

      • pristine__
        no
      • 2019-06-12 16307, 2019

      • pristine__
        scroll and see the counts next to "Similar artists for rob"
      • 2019-06-12 16316, 2019

      • ruaok
        top artists for rob is much much slower than iliekcomputers and you. odd.
      • 2019-06-12 16354, 2019

      • pristine__
        this time largely because we have calculated so many counts. I will remove them once we are done with the HTML
      • 2019-06-12 16310, 2019

      • ruaok
        even those counts make no sense to me.
      • 2019-06-12 16328, 2019

      • ruaok
        iliekcomputers has more listens but takes half the time.
      • 2019-06-12 16339, 2019

      • ruaok
        does that mean that there is a greater number of similar artists in my set?
      • 2019-06-12 16348, 2019

      • pristine__
        yes
      • 2019-06-12 16350, 2019

      • ruaok
        should I shut up and listen for now?
      • 2019-06-12 16310, 2019

      • pristine__
        There is a reason I put up those counts
      • 2019-06-12 16320, 2019

      • pristine__
        Like for instance
      • 2019-06-12 16342, 2019

      • pristine__
        there is this artist "New Order" in your set
      • 2019-06-12 16352, 2019

      • pristine__
        2528 similar artists
      • 2019-06-12 16311, 2019

      • pristine__
        Doesn't make much sense to include all of them
      • 2019-06-12 16325, 2019

      • ruaok
        yes, I suspect that my similar artist set needs to calculate fewer similar artists.
      • 2019-06-12 16327, 2019

      • pristine__
        On the basis of count, we can filter the top x?
      • 2019-06-12 16348, 2019

      • ruaok
        with a popular older artist like new order, there are many many similar artists since they appeared on many more compilations.
      • 2019-06-12 16359, 2019

      • ruaok
        yes, that is a good way of filtering.
      • 2019-06-12 16310, 2019

      • pristine__
        Also,
      • 2019-06-12 16320, 2019

      • ruaok
        and it would be best to do it on the lb-labs side and not the AAR side.
      • 2019-06-12 16339, 2019

      • pristine__
        How do we do that?
      • 2019-06-12 16351, 2019

      • pristine__
        AAR gives us artist similarity
      • 2019-06-12 16333, 2019

      • ruaok
        no, it gives you a count of how many times two artists appeared on a compilation.
      • 2019-06-12 16349, 2019

      • ruaok
        and on AAR I cut off any counts 2 or less.
      • 2019-06-12 16357, 2019

      • pristine__
        yes, and that way we compute there affinity
      • 2019-06-12 16309, 2019

      • pristine__
        their*
      • 2019-06-12 16311, 2019

      • ruaok
        you should set a higher threshold. or a max number.
      • 2019-06-12 16330, 2019

      • ruaok
        possibly both.
      • 2019-06-12 16334, 2019

      • pristine__
        for count?
      • 2019-06-12 16335, 2019

      • pristine__
        yeah
      • 2019-06-12 16339, 2019

      • ruaok
        yes
      • 2019-06-12 16302, 2019

      • pristine__
        top 10 ?
      • 2019-06-12 16312, 2019

      • ruaok
        156,523 new artists. ouch.
      • 2019-06-12 16326, 2019

      • pristine__
        I was just coming to that
      • 2019-06-12 16328, 2019

      • ruaok
        make it a configuration value and then start with 10.
      • 2019-06-12 16300, 2019

      • pristine__
        yeah, was asking to start with. thanks:)
      • 2019-06-12 16302, 2019

      • ruaok
        I think making a config value for threshold and max count and then doing multiple runs with various figures should be your next step.
      • 2019-06-12 16302, 2019
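
A minimal sketch of the filtering suggested above: a minimum count threshold plus a maximum number of similar artists per artist, both kept as tweakable values. The names `MIN_COUNT`, `MAX_SIMILAR` and `top_similar` are hypothetical, as is the `(artist, other_artist, count)` row shape; the actual lb-labs code may look quite different.

```python
from collections import defaultdict

# Hypothetical config values; the chat suggests starting with a max of 10
# and trying several thresholds across multiple runs.
MIN_COUNT = 3
MAX_SIMILAR = 10


def top_similar(pair_counts, min_count=MIN_COUNT, max_similar=MAX_SIMILAR):
    """Keep, per artist, at most `max_similar` similar artists whose
    co-occurrence count is at least `min_count`.

    pair_counts: iterable of (artist, other_artist, count) rows (assumed shape).
    """
    by_artist = defaultdict(list)
    for artist, other, count in pair_counts:
        if count >= min_count:
            by_artist[artist].append((other, count))

    # Highest counts first; ties are broken by name so the output is stable.
    return {
        artist: sorted(similar, key=lambda item: (-item[1], item[0]))[:max_similar]
        for artist, similar in by_artist.items()
    }
```

Running this with a few different `min_count` and `max_similar` values is one way to see how far the new-artist set shrinks.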

      • pristine__
        so yeah
      • 2019-06-12 16308, 2019

      • pristine__
        for new artists
      • 2019-06-12 16314, 2019

      • ruaok
        yes
      • 2019-06-12 16346, 2019

      • pristine__
        I have calculated candidate sets on a month's data
      • 2019-06-12 16358, 2019

      • pristine__
        so top 20 artists
      • 2019-06-12 16305, 2019

      • pristine__
        and their similar artists
      • 2019-06-12 16322, 2019

      • pristine__
        subtracted from total artists
      • 2019-06-12 16326, 2019

      • pristine__
        for each user
      • 2019-06-12 16333, 2019

      • pristine__
        but what i think is
      • 2019-06-12 16300, 2019

      • pristine__
        for now, we should just focus on generating two playlists, "top artists" and "similar artists"
      • 2019-06-12 16314, 2019

      • pristine__
        because random new artists are no good
      • 2019-06-12 16327, 2019

      • ruaok
        playlists or candidate sets?
      • 2019-06-12 16355, 2019

      • pristine__
        two playlists from two candidate sets
      • 2019-06-12 16334, 2019

      • pristine__
        playlist 1: songs from the top artists that you have listened to in last month/week
      • 2019-06-12 16306, 2019

      • pristine__
        playlist 2: songs from the artists similar to top artists that you have listened to in last month/week
      • 2019-06-12 16307, 2019

      • ruaok
        ok
      • 2019-06-12 16318, 2019

      • ruaok
        yes, that seems like a very good goal.
      • 2019-06-12 16323, 2019

      • pristine__
        so, later on we can group new artists
      • 2019-06-12 16334, 2019

      • pristine__
        on basis of nationality, are, genre
      • 2019-06-12 16347, 2019

      • pristine__
        and then build a third candidate set.
      • 2019-06-12 16304, 2019

      • pristine__
        that way we can accomplish our main goal
      • 2019-06-12 16313, 2019

      • pristine__
        of promoting new artists
      • 2019-06-12 16321, 2019

      • ruaok
        oh. feedback! that is an interesting idea.
      • 2019-06-12 16322, 2019

      • pristine__
        keeping in mind the taste of user
      • 2019-06-12 16342, 2019

      • pristine__
        area*
      • 2019-06-12 16329, 2019

      • ruaok
        I think that is an interesting idea. very interesting.
      • 2019-06-12 16337, 2019

      • ruaok
        please pursue that.
      • 2019-06-12 16347, 2019

      • pristine__
        I mean there is a lot of possibility to filter artists from these 156523 artists that are closer to your taste
      • 2019-06-12 16358, 2019

      • ruaok
        yes.
      • 2019-06-12 16315, 2019

      • ruaok
        I think with the thresholding/max count we should be able to reduce that quite a lot.
      • 2019-06-12 16316, 2019

      • pristine__
        which idea? focusing on two playlists?
      • 2019-06-12 16333, 2019

      • ruaok
        not sure what a good target is, but I'm guessing less than 1000 artists.
      • 2019-06-12 16348, 2019

      • pristine__
        yeah.
      • 2019-06-12 16353, 2019

      • ruaok
        focusing on two playlists so that we can build a third candidate set for a new artist playlist later.
      • 2019-06-12 16322, 2019

      • pristine__
        yeah
      • 2019-06-12 16357, 2019

      • pristine__
        and how often are we going to train our model?
      • 2019-06-12 16308, 2019

      • ruaok
        I have no idea.
      • 2019-06-12 16318, 2019

      • pristine__
        and how often would we generate recommendations?
      • 2019-06-12 16345, 2019

      • ruaok
        that is something we need to learn -- it will be a balance between the computing resources and keeping things fresh.
      • 2019-06-12 16305, 2019

      • pristine__
        because if it is weekly, then we need week-wise files in HDFS
      • 2019-06-12 16308, 2019

      • ruaok
        also unknown. I think each of these need to be config values that we can tweak as we go.
      • 2019-06-12 16309, 2019

      • pristine__
        yeah, right.
      • 2019-06-12 16331, 2019

      • ruaok
        did you catch the conversation between iliekcomputers and I earlier this week?
      • 2019-06-12 16343, 2019

      • rdswift
        pristine__: To answer your earlier question, "Owned Music" is one type of user collection available. See https://beta.musicbrainz.org/collection/create
      • 2019-06-12 16353, 2019

      • pristine__
        Umm...about what?
      • 2019-06-12 16301, 2019

      • pristine__
        I don't think so
      • 2019-06-12 16304, 2019

      • ruaok
        in order to keep the data in the cluster fresh, iliekcomputers is going to work on incremental LB data dumps
      • 2019-06-12 16304, 2019

      • pristine__
        rdswift: thanks
      • 2019-06-12 16317, 2019

      • ruaok
        rdswift: that reminds me, I need to respond to an old mesg of yours.
      • 2019-06-12 16336, 2019

      • ruaok
        pristine__: the idea is that we can wake up the cluster at any time and then:
      • 2019-06-12 16336, 2019

      • pristine__
        incremental data dumps, what will be that like
      • 2019-06-12 16351, 2019

      • ruaok
        1. load incremental data dumps that have been produced since the cluster last woke.
      • 2019-06-12 16314, 2019

      • ruaok
        2. calculate whatever we need to: stats, train models, run CF models
      • 2019-06-12 16318, 2019

      • rdswift doesn't know what response that might be.
      • 2019-06-12 16325, 2019

      • ruaok
        3. Shut down the cluster
      • 2019-06-12 16342, 2019

      • ruaok
        which basically means that you do not need to worry about data freshness right now.
      • 2019-06-12 16359, 2019

      • ruaok
        that is something that iliekcomputers and I will work on.
      • 2019-06-12 16311, 2019

      • pristine__
        Okay. I just have too many thoughts whilst working. Lol
      • 2019-06-12 16329, 2019

      • ruaok
        and effectively we just need to create scripts that carry out a task once they are called.
      • 2019-06-12 16337, 2019

      • ruaok
        doesn't matter when they are called.
      • 2019-06-12 16343, 2019

      • ruaok
        pristine__: good thoughts too. keep bringing them up.
      • 2019-06-12 16302, 2019

      • ruaok
        rdswift: > ruaok, pristine__: I just had another thought regarding identifying artist-artist affinity. Similar to ruaok's number of times each artist pair appears on the same compilation album, how about the number of times each artist pair appears in a user's "owned music" collection? Chances are they would only own both if they actually liked both (or at least the tracks or albums on which they appear).
      • 2019-06-12 16304, 2019

      • pristine__
        Yeah, we should have independent scripts for that.
      • 2019-06-12 16326, 2019

      • ruaok
        rdswift: yes, that is also a good source of data.
      • 2019-06-12 16349, 2019

      • ruaok
        however, I fear that there isn't much data AND we would need to get users' permission to "process" them as per GDPR.
      • 2019-06-12 16309, 2019

      • ruaok
        which means that it isn't an easy win that will likely drastically improve the data we have.
      • 2019-06-12 16321, 2019

      • pristine__
        I was just thinking, how are we gonna keep our AAR fresh and updated
      • 2019-06-12 16328, 2019

      • rdswift
        No response required. I was just brainstorming in case something triggered a better idea. Thanks though.
      • 2019-06-12 16331, 2019

      • ruaok
        I'm not saying we shouldn't do it, but we have lots of low-hanging fruit first.
      • 2019-06-12 16339, 2019

      • pristine__
        as more releases/recordings come out
      • 2019-06-12 16346, 2019

      • ruaok
        pristine__: that is nearly done.
      • 2019-06-12 16305, 2019

      • ruaok
        I've got a little more work to do, but AAR can re-run on a weekly basis.
      • 2019-06-12 16311, 2019

      • ruaok
        1. calculate a new table.
      • 2019-06-12 16326, 2019

      • pristine__
        and it may happen that an artist changes its affinity to other artists in time
      • 2019-06-12 16328, 2019

      • pristine__
        wow
      • 2019-06-12 16328, 2019

      • ruaok
        2. in a transaction: drop old table, rename new table
      • 2019-06-12 16331, 2019

      • ruaok
        3. commit
      • 2019-06-12 16355, 2019
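
The three steps above (build a new table, swap it in inside a transaction, commit) can be sketched roughly as follows. This is an illustration only: it uses Python's built-in `sqlite3` as a stand-in database, and the table names `artist_relations` / `artist_relations_new`, the column names, and the function `swap_in_new_table` are all made up for the example rather than taken from the AAR code.

```python
import sqlite3


def swap_in_new_table(conn, rows):
    """Rebuild the relations table and swap it in atomically.

    conn: sqlite3 connection opened with isolation_level=None, so that
          transactions are controlled explicitly with BEGIN/COMMIT.
    rows: iterable of (artist_0, artist_1, count) tuples (assumed shape).
    """
    # 1. Calculate a new table next to the live one.
    conn.execute("DROP TABLE IF EXISTS artist_relations_new")
    conn.execute(
        "CREATE TABLE artist_relations_new "
        "(artist_0 TEXT, artist_1 TEXT, count INTEGER)"
    )
    conn.executemany("INSERT INTO artist_relations_new VALUES (?, ?, ?)", rows)

    # 2. In a transaction: drop the old table, rename the new table.
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TABLE IF EXISTS artist_relations")
        conn.execute("ALTER TABLE artist_relations_new RENAME TO artist_relations")
        # 3. Commit, making the swap visible in one step.
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Because the drop and rename happen in one transaction, readers see either the old table or the new one, never a missing table. The same pattern relies on the database supporting transactional DDL.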

      • ruaok
        pristine__: yes, it will. but those changes are going to move so slowly that weekly updates are quite sufficient.
      • 2019-06-12 16318, 2019

      • ruaok
        right now I am moving fast and trying to build stuff that allows you to continue.
      • 2019-06-12 16331, 2019

      • pristine__
        I will someday try to understand the code for AAR. I was reading it one day and got stuck, but now I don't remember where.
      • 2019-06-12 16337, 2019

      • pristine__
        weekly sounds good to me
      • 2019-06-12 16342, 2019

      • ruaok
        towards the end of the summer both you and I will need to spend time "finishing" things so that they are ready for deployment.
      • 2019-06-12 16352, 2019

      • ruaok
        I can explain it.
      • 2019-06-12 16358, 2019

      • ruaok
        it is actually fairly simple, really.
      • 2019-06-12 16314, 2019

      • pristine__
        yeah, mentor working as much as the student
      • 2019-06-12 16316, 2019

      • pristine__
        <3
      • 2019-06-12 16334, 2019

      • pristine__
        thanks :)
      • 2019-06-12 16337, 2019

      • ruaok
        first it runs a query that fetches the artists that are on a release and returns release/artist pairs.
      • 2019-06-12 16358, 2019

      • pristine__
        okay
      • 2019-06-12 16304, 2019

      • ruaok
        then in memory the python script creates a dict with artist-artist MBID pairs as the key.
      • 2019-06-12 16325, 2019

      • ruaok
        every time a pair is encountered, its count is incremented.
      • 2019-06-12 16342, 2019

      • ruaok
        that's really it.
      • 2019-06-12 16358, 2019

      • ruaok
        the rest is the overhead to flush the data to a table, dropping counts less than 3.
      • 2019-06-12 16306, 2019

      • ruaok
        ... dropping counts <3
      • 2019-06-12 16307, 2019

      • ruaok
        lol.
      • 2019-06-12 16321, 2019

      • pristine__
        default dict val is 0? to account for single artists in artist_credit?
      • 2019-06-12 16328, 2019

      • ruaok
        I think I've been staring at a screen for too long today.
      • 2019-06-12 16337, 2019

      • pristine__
        okay.
      • 2019-06-12 16341, 2019

      • pristine__
        eyes pain?
      • 2019-06-12 16342, 2019

      • ruaok
        implied default value is 0, yes.
      • 2019-06-12 16346, 2019
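
The counting described above can be sketched in a few lines of Python. This is an approximation for illustration, not the AAR script itself: the function name `count_artist_pairs`, the `(release, artist_mbid)` row shape, and the use of `defaultdict(int)` for the implied default of 0 are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def count_artist_pairs(release_artist_rows, min_count=3):
    """Count how often two artists appear on the same release.

    release_artist_rows: iterable of (release, artist_mbid) pairs, as a
                         query might return them (assumed shape).
    min_count: pairs seen fewer than this many times are dropped.
    """
    # Group the artists that appear on each release.
    artists_by_release = defaultdict(set)
    for release, artist in release_artist_rows:
        artists_by_release[release].add(artist)

    # The dict key is the (sorted) artist pair; missing keys start at 0.
    counts = defaultdict(int)
    for artists in artists_by_release.values():
        for pair in combinations(sorted(artists), 2):
            counts[pair] += 1

    # Drop counts below the threshold before flushing to a table.
    return {pair: n for pair, n in counts.items() if n >= min_count}
```

A release with a single artist produces no pairs at all, so it never touches the dict; the default of 0 only matters the first time a given pair is seen.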

      • ruaok
        no, being silly.
      • 2019-06-12 16352, 2019

      • ruaok
        brain can't really focus anymore.
      • 2019-06-12 16357, 2019

      • pristine__
        lol
      • 2019-06-12 16311, 2019

      • pristine__
        Cool then, do we have anything else to discuss?
      • 2019-06-12 16332, 2019

      • pristine__
        I will take care of new artists, empty dataframe, towards the end of month, no?
      • 2019-06-12 16338, 2019

      • pristine__
        new users*
      • 2019-06-12 16341, 2019

      • ruaok
        I don't. I just need to put my head down and work on the MSB mapping.
      • 2019-06-12 16352, 2019

      • ruaok
        as you make progress.
      • 2019-06-12 16340, 2019

      • pristine__
        yeah. New users thing should be handled delicately. I had many thoughts on it today.
      • 2019-06-12 16317, 2019

      • pristine__
        Okay then. See ya tomorrow <3
      • 2019-06-12 16322, 2019

      • pristine__
        All the best :)
      • 2019-06-12 16322, 2019

      • ruaok
        yes. that is called the cold start problem.
      • 2019-06-12 16330, 2019

      • ruaok
        ok, sounds good. I remain excited.