#metabrainz

      • pristine__
        no
      • scroll and see the counts next to "Similar artists for rob"
      • ruaok
        top artists for rob is much much slower than iliekcomputers and you. odd.
      • pristine__
        this time it is largely because we have calculated so many counts. I will remove them once we are done with the html
      • ruaok
        even those counts make no sense to me.
      • iliekcomputers has more listens but takes half the time.
      • does that mean that there is a greater number of similar artists in my set?
      • pristine__
        yes
      • ruaok
        should I shut up and listen for now?
      • pristine__
        There is a reason I put up those counts
      • Like for instance
      • there is this artist "New Order" in your set
      • 2528 similar artists
      • It doesn't make much sense to include all of them
      • ruaok
        yes, I suspect that my similar artist set needs to calculate fewer similar artists.
      • pristine__
        On basis of count, we can filter top x?
      • ruaok
        with a popular older artist like new order, there are many many similar artists since they appeared on many more compilations.
      • yes, that is a good way of filtering.
      • pristine__
        Also,
      • ruaok
        and it would be best to do it on the lb-labs side and not the AAR side.
      • pristine__
        How do we do that?
      • AAR gives us artist similarity
      • ruaok
        no, it gives you a count of how many times two artists appeared on a compilation.
      • and on AAR I cut off any counts 2 or less.
      • pristine__
        yes, and that way we compute there affinity
      • their*
      • ruaok
        you should set a higher threshold. or a max number.
      • possibly both.
      • pristine__
        for count?
      • yeah
      • ruaok
        yes
      • pristine__
        top 10 ?
      • ruaok
        156,523 new artists. ouch.
      • pristine__
        I was just coming to that
      • ruaok
        make it a configuration value and then start with 10.
      • pristine__
        yeah, was asking to start with. thanks:)
      • ruaok
        I think making a config value for threshold and max count and then doing multiple runs with various figures should be your next step.
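A minimal sketch of the threshold and max-count filtering discussed above, assuming the similar artists for one artist arrive as a mapping from artist name to co-occurrence count. The names `MIN_COUNT_THRESHOLD`, `MAX_SIMILAR_ARTISTS` and `filter_similar_artists` are hypothetical; the chat only fixes the idea of two configurable values and a starting max of 10.

```python
from typing import Dict, List, Tuple

# Hypothetical config values, meant to be tuned over multiple runs.
MIN_COUNT_THRESHOLD = 3   # assumed starting threshold, not stated in the chat
MAX_SIMILAR_ARTISTS = 10  # "make it a configuration value and then start with 10"


def filter_similar_artists(
    similar: Dict[str, int],
    min_count: int = MIN_COUNT_THRESHOLD,
    max_artists: int = MAX_SIMILAR_ARTISTS,
) -> List[Tuple[str, int]]:
    """Keep artists whose count meets the threshold, then take the top N by count."""
    kept = [(artist, count) for artist, count in similar.items() if count >= min_count]
    # Highest count first; ties broken by name so the result is deterministic.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:max_artists]
```

Applying both limits means an artist like "New Order", with thousands of similar artists, contributes at most `max_artists` entries regardless of how many pass the threshold.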
      • pristine__
        so yeah
      • for new artists
      • ruaok
        yes
      • pristine__
        I have calculated candidate sets on a months data
      • so top 20 artists
      • and their similar artists
      • subtracted from total artists
      • for each user
      • but what i think is
      • for now, we should just focus on generating two playlists, "top artists" and "similar artists"
      • because random new artists are no good
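A rough sketch of the candidate sets described above, for a single user: the top artists, the artists similar to them, and whatever is left of the total artist list. The function name, the input shapes, and the choice to exclude top artists from the similar set are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def build_candidate_sets(
    listen_counts: Dict[str, int],
    similar_to: Dict[str, Iterable[str]],
    all_artists: Iterable[str],
    top_n: int = 20,
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (top artists, similar artists, remaining "new" artists) for one user."""
    # Top N artists by listen count over the period (a month in the chat).
    top = {artist for artist, _ in Counter(listen_counts).most_common(top_n)}

    # Artists similar to any top artist, excluding the top artists themselves
    # (an assumption, so the two playlists do not overlap).
    similar: Set[str] = set()
    for artist in top:
        similar.update(similar_to.get(artist, ()))
    similar -= top

    # Everything else: total artists minus top artists and their similar artists.
    new = set(all_artists) - top - similar
    return top, similar, new
```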
      • ruaok
        playlists or candidate sets?
      • pristine__
        two playlists from two candidate sets
      • playlist 1: songs from the top artists that you have listened to in last month/week
      • playlist 2: songs from the artists similar to top artists that you have listened to in last month/week
      • ruaok
        ok
      • yes, that seems like a very good goal.
      • pristine__
        so, later on we can group new artists
      • on basis of nationality, are, genre
      • and then build a third candidate set.
      • that way we can accomplish our main goal
      • of promoting new artists
      • ruaok
        oh. feedback! that is an interesting idea.
      • pristine__
        keeping in mind the taste of user
      • area*
      • ruaok
        I think that is an interesting idea. very interesting.
      • please pursue that.
      • pristine__
        I mean there is a lot of scope to filter, from these 156523 artists, the ones that are closer to your taste
      • ruaok
        yes.
      • I think with the thresholding/max count we should be able to reduce that quite a lot.
      • pristine__
        which idea? focusing on two playlists?
      • ruaok
        not sure what a good target is, but I'm guessing less than 1000 artists.
      • pristine__
        yeah.
      • ruaok
        focusing on two playlists so that we can build a third candidate set for a new artist playlist later.
      • pristine__
        yeah
      • and how often are we going to train our model?
      • ruaok
        I have no idea.
      • pristine__
        and how often would we generate recommendations?
      • ruaok
        that is something we need to learn -- it will be a balance between the computing resources and keeping things fresh.
      • pristine__
        because if it is weekly, then we need week wise files in hdfs
      • ruaok
        also unknown. I think each of these needs to be a config value that we can tweak as we go.
      • pristine__
        yeah, right.
      • ruaok
        did you catch the conversation between iliekcomputers and I earlier this week?
      • rdswift
        pristine__: To answer your earlier question, "Owned Music" is one type of user collection available. See https://beta.musicbrainz.org/collection/create
      • pristine__
        Umm...about what?
      • I don't think so
      • ruaok
        in order to keep the data in the cluster fresh, iliekcomputers is going to work on incremental LB data dumps
      • pristine__
        rdswift: thanks
      • ruaok
        rdswift: that reminds me, I need to respond to an old mesg of yours.
      • pristine__: the idea is that we can wake up the cluster at any time and then:
      • pristine__
        incremental data dumps, what will that be like?
      • ruaok
        1. load incremental data dumps that have been produced since the cluster last woke.
      • 2. calculate whatever we need to: stats, train models, run CF models
      • rdswift doesn't know what response that might be.
      • 3. Shut down the cluster
      • which basically means that you do not need to worry about data freshness right now.
      • that is something that iliekcomputers and I will work on.
      • pristine__
        Okay. I just have too many thoughts whilst working. Lol
      • ruaok
        and effectively we just need to create scripts that carry out a task once they are called.
      • doesn't matter when they are called.
      • pristine__: good thoughts too. keep bringing them up.
      • rdswift: > ruaok, pristine__: I just had another thought regarding identifying artist-artist affinity. Similar to ruaok's number of times each artist pair appears on the same compilation album, how about the number of times each artist pair appears in a user's "owned music" collection? Chances are they would only own both if they actually liked both (or at least the tracks or albums on which they appear).
      • pristine__
        Yeah, we should have independent scripts for that.
      • ruaok
        rdswift: yes, that is also a good source of data.
      • however, I fear that there isn't much data AND we would need to get users' permission to "process" them as per GDPR.
      • which means that it isn't an easy thing to do that will likely drastically improve the data we have.
      • pristine__
        I was just thinking, how are we gonna keep our AAR fresh and updated
      • rdswift
        No response required. I was just brainstorming in case something triggered a better idea. Thanks though.
      • ruaok
        I'm not saying we shouldn't do it, but we have lots of low-hanging fruit first.
      • pristine__
        as more releases/recordings come out
      • ruaok
        pristine__: that is nearly done.
      • I've got a little more work to do, but AAR can re-run on a weekly basis.
      • 1. calculate a new table.
      • pristine__
        and it may happen that an artist changes its affinity to other artists in time
      • wow
      • ruaok
        2. in a transaction: drop old table, rename new table
      • 3. commit
      • pristine__: yes, it will. but those changes are going to move so slowly that weekly updates are quite sufficient.
      • right now I am moving fast and trying to build stuff that allows you to continue.
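The three-step weekly refresh above (calculate a new table, swap it in inside a transaction, commit) could look roughly like this. It is only a sketch: the chat does not say which database AAR uses, so SQLite from the Python standard library stands in here, and the table and column names (`artist_pairs`, `artist_pairs_new`, `cnt`) are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def connect(path: str) -> sqlite3.Connection:
    # isolation_level=None disables implicit transactions, so BEGIN/COMMIT
    # below are controlled explicitly.
    return sqlite3.connect(path, isolation_level=None)


def swap_in_new_table(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str, int]]) -> None:
    # 1. calculate a new table (readers still see the old one)
    conn.execute("DROP TABLE IF EXISTS artist_pairs_new")
    conn.execute("CREATE TABLE artist_pairs_new (artist_0 TEXT, artist_1 TEXT, cnt INTEGER)")
    conn.executemany("INSERT INTO artist_pairs_new VALUES (?, ?, ?)", rows)

    # 2. in a transaction: drop old table, rename new table
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TABLE IF EXISTS artist_pairs")
        conn.execute("ALTER TABLE artist_pairs_new RENAME TO artist_pairs")
        # 3. commit
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Because the drop and the rename happen in one transaction, a failure part-way through rolls back and leaves the old table in place.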
      • pristine__
        I will someday try to understand the code for AAR. I was reading it one day and got stuck, but now I don't remember where.
      • weekly sounds good to me
      • ruaok
        towards the end of the summer both you and I will need to spend time "finishing" things so that they are ready for deployment.
      • I can explain it.
      • it is actually fairly simple, really.
      • pristine__
        yeah, mentor working as much as the student
      • <3
      • thanks :)
      • ruaok
        first it runs a query that fetches the artists that are on a release and returns release/artist pairs.
      • pristine__
        okay
      • ruaok
        then in memory the python script creates a dict with artist-artist MBIDs as the key.
      • every time that pair is encountered, that count is incremented.
      • that's really it.
      • the rest is the overhead to flush the data to a table, dropping counts less than 3.
      • ... dropping counts <3
      • lol.
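The counting described above can be sketched in a few lines. This is an illustration of the idea, not the actual AAR code: the function name, the input shape (an iterable of release/artist rows), and the grouping step are assumptions. The cut-off follows the chat (counts of 2 or less are dropped).

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple


def count_artist_pairs(
    release_artist_rows: Iterable[Tuple[str, str]],
    min_count: int = 3,
) -> Dict[Tuple[str, str], int]:
    """Count how often two artists appear on the same release."""
    # Group artist MBIDs by release (the query is assumed to return such pairs).
    artists_by_release = defaultdict(set)
    for release, artist in release_artist_rows:
        artists_by_release[release].add(artist)

    # defaultdict(int) gives every unseen pair an implied starting count of 0.
    pair_counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for artists in artists_by_release.values():
        # Sorting makes (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(artists), 2):
            pair_counts[pair] += 1

    # Drop counts less than min_count before flushing to a table.
    return {pair: count for pair, count in pair_counts.items() if count >= min_count}
```

A release with a single artist yields no pairs at all in this sketch, so it never contributes to any count.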
      • pristine__
        default dict val is 0? to account for single artists in artist_credit?
      • ruaok
        I think i've been staring at screen for too long today.
      • pristine__
        okay.
      • eyes pain?
      • ruaok
        implied default value is 0, yes.
      • no, being silly.
      • brain can't really focus anymore.
      • pristine__
        lol
      • Cool then, do we have anything else to discuss?
      • I will take care of new artists, empty dataframe, towards the end of month, no?
      • new users*
      • ruaok
        I don't. I just need to put my head down and work on the MSB mapping.
      • as you make progress.
      • pristine__
        yeah. New users thing should be handled delicately. I had many thoughts on it today.
      • Okay then. See ya tomorrow <3
      • All the best :)
      • ruaok
        yes. that is called the cold start problem.
      • ok, sounds good. I remain excited.