in order to keep the data in the cluster fresh, iliekcomputers is going to work on incremental LB data dumps
pristine__
rdswift: thanks
ruaok
rdswift: that reminds me, I need to respond to an old message of yours.
pristine__: the idea is that we can wake up the cluster at any time and then:
pristine__
incremental data dumps, what will that be like?
ruaok
1. load incremental data dumps that have been produced since the cluster last woke.
2. calculate whatever we need to. stats: train models, run CF models
rdswift doesn't know what response that might be.
3. Shut down the cluster
which basically means that you do not need to worry about data freshness right now.
that's something that iliekcomputers and I will work on.
pristine__
Okay. I just have too many thoughts whilst working. Lol
ruaok
and effectively we just need to create scripts that carry out a task once they are called.
doesn't matter when they are called.
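A minimal sketch of that "callable at any time" pattern, assuming a hypothetical checkpoint file and loader; none of these names are actual ListenBrainz code:

```python
import json
from pathlib import Path

STATE_FILE = Path("last_dump.json")  # hypothetical checkpoint location

def load_checkpoint():
    """Return the id of the last dump we imported, or 0 if none yet."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())["last_dump_id"]
    return 0

def save_checkpoint(dump_id):
    STATE_FILE.write_text(json.dumps({"last_dump_id": dump_id}))

def run_incremental_import(available_dumps, import_dump):
    """Import every dump newer than the checkpoint, in order.
    Safe to call whenever the cluster wakes up; no-op if already current."""
    last = load_checkpoint()
    for dump_id in sorted(d for d in available_dumps if d > last):
        import_dump(dump_id)
        save_checkpoint(dump_id)  # advance only after a successful import
```

Because the checkpoint, not the clock, decides what gets loaded, it doesn't matter when the script runs.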
pristine__: good thoughts too. keep bringing them up.
rdswift: > ruaok, pristine__: I just had another thought regarding identifying artist-artist affinity. Similar to ruaok's number of times each artist pair appears on the same compilation album, how about the number of times each artist pair appears in a user's "owned music" collection? Chances are they would only own both if they actually liked both (or at least the tracks or albums on which they appear).
pristine__
Yeah, we should have independent scripts for that.
ruaok
rdswift: yes, that is also a good source of data.
however, I fear that there isn't much data AND we would need to get users' permission to "process" it as per GDPR.
which means it isn't an easy win that would drastically improve the data we have.
pristine__
I was just thinking, how are we gonna keep our AAR fresh and updated
rdswift
No response required. I was just brainstorming in case something triggered a better idea. Thanks though.
ruaok
I'm not saying we shouldn't do it, but we have lots of low-hanging fruit to tackle first.
pristine__
as more releases/recordings come out
ruaok
pristine__: that is nearly done.
I've got a little more work to do, but AAR can re-run on a weekly basis.
1. calculate a new table.
pristine__
and it may happen that an artist changes its affinity to other artists over time
wow
ruaok
2. in a transaction: drop old table, rename new table
3. commit
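The three steps above could be sketched like this. It's an illustration only: the table names are made up, sqlite stands in for whatever database LB actually uses, and the connection is assumed to be in autocommit mode so the explicit BEGIN controls the transaction:

```python
import sqlite3

def refresh_aar_table(conn):
    """Weekly AAR refresh: build a new table, then swap it in atomically.
    Table and column names are illustrative, not the real LB schema.
    Assumes conn is in autocommit mode (isolation_level=None)."""
    cur = conn.cursor()
    # 1. calculate a new table alongside the live one
    cur.execute("CREATE TABLE aar_new (artist_0 TEXT, artist_1 TEXT, count INTEGER)")
    # ... populate aar_new from fresh release/artist data here ...
    # 2. in a transaction: drop the old table, rename the new table
    cur.execute("BEGIN")
    cur.execute("DROP TABLE aar")
    cur.execute("ALTER TABLE aar_new RENAME TO aar")
    # 3. commit -- readers see either the old table or the new one, never neither
    cur.execute("COMMIT")
```

The point of doing the drop and rename inside one transaction is that queries against the live table name never observe a missing table mid-swap.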
pristine__: yes, it will. but those changes are going to move so slowly that weekly updates are quite sufficient.
right now I am moving fast and trying to build stuff that allows you to continue.
pristine__
I will someday try to understand the code for AAR. I was reading it one day and got stuck, but now I don't remember where.
weekly sounds good to me
ruaok
towards the end of the summer both you and I will need to spend time "finishing" things so that they are ready for deployment.
I can explain it.
it is actually fairly simple, really.
pristine__
yeah, mentor working as much as the student
<3
thanks :)
ruaok
first it runs a query that fetches the artists on each release and returns release/artist pairs.
pristine__
okay
ruaok
then, in memory, the python script creates a dict with artist-artist MBID pairs as the key.
every time that pair is encountered, the count is incremented.
that's really it.
the rest is the overhead to flush the data to a table, dropping counts less than 3.
... dropping counts <3
lol.
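The counting ruaok describes can be sketched as follows, assuming the release/artist pairs come out of the query described above (function and variable names are illustrative):

```python
from collections import defaultdict
from itertools import combinations

def count_artist_pairs(release_artists, min_count=3):
    """release_artists: iterable of (release_mbid, [artist_mbid, ...]) rows,
    as returned by the query that fetches the artists on each release.
    Returns {(artist_a, artist_b): count}, dropping counts below min_count."""
    counts = defaultdict(int)  # implied default value is 0
    for _release, artists in release_artists:
        # every unordered artist pair on the same release counts once;
        # sorting makes (a, b) and (b, a) the same key
        for pair in combinations(sorted(set(artists)), 2):
            counts[pair] += 1
    # the flush step drops counts < 3 before writing to the table
    return {pair: n for pair, n in counts.items() if n >= min_count}
```

Releases with a single artist produce no pairs, and the defaultdict means a pair seen for the first time starts from 0 before being incremented.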
pristine__
default dict val is 0? to account for single artists in artist_credit?
ruaok
I think I've been staring at the screen for too long today.
pristine__
okay.
eyes pain?
ruaok
implied default value is 0, yes.
no, being silly.
brain can't really focus anymore.
pristine__
lol
Cool then, do we have anything else to discuss?
I will take care of new artists, empty dataframe, towards the end of the month, no?
new users*
ruaok
I don't. I just need to put my head down and work on the MSB mapping.
as you make progress.
pristine__
yeah. New users thing should be handled delicately. I had many thoughts on it today.