if you assume for a minute that this will be available on labs, then you should be able to start thinking about the spark aspects of this, no?
and the date will obviously be adjusted to a time window centered around now()
lucifer
makes sense. i forgot what exactly we intended to do with spark in this. let me reread our previous discussion.
>2. In spark, create a job that downloads this list and then for each user calculates the intersection of their recent discovery data and the artists in the new releases.
mayhem
from discovery tracks distill a list of artists and when the user last listened to a track by that artist.
yeah, that.
but the output of that query can be used directly by chinmay to display the same data for a "site wide" view. the per user view will just be a smaller one.
lucifer
yes makes sense.
so with that query as input we want (artist_mbid, last_listened) as output?
mayhem
no, we want to filter the list and remove all releases that do not contain at least one artist from the discovered track. return the data in the same format as the input.
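A minimal sketch of the filter described above, in plain Python rather than Spark so the logic is easy to check. The field names (`release_name`, `artist_mbids`) and the shape of the discovery data (artist MBID mapped to the last listen time) are assumptions for illustration, not the actual query output:

```python
from datetime import datetime


def filter_releases(releases, last_listened_by_artist):
    """Keep only releases with at least one artist the user has listened to.

    Each release is returned unchanged, so the output has the same format
    as the input list.
    """
    return [
        release
        for release in releases
        if any(mbid in last_listened_by_artist for mbid in release["artist_mbids"])
    ]


# Hypothetical sample data: two fresh releases and one discovered artist.
fresh_releases = [
    {"release_name": "A", "artist_mbids": ["a1", "a2"]},
    {"release_name": "B", "artist_mbids": ["a3"]},
]
discovered = {"a2": datetime(2022, 5, 1)}

kept = filter_releases(fresh_releases, discovered)
```

In a Spark job the same idea would most likely be expressed as a join between the releases and the per-user artist list; the list comprehension here only shows the keep/drop rule.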
lucifer
i see.
so if the user hasn't listened to a track from the release artist, we remove that release from that user's view.
mayhem
yes.
and if we're going for bonus points, could we create a confidence score?
lucifer
do we want to restrict the time range, e.g. not listened to in the last 3 months, or never?
mayhem
you listened to 1 track by an artist on a release: lowest score. if you listened to a pile of tracks: high score.
make the time range configurable, please.
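One possible shape for the confidence score with a configurable time range, as a rough sketch. The linear scoring capped at `saturation` listens, the parameter names, and the reading of "time range" as "only count listens inside the last `window_days` days" are all assumptions, not something settled in this discussion:

```python
from datetime import datetime, timedelta


def confidence(release_artist_mbids, listens, now, window_days=90, saturation=20):
    """Score in [0, 1]: more listens to a release's artists gives a higher score.

    listens is an iterable of (artist_mbid, listened_at) pairs.
    Only listens within the last window_days days are counted.
    saturation is a hypothetical cap: that many listens or more scores 1.0.
    """
    cutoff = now - timedelta(days=window_days)
    count = sum(
        1
        for artist_mbid, listened_at in listens
        if artist_mbid in release_artist_mbids and listened_at >= cutoff
    )
    return min(count, saturation) / saturation


# Hypothetical sample data.
now = datetime(2022, 6, 1)
listens = [
    ("a1", datetime(2022, 5, 20)),
    ("a1", datetime(2022, 5, 25)),
    ("a2", datetime(2021, 1, 1)),   # outside a 90 day window
    ("a3", datetime(2022, 5, 30)),  # artist not on the release
]

strict = confidence({"a1", "a2"}, listens, now, window_days=90, saturation=4)
lax = confidence({"a1", "a2"}, listens, now, window_days=1000, saturation=4)
```

Widening `window_days` is the "more lax" setting: it pulls in older listens and raises scores, and it can be dialed back later without changing the scoring rule.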
lucifer
yes should be doable.
mayhem
I think at first we will want to be more lax to draw in more data. but over time we might want to be more constricting.
I fear that the output will be 1-2 releases for most people, which is not terribly fun to look at.
and if we have too much data to show, we can dial it back.
lucifer
and if the user only listened to 1 artist on an album with multiple artists, we still include it, right?
mayhem
yes, be as greedy in collecting releases as we can to start with.
lucifer
makes sense
mayhem
we can always filter more shit out, esp if we have a confidence score.
lucifer
yes sounds good
mayhem
ok, great.
sorry for being so absent. I soo hope that life returns to some form of normal next week.
alastairp
hullo
lucifer
i also looked at the OAuth btw. i think the smallest unit of testable work is implementing one form of grant. so thinking to implement the one we use with pythonbrainz and test with LB.
alastairp
sorry I missed the meeting yesterday. forgot it was monday!
lucifer
heh np :D
alastairp
lucifer: one form of grant sounds neat
mayhem
lucifer: https://github.com/metabrainz/listenbrainz-serv... on this PR, I forget if we discussed whether the user_setting table will have a JSONB field or individual columns that we will add as we add user options. do you remember?
lucifer
mayhem: iirc we decided to do a mix of those. one column for each type of settings. for example, one jsonb column for all troi related settings, one column for timezone, and so on.
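The "mix" layout could look roughly like this on the Python side. This is only an illustration: the column names (`timezone_name`, `troi`) and the example setting key are hypothetical and not taken from the PR:

```python
import json
from dataclasses import dataclass, field


@dataclass
class UserSetting:
    """One row of a hypothetical user_setting table.

    Simple settings get their own column; each group of related settings
    (here: troi) gets a single JSONB column.
    """
    user_id: int
    timezone_name: str = "UTC"                 # plain column
    troi: dict = field(default_factory=dict)   # would be stored as JSONB

    def to_row(self):
        # Serialize the grouped settings to a JSON string for the JSONB column.
        return (self.user_id, self.timezone_name, json.dumps(self.troi, sort_keys=True))


row = UserSetting(1, "Europe/Madrid", {"export_to_spotify": True}).to_row()
```

Adding a new troi option then needs no schema change, while a new kind of setting gets its own column.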
mayhem
ok, then that PR is spot on, save for the UI being in flask/html rather than react.
good good.
lucifer
mayhem: i see. that page is still in flask so makes sense for it to be in flask for the time being.
hmm, good point. so maybe it doesn't make sense to have an optional dev environment but always run it during tests
OK, leave that for now - let me think about it to see if there's a better way. maybe it does make sense to always have a bb database too. we decided that the MB database was required
ansh
Or we can merge into the feature branch for now. After I add the edition group, then we can merge to master and deploy ?
alastairp
no, that's fine - let's merge and deploy to production. as we said last week it would be great to get small changes merged as quickly as possible