right, that's an interesting idea. it certainly depends on how we want to present the information to users
2020-11-10 31513, 2020
alastairp
"here is a list of artists that are similar to what you listen to", "here are artists similar to what you have been listening to in the last month", "here is a playlist of tracks to listen to that you haven't heard before"
2020-11-10 31528, 2020
pristine___
So what I mean is, for one week you will see the same recs if you refresh your page, because there is only one row for you in the db. The next week, when you refresh, your recs will change and so will the last-updated time
2020-11-10 31546, 2020
pristine___
> "here is a list of artists that are similar to what you listen to", "here are artists similar to what you have been listening to in the last month", "here is a playlist of tracks to listen to that you haven't heard before"
2020-11-10 31502, 2020
pristine___
Yeah, though at this point I am not really sure about what we need
2020-11-10 31512, 2020
pristine___
So I have just tried to keep it simple
2020-11-10 31519, 2020
alastairp
in fact, perhaps that's a good starting point. We haven't actually talked about this yet
2020-11-10 31527, 2020
alastairp
and it should be the first thing in the document
2020-11-10 31539, 2020
alastairp
what is our end goal? Why do we need this? What are we creating?
2020-11-10 31555, 2020
alastairp
could you add some ideas to the beginning of the document? It doesn't have to be complete
2020-11-10 31511, 2020
pristine___
Sure
2020-11-10 31514, 2020
alastairp
you've mentioned a few interesting ideas already
2020-11-10 31521, 2020
alastairp
others can join in if they want
2020-11-10 31524, 2020
pristine___
:)
2020-11-10 31516, 2020
alastairp
oh, I just saw your specification of the `artist` column. that's OK. you should be clearer if you want to just have an artist mbid, or just a credit id, or both
2020-11-10 31524, 2020
alastairp
and I guess you might have a few too many levels of objects
2020-11-10 31508, 2020
alastairp
you have {'artist': [...stuff...]}, but I think you can get away with [...stuff...]. the column name needs to be more descriptive too, this should say that they are _recommendations_
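A minimal sketch of the flatter shape suggested here. The entry fields and the placeholder identifiers are hypothetical and are not taken from the design document; only the `{'artist': [...]}` versus `[...]` distinction comes from the discussion.

```python
import json

# Nested shape from the draft: the list sits under an extra 'artist' key.
# The entries are hypothetical placeholders, not real MBIDs.
nested = {"artist": [{"artist_mbid": "placeholder-mbid-1"},
                     {"artist_mbid": "placeholder-mbid-2"}]}

# Flattened shape: the column value is the list itself. A descriptive
# column name would then say that these are recommendations.
flat = nested["artist"]

print(json.dumps(flat))
```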
2020-11-10 31542, 2020
alastairp
oh, here are 2 more ideas for this table: a list of the artists that were passed into the model, and an indication if the user has listened to the resulting artists before or not
2020-11-10 31557, 2020
alastairp
API endpoint is fine.
2020-11-10 31500, 2020
pristine___
I think we just need a unique identifier for an artist which we use later to do a look up for data, so artist mbid is good?
2020-11-10 31516, 2020
alastairp
if you are able to get an mbid at this point, that's fine
2020-11-10 31534, 2020
alastairp
what if it's a credit? do you break it down into many artist mbids?
2020-11-10 31549, 2020
pristine___
No. Just the artist mbid, the list. I am okay with credit id too. I think ruaok can help us here, on how he intends to use these recs and therefore in what format he needs the data
2020-11-10 31519, 2020
alastairp
this is where it will be useful to have a clearer understanding of the input data
2020-11-10 31520, 2020
pristine___
> oh, here are 2 more ideas for this table: a list of the artists that were passed into the model, and an indication if the user has listened to the resulting artists before or not
2020-11-10 31548, 2020
alastairp
because we know that we can only get out the same type of data that we put in
2020-11-10 31554, 2020
ruaok
pristine___: the goal of the artist-artist CF is to replace the ac-ac-relations.
2020-11-10 31516, 2020
ruaok
so everything needs to be artist_credit, not just artist.
2020-11-10 31521, 2020
alastairp
your artists_df dataframe has an artist credit id, and mbids for each artist in that credit, and a textual name?
2020-11-10 31530, 2020
pristine___
What if we filter the artists listened to by the user in the last week in spark and send the resultant list (recommended but not listened to in the last week)?
2020-11-10 31542, 2020
pristine___
Rn the plan is to generate recs weekly
2020-11-10 31557, 2020
pristine___
alastairp: yes
2020-11-10 31505, 2020
alastairp
so it seems like during the input stage you're going to take a listen, convert it to an acid, and then that'll be the core input in the model. that means the output of the model will also be an acid.
2020-11-10 31534, 2020
alastairp
you can decide if it makes sense to convert that back to a string and artist mbids as part of the model lookup, or later in listenbrainz
2020-11-10 31507, 2020
pristine___
Cool, I think it makes sense to send over (artist_credit_id, score) as of now
2020-11-10 31518, 2020
alastairp
pristine___: right, once you have output from the model in spark, we should do some post-processing to see if the user has listened to that artist before
2020-11-10 31523, 2020
alastairp
and add that to the response
2020-11-10 31547, 2020
alastairp
this is where having a list of use-cases will be useful. so we can see what data we need to return
2020-11-10 31556, 2020
pristine___
Agreed!
2020-11-10 31509, 2020
alastairp
cool
2020-11-10 31515, 2020
pristine___
> And add that to the response
2020-11-10 31521, 2020
pristine___
So there are two options here
2020-11-10 31544, 2020
pristine___
1. Just remove the artists listened to by the user in the last week and send the remaining artists over the queue
2020-11-10 31544, 2020
pristine___
2. Flag the artists in the recs that were listened to by the user in the last week and send them all over the queue
I strongly recommend that we flag them, rather than remove them
2020-11-10 31514, 2020
alastairp
it allows us to do more things with the data later
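A rough sketch of option 2 (flagging), assuming the model output is a list of `(artist_credit_id, score)` pairs as mentioned earlier. The names `ArtistRec`, `flag_recent_listens` and `listened_last_week` are hypothetical, and plain Python stands in for the actual spark post-processing.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ArtistRec(NamedTuple):
    artist_credit_id: int
    score: float
    listened_last_week: bool


def flag_recent_listens(
    recs: Iterable[Tuple[int, float]],
    recently_listened: Iterable[int],
) -> List[ArtistRec]:
    """Keep every recommendation and mark the ones the user heard recently."""
    recent = set(recently_listened)
    return [ArtistRec(acid, score, acid in recent) for acid, score in recs]


# Hypothetical model output and listening history.
recs = [(101, 0.92), (202, 0.85), (303, 0.40)]
flagged = flag_recent_listens(recs, recently_listened=[202])

# Option 1 (removal) can still be derived later from the flagged data.
unheard = [r for r in flagged if not r.listened_last_week]
```

Because nothing is dropped, a consumer can choose to hide, down-rank or display the flagged artists, which is the extra flexibility being argued for here.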
2020-11-10 31522, 2020
pristine___
I agree.
2020-11-10 31529, 2020
pristine___
We can maybe open a ticket
2020-11-10 31553, 2020
pristine___
And do the same with recording recs. Rn we are removing them. The more data, the better
2020-11-10 31553, 2020
alastairp
sure
2020-11-10 31506, 2020
alastairp
your timeline needs a lot more work. this task can be very easily broken down into multiple steps so it would be good to estimate each of these individually.
2020-11-10 31527, 2020
pristine___
Yea, I wasn't sure about the timeline then.
2020-11-10 31538, 2020
pristine___
Breaking into multiple steps sounds good
2020-11-10 31544, 2020
pristine___
But what is your estimate?
2020-11-10 31556, 2020
pristine___
How long should it take?
2020-11-10 31516, 2020
alastairp
you should also add if you expect to be blocked by something - e.g. "this item has to be reviewed and merged and released before I can move on to the next thing"
2020-11-10 31546, 2020
pristine___
Right
2020-11-10 31558, 2020
alastairp
maybe you will be able to work on some things in parallel - e.g. you could work on the listenbrainz code while you're waiting for someone to merge the spark code
2020-11-10 31531, 2020
alastairp
yes, I mean time estimate
2020-11-10 31522, 2020
pristine___
I was asking, according to you how much time this work should take?
2020-11-10 31549, 2020
alastairp
honestly, I have no idea. I don't know how much time you have to dedicate to this, I don't know how the spark system works, or how long it takes to do a review and deploy to test something
2020-11-10 31505, 2020
alastairp
I think with your experience in the recording recommendation, you should be able to make a good estimate for each part
2020-11-10 31537, 2020
pristine___
Cool
2020-11-10 31521, 2020
pristine___
So I will be moving to Berlin in a few days, that can delay the work, other than that I think 3-4 weeks.
2020-11-10 31523, 2020
alastairp
you've also added some APIs and tables to listenbrainz, so you should have a pretty good idea of how long that has taken you in the past
2020-11-10 31533, 2020
pristine___
Right
2020-11-10 31542, 2020
alastairp
cool! of course, you don't need to work on the plane :)
2020-11-10 31544, 2020
shivam-kapila
Oh coding on plane is funnnn
2020-11-10 31536, 2020
pristine___
Haha
2020-11-10 31504, 2020
pristine___
So are we done on artist recs, alastairp?
2020-11-10 31553, 2020
alastairp
I guess so
2020-11-10 31535, 2020
pristine___
Nice.
2020-11-10 31558, 2020
pristine___
Let's discuss the feedback stuff?
2020-11-10 31503, 2020
alastairp
I'm going to lunch now, and I have other plans for the afternoon
2020-11-10 31512, 2020
alastairp
do you have time another day for the feedback?
2020-11-10 31542, 2020
pristine___
Sure. Maybe tomorrow? Or day after?
2020-11-10 31553, 2020
alastairp
tomorrow is OK. same time
2020-11-10 31557, 2020
v6lur_ joined the channel
2020-11-10 31508, 2020
alastairp
Mr_Monkey: will you be around 12:00 tomorrow?
2020-11-10 31516, 2020
pristine___
And it would be good if you could have a look at #1149, for reference before the meeting
yvanzo: hi, I'm looking at python image PR, perhaps easier to talk here, because I have a handful of questions
2020-11-10 31551, 2020
alastairp
it looks like you are looking for an image labeled `metabrainz/python:x.y`, and if it exists, look for the created date, extract that out, and re-tag the version as `x.y-date.seq`
2020-11-10 31545, 2020
alastairp
my initial feeling is that this is too complex for such a simple task. Why can we not just use `date` to get the current date, and use that as the date tag in the version?
2020-11-10 31511, 2020
alastairp
we update these images very rarely. therefore I think that 95% of the time we're going to update all of them at once
2020-11-10 31536, 2020
alastairp
I think that we should have no `:x.y` versions. Only `:x.y-date.seq`
alastairp: 14400000 records inserted into a typesense index. time to see how well it works on real time data.
2020-11-10 31536, 2020
alastairp
nice
2020-11-10 31547, 2020
shivam-kapila
Wow the website
2020-11-10 31525, 2020
niceplace has quit
2020-11-10 31502, 2020
yvanzo
alastairp: we have :x.y already, we should probably not remove them as they are in use.
2020-11-10 31538, 2020
yvanzo
Do you want to stop updating them?
2020-11-10 31506, 2020
alastairp
I found your commit message where you explained what tags will be pushed. now I understand why you make each of the tags
2020-11-10 31534, 2020
alastairp
yes, my proposal would be to keep :x.y until we move all projects to a specific tag with a date, and then delete them
2020-11-10 31507, 2020
yvanzo
This is common practice (at Docker Hub) to update the version x to the latest x.y and so on.
2020-11-10 31535, 2020
yvanzo
(This is why I made these tags as explained.)
2020-11-10 31535, 2020
alastairp
yes, I see
2020-11-10 31535, 2020
alastairp
OK, it's not too much of a problem. I think we should explain this in more detail in a readme (instead of just in the commit message), and also add to the readme a recommendation for what tag to use
2020-11-10 31532, 2020
yvanzo
Right, I will update the README.md too then.
2020-11-10 31553, 2020
alastairp
my main comment was that it seems like a lot of code for something that we only run one time every 2 years :)
2020-11-10 31505, 2020
alastairp
but as ruaok mentioned to me yesterday, it's great to have solid tools
2020-11-10 31527, 2020
yvanzo
alastairp: also "situations where we build such an image more than once in a day" are not frequent but should be made easy to deal with because it's often an emergency.
2020-11-10 31551, 2020
yvanzo
It already happened for 3.6 btw.
2020-11-10 31503, 2020
alastairp
yes, exactly
2020-11-10 31511, 2020
alastairp
in fact, that's my next task
2020-11-10 31529, 2020
alastairp
we have 2 base images with 2 different versions of consul, that work differently
2020-11-10 31527, 2020
alastairp
I believe that's why there are the 2 versions, the earlier one has a newer version of consul, and we had to rebuild later with an older consul
2020-11-10 31539, 2020
yvanzo
You mean 2 different flavors then?
2020-11-10 31552, 2020
alastairp
not intentionally
2020-11-10 31506, 2020
alastairp
eventually after my work we will only have the new version
2020-11-10 31530, 2020
yvanzo
I did not take this possibility into account when writing the script.
2020-11-10 31548, 2020
alastairp
don't worry about it. we should not support both flavours
2020-11-10 31514, 2020
alastairp
my plan is to (for example) have 20201110 with -oldconsul, and then 20201115 with -newconsul, after I upgrade downstream projects
2020-11-10 31520, 2020
alastairp
and then this will stop being a problem
2020-11-10 31506, 2020
yvanzo
I can possibly add an optional argument to support appending such slug to the tag?
2020-11-10 31529, 2020
alastairp
I would prefer not to. I think it's additional complexity that we are going to remove soon anyway
MTG work using acousticbrainz mood data + discogs genre links
2020-11-10 31546, 2020
alastairp
"find things in genre x that evoke mood y, and are kind of close to each other"
2020-11-10 31520, 2020
yvanzo
alastairp: Your simplified tagging scheme iiuc: just push 'x.y-date' (common case); If it already exists, push 'x.y-date.increment' instead. Never override existing tags. Remove them by hand once unused.
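A small sketch of the simplified scheme summarised above, not the script from the PR. The function name `next_tag`, the `YYYYMMDD` date format (borrowed from the `20201110` example) and the increment starting at 1 are assumptions.

```python
from datetime import date
from typing import Set


def next_tag(version: str, build_date: date, existing_tags: Set[str]) -> str:
    """Return 'x.y-date', or 'x.y-date.N' if earlier tags for that day exist.

    Existing tags are never overridden; the first unused name is returned.
    """
    base = f"{version}-{build_date:%Y%m%d}"
    if base not in existing_tags:
        return base
    seq = 1  # assumed starting value for the increment
    while f"{base}.{seq}" in existing_tags:
        seq += 1
    return f"{base}.{seq}"


# Common case: first build of the day.
first = next_tag("3.8", date(2020, 11, 10), set())
# Emergency rebuild on the same day.
second = next_tag("3.8", date(2020, 11, 10), {"3.8-20201110"})
```

Here `existing_tags` would have to be fetched from the registry by some other means, which this sketch leaves out.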
2020-11-10 31534, 2020
ruaok
alastairp: neat!
2020-11-10 31538, 2020
yvanzo
well, maybe I should just stop rethinking this since it works already.
2020-11-10 31536, 2020
alastairp
yeah, I think that clear documentation is the most important thing at the moment
2020-11-10 31530, 2020
alastairp
but yes, in my view I think that what you just described would be clearer, but I don't feel strongly that it must be changed
2020-11-10 31516, 2020
alastairp
ruaok: it's a perfect usecase for another element. I'm sure I told them to use solr, but they decided to use ES :) I remember that you and I talked about having direct search for AB data in addition to the annoy stuff - "give me everything that matches x and y"
2020-11-10 31541, 2020
alastairp
I'll see if we can improve it and release it on bono
2020-11-10 31504, 2020
reosarevok
" [#metabrainz] Welcome to #MetaBrainz! This channel has taken over from #musicbrainz-devel, #bookbrainz, #bookbrainz-devel, and #musicbottle - so don't despair if you don't know how you ended up here. Just sit down, have a cup of tea, put your feet up, and feel right at home. :)"
2020-11-10 31514, 2020
reosarevok
Given we have #bookbrainz again, maybe it's time to change that?