(even if we just have our own stuff I think that kind of display, with inviting image previews, would work well for an ‘explore’ section)
monkey: Hey, the bubbles are looking good already!! Some font tweaks, a tighter crop, and an option to add an album cover, and I would think it's in the bag?
mayhem, lucifer: about generated playlists, do all listens carry the same weight? -> If I listen to a generated playlist, does Troi think I like the songs in it and then tend to suggest more of the same trend (not sure I'm being clear).
lucifer
zas: yes, all listens have the same weight. if you listen to a generated playlist, it would indeed think that you like it and suggest more, but iirc we take 6 months of listens, so a few playlists are unlikely to affect the overall results.
zas
mayhem: on the tracklist you linked, since it is based on top recordings and my top recordings are somewhat bugged (some are very high in the list because of a spotify bug at some point), some suggestions are a bit weird, though I discovered nice music among the suggested tracks.
lucifer: another question, how does the duration of listens impact weights?
lucifer
zas: so far it doesn't affect the weight. only the number of times a recording is listened to affects the weight currently.
zas
huh? so if I listen to 20% of a song it thinks I somehow like it?
which is usually the reverse
lucifer
right. it's a pending improvement to make.
one thing that still helps is that most users won't go back to a song they skipped, say, halfway, so its listen count will be very low, whereas the ones they like they will have listened to multiple times.
so the count acts as a proxy. but yes, adding duration support would help make it more explicit.
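The count-as-proxy idea above can be sketched in a few lines of Python (a hypothetical illustration, not the actual ListenBrainz/Troi weighting code):

```python
from collections import Counter

def listen_weights(listens):
    """Weight each recording by its listen count alone: tracks a user
    skips rarely get replayed, so their counts (and weights) stay low."""
    counts = Counter(listens)
    total = sum(counts.values())
    return {mbid: n / total for mbid, n in counts.items()}

# a track replayed nine times vs. one skipped after a single play
weights = listen_weights(["liked"] * 9 + ["skipped"])
```

Adding duration would mean weighting each listen by fraction played rather than counting every listen as 1.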
zas
the fact that I don't listen to a track twice doesn't mean I don't like it, but for sure the fact that I don't listen to a track in full usually means I'm not fond of it
(or I didn't have time to, but that's rare I guess)
lucifer
yes makes sense.
zas
Also, if I listen to an album in full once (so 1 listen per track), it usually means I like the music, so I guess listening to multiple tracks by the same artist is a good indicator too, and even more so if one or more albums were listened to in full.
Another thing: if I like a band, I'm always curious about what the band members do outside this band, and I expect such suggestions: A & B are in band C, A is also in band D, and B has a solo project; I expect tracks from D and from B's solo project to appear, even though I never listened to a track from them. Is this taken into account somehow atm?
mayhem
moooin!
aerozol: I see you want to do the compositing with the cover art first and then the image on top? that might work better, indeed.
are the covers in that image you posted transparent?
we need to find a place to collect all of these images. can I fetch them out of the Figma?
lucifer: I had a really rough time getting to sleep last night, because the similarity data made me realize a big big thing.
throwing all of the listens at the similarity alg simply overfits it and the result is.... noise.
we're getting quite good results with TWO hits; this suggests that the optimum window is some time larger than that.
let's call it 30 days.
we should never apply our alg to more data than this window. ever.
instead, what we should do is calculate many windows of this data over time: chunks made up of 90 days of data.
(combining 3 chunks at a time)
the key insight is that we will gain the best data when we analyze tracks in the time when a given track was released -- when people will have been listening to it with other tracks that were released about the same time.
e.g. a 2000s track will have the best play co-incidences when analyzed with listens from the same era.
I'll likely need to make some graphs to make this more clear.
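A rough sketch of the chunking idea above, assuming listens are (timestamp, recording_mbid) pairs; the window size and the pair-counting metric here are placeholders for illustration, not the actual ListenBrainz algorithm:

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import combinations

WINDOW_DAYS = 30  # candidate size from the discussion; needs tuning

def chunk_listens(listens, window_days=WINDOW_DAYS):
    """Split (timestamp, recording_mbid) listens into consecutive
    fixed-width time windows, so co-occurrence is only ever computed
    among listens from the same era."""
    if not listens:
        return []
    listens = sorted(listens)
    width = timedelta(days=window_days)
    bound = listens[0][0] + width
    chunks, current = [], []
    for ts, mbid in listens:
        while ts >= bound:  # close out windows until ts fits
            chunks.append(current)
            current, bound = [], bound + width
        current.append(mbid)
    chunks.append(current)
    return chunks

def cooccurrence(chunk):
    """Count each unordered pair of distinct tracks seen in one window."""
    pairs = Counter()
    for a, b in combinations(sorted(set(chunk)), 2):
        pairs[(a, b)] += 1
    return pairs
```

Running `cooccurrence` per chunk and then selecting which chunks feed a given search is what keeps era-local co-incidences from being drowned out by the full history.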
lucifer
zas: currently not considered. what we intend to do is build various types of recommendation algorithms and collate their results. this artist correlation cannot be reliably inferred by the CF algorithm we currently use. the only correlations it can make are of the form "users who listened to artist A also listened to artist B". however, we can build another algorithm which utilises these artist correlations to suggest tracks.
mayhem: i see. sounds good to do multiple runs over small chunks.
i think there is some value in doing a few larger chunks as well, so that we also get similarity of tracks which were released at different points in time, but there's no reason these multiple runs couldn't capture that. we can always experiment and see how it goes.
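Collating results from several recommenders, as described above, might look something like this (a hypothetical sketch; the algorithm names and scores are made up):

```python
def collate(results, weights=None):
    """Merge scored suggestions from several recommenders into one ranking.
    `results` maps algorithm name -> {mbid: score}; `weights` optionally
    scales each algorithm's contribution."""
    weights = weights or {}
    combined = {}
    for algo, scores in results.items():
        w = weights.get(algo, 1.0)
        for mbid, score in scores.items():
            combined[mbid] = combined.get(mbid, 0.0) + w * score
    return sorted(combined, key=combined.get, reverse=True)

# CF can't see band-membership links, but a separate artist-overlap
# recommender could surface them and be merged in here.
ranked = collate({"cf": {"a": 1.0, "b": 0.5}, "artist_overlap": {"b": 1.0}})
```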
mayhem
first, let's see about finding a well-tuned window size. then we need to explore the temporal nature of the data.
larger chunks are not the answer, I think.
lucifer
to make sure i understand correctly: we will store the scores of the various windows separately and not aggregate them?
mayhem
you always use the same sized chunk, but you calculate them starting from older starting points.
and then combine the data from chunks to make a real result.
lucifer
but that would act the same as the current algorithm, i think.
mayhem
if you used *all* chunks for searching, yes.
but the key is to select only a few chunks for searching.
lucifer
whether you calculate sums for Jan-Mar and Apr-Jul and then add those, or you calculate Jan-Jul at once, it would be the same.
mayhem
we need to be mindful of both the indexing chunks and which index chunks are used in a search.
lucifer
hmm, i see.
mayhem
lucifer: correct.
lucifer
i am not sure i understand the plan fully currently but let's try and see. it'll probably become clearer in due time.
mayhem
like I said, I am not explaining this well. my brain was racing until 4am when I worked this out and I'm now poorly slept as a result.
I'll draw a graph about this later today, that will make it more clear.
this is up on beta. save to LB, then use export to spotify. (to see the button you'll have to change the url to beta.lb manually and also be logged in there, because open as playlist will go to prod lb)
also, window sizes 30 and 90 have been generated too. algorithms available: `session_based_days_7_session_300`, `session_based_days_30_session_300` and `session_based_days_90_session_300`.
the overall lookup is now slower because the 90 day window generated too many rows and there is no minimum threshold in place.
CatQuest
happy Diwali
(:D)
aerozol
zas: lucifer: afaik skipping a song or video early on is one of the biggest indicators of 'didn't like' that TikTok etc uses to suggest stuff (and they are very good at creating personalized feeds...)
But it does seem like that would be quite a new piece of code under the LB hood?
mayhem: I was going to just do a transparent layer on top with shadows etc but then realized I could just do one image on top and nothing underneath. That one I posted on irc is good for you to use
mayhem
hiya!
aerozol
Everything I've done is also on the figma, have at it 👍
mayhem
I finally figured out later that this is what you had in mind. I'll play with that after I finish the board meeting prep
aerozol
Cool - could still do a jpg underneath if image size is an issue (got it to 300kb or so)
mayhem
should be fine.
aerozol
Happy Diwali all! (thus finishes my morning irc catch-up)
lucifer
aerozol: yes. agreed. LB indeed currently doesn't have a way to track skips. however this similar recordings thing we are currently working on can infer skips.
we havent reached that point yet though.
aerozol
Ooh, that sounds really promising. Afaik the modern way to figure out likes is to not even have users like or dislike stuff, just to track where they 'pause' and watch something. Which skews it towards clickbaity stuff, but it seems to glue people to their phones pretty well
lucifer
yeah. that probably works well for reels-like stuff but also needs a lot of tracking afaiu.
spotify does this a lot. for instance, it tracks why a track was paused, at what points, and so on.
akshaaatt
Happy Diwali everyone!❤️❤️❤️❤️
lucifer
happy diwali! 🎉🎉
ansh
Happy Diwali!! 🎉 🎉
aerozol
🎉🎉🎉
mayhem
happy diwali!!
aerozol: I think I like the previous LPs-on-the-floor image better.
the new one is darker and has more of a margin that I think is not needed.
and every time we change anything that moves the cover art around the image, I have to painstakingly align the images again. it took me about an hour to get it right the first time.
aerozol
mayhem: you ran with a quick screenshot/snip that I posted 😜
I'll tweak the pic to match the lineup in a little bit
mayhem
ahhh, ok. I can wait.
k, thanks!
aerozol
I'll lighten it up a bit too
mayhem
lucifer: at a first glance, those similar data sets look quite interesting.
but due to the size of the index (without the threshold) and the fact that I have to make one request for every track (and I process 100 of them), it isn't feasible to work with this.
can you please add the threshold and also enable more than one MBID to be looked up with the similarity endpoint? then things will be snappy. thanks!
lucifer
mayhem: yes, makes sense. we'll have to delete the existing table to get rid of those extra entries. i'll add a configurable threshold parameter.
mayhem
perfect.
lucifer
one thing you could try in the meantime is passing the count parameter. maybe that speeds up the lookup a bit
mayhem
we need to adjust the data set hoster to take a single arg (algorithm) and a list of args (MBIDs). thoughts on how to do that?
thanks, but it's getting late here. I'll leave this be for today and pick it up again tomorrow.
lucifer
simplest way would be to pass in algorithm every time.
mayhem
but if aerozol comes up with something, I'll play with that. that's easier to understand when tired. :)
lucifer: yes, but that suggests that we would honor the algorithm being different MBID by MBID.
which is not necessary, I would say.
lucifer
yeah. another possible solution is to pass the list normally: recording_mbids as a list of mbids, instead of doing recording_mbid: mbid for each item. the query can have extra logic to interpret the param as a list
mayhem
not quite sure I follow. perhaps better to discuss this tomorrow after more rest.
lucifer
we do `[{'x': 1}, {'x': 5}]` currently. instead we could do `[{'x': [1, 5]}]`.
but yes sounds good to discuss later.
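The two payload shapes being compared, plus a helper that treats them the same (the helper is hypothetical, just to show the equivalence):

```python
def flatten_params(params):
    """Accept either the per-item shape [{'x': 1}, {'x': 5}] or the
    list shape [{'x': [1, 5]}] and return one flat list of values."""
    values = []
    for item in params:
        for v in item.values():
            values.extend(v if isinstance(v, list) else [v])
    return values

per_item = flatten_params([{'x': 1}, {'x': 5}])  # one dict per lookup
as_list = flatten_params([{'x': [1, 5]}])        # single list param
```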
mayhem
indeed. that is the easy part. how do you express that in HTTP parameters for a GET?
lucifer
the easy way would be to disallow arrays in GET and only allow them in POST.
aerozol
What’s the listenbrainz url again for the grids/visualisations?
lucifer
other ways are to specify some custom delimiter like a , or ; for multiple params, or, as some apis do, to specify the same param multiple times. in any case i think it'll be inconsistent with the POST way, but that's fine imo.
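The comma-delimiter option could be parsed like this (a sketch; the parameter name `recording_mbids` is an assumption, not the endpoint's actual signature):

```python
def parse_mbids_param(raw):
    """Split a comma-delimited recording_mbids query value into a list,
    trimming whitespace and dropping empty segments."""
    return [part.strip() for part in raw.split(",") if part.strip()]

# e.g. GET /similarity?recording_mbids=mbid-1,mbid-2,mbid-3
mbids = parse_mbids_param("mbid-1, mbid-2,,mbid-3")
```

The repeated-param alternative (e.g. `?recording_mbid=a&recording_mbid=b`) is what most web frameworks expose via a getlist-style accessor.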