yvanzo: the weblate fellow has acked the deal and will submit invoice to me and come back to you to continue setup.
lucifer: alastairp monkey : https://bono.metabrainz.org/recording-similarity updated with 5 years of data. it is producing results that are listenable, I think. I'm going to work up something to submit playlists soon.
lucifer
sounds good, i tried a few recordings and the similar ones looked nice to me.
mayhem
great.
I have to say, I think I can see my own influences on this data.
outsidecontext
mayhem: oh yes, I did test it yesterday for one track and basically got back my playlist of tracks :D Still this approach looks very promising
outsidecontext: exactly that. but yesterday I ran only 3 years of data. today I managed 5. and working on getting it all in, but apparently 64GB ram is not enough for my python script.
yes, the more years I include the better it gets.
lucifer
we can try running it on the spark cluster if the entire data cannot fit in memory.
mayhem
next: make playlists. after that: calculate canonical recordings. then: rec similarity on canonical recordings. I expect to be able to process many more years and to get much better results.
lucifer: that is a good point.
I think it needs to live there, but there is a lot of logic in the python code.
not sure how well this translates to the spark world -- the bulk of the processing is done in ram.
lucifer
yeah will need to look into that. python code is going to be slower than builtin stuff/sql.
mayhem
speed is not the greatest worry. just getting the task done is.
I suspect that we need to run this alg once a week when we're ready to deploy.
lucifer
we should probably be fine then.
mayhem
let me iron out the kinks of the core alg -- it can do that rather quickly with the current setup.
once we're happy, let's port this to spark.
lucifer
another thing is if you can modify this code to use pandas dataframes, we could directly run that on the spark cluster. that way you can test out stuff in memory on bono and deploy on spark.
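Lucifer's pandas suggestion can be sketched roughly as below. The schema (a listens frame with `user_id` and `recording_mbid` columns) and the self-join counting approach are illustrative assumptions, not the actual ListenBrainz code:

```python
import pandas as pd
# On the Spark cluster, the same function body can (in recent Spark versions)
# run via the pandas API on Spark by swapping this import for:
#   import pyspark.pandas as pd

def pair_counts(df):
    """df: listens with user_id and recording_mbid columns (hypothetical
    schema). Count, for each pair of recordings, how many listen pairs
    share a user, via a self-join on user_id."""
    pairs = df.merge(df, on="user_id", suffixes=("_a", "_b"))
    # keep each unordered pair once and drop self-pairs
    pairs = pairs[pairs["recording_mbid_a"] < pairs["recording_mbid_b"]]
    return (pairs.groupby(["recording_mbid_a", "recording_mbid_b"])
                 .size()
                 .reset_index(name="score"))
```

Writing the algorithm against this DataFrame API is what would let the in-memory version on bono and the Spark version share code, since `pyspark.pandas` mirrors the pandas interface.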
👍
mayhem
ok, that sounds like something useful to learn to do.
lucifer: want a test playlist made? gimme a recording_mbid if so
The results of the recording similarity tool are interesting. Some look very good and make for a good discovery tool. Other results are puzzling, and suggest that few users are listening to the seed track; at that point you're just getting someone else's listening history (which is still a good discovery tool, just not necessarily "similar")
once I do this on canonical recordings, the picture ought to change a lot.
I'll start work on that now.
monkey
Sure, let me think of two good cases, one obscure and one more mainstream
mayhem
yeah, some of the matches are a bit WTF. :)
monkey
And concurring with what yourself and outsidecontext were saying, I also get some great results on obscure stuff I've listened to, which looked heavily influenced by my own listening history (which is good for the similarity part, not great for discovery)
mayhem nods
mayhem
the good thing about this alg is that this data gets better the more users we have.
while the WTF tracks have been "this doesn't fit" they are not "this is horrible!".
oh, and there are other improvements I have not made yet.
namely, that in order for two tracks to be considered similar, they need to not have been played more than... 30 minutes apart, methinks.
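A minimal sketch of the time-window rule mayhem describes, assuming one user's listens arrive as `(timestamp, recording_mbid)` tuples; the 30-minute cutoff and the pair-counting shape are illustrative guesses, not the real implementation:

```python
from collections import Counter

MAX_GAP = 30 * 60  # hypothetical 30-minute cutoff, in seconds

def similarity_counts(listens):
    """listens: (timestamp, recording_mbid) tuples for one user.
    Count a pair as co-listened when the two plays are at most
    MAX_GAP seconds apart."""
    listens = sorted(listens)
    counts = Counter()
    for i, (t1, mbid1) in enumerate(listens):
        for t2, mbid2 in listens[i + 1:]:
            if t2 - t1 > MAX_GAP:
                break  # sorted by time, so all later listens are further away
            if mbid1 != mbid2:
                counts[frozenset((mbid1, mbid2))] += 1
    return counts
```

Because the listens are sorted, the inner loop can stop at the first listen outside the window, so the cost stays proportional to the number of pairs actually inside it.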
monkey
Hm. Good question.
And I agree, even if the similarity is questionable you do probably end up with recordings that someone with similar taste to {original recording} listened to
mayhem
yes, and remember that this is one element that goes into recommending a playlist that is similar to this track. we're listening to raw data results and not things groomed by troi.
for raw data, this early in the game? eggcited!
monkey
Indeed! Here's two recordings, I can haz playlists plz?
[listenbrainz-server] alastair opened pull request #1832 (master…spotify-read-metadata): Update listen additional_info metadata added to spotify listens https://github.com/metabrainz/listenbrainz-serv...
reosarevok
Yeah, so for very obscure stuff (well, obscure MeB-wise) it won't work well, but it is probably fairly good for other stuff
So we just need to get more Spanish people listening :p
mayhem
yep
lucifer: there is an ld process owned by you that has been spinning on bono for days.
any idea what that might be?
lucifer
oh no idea. let me look
oh. its the remote development stuff i use sometimes to directly develop on bono from my IDE.
i kill'ed a process so now those should be gone. i'll see if i can set up some auto shutdown after N time of inactivity
mayhem
cool, thank!
+s
lucifer
alastairp: hi! i went through the feedback on listen user id, i applied the changes at places but at others those methods are changing in #1700 so I think we should leave them as is for now.
lucifer: I saw your comments, thanks. yes, I agree that it's not worth changing the other places that are updated in new counts pr
reosarevok: archived
reosarevok
Thanks
alastairp
reosarevok: there's one more thing in lucifer's work on this renaming PR that needs to be done, so I'll rename this week's request manually after lunch. is there another one?
reosarevok
Nope
But let me know when you're ready and I can do the MB renaming around the same time then
alastairp
ok. I'll ping you when I'm back from lunch
reosarevok
alastairp: I'm going for a walk, I'll just do the change now and hopefully it won't be a problem to change the other in an hour or so :)
yellowhatpro
Helloo guys!!
In the musicbrainz app, while searching musicbrainz data, I am getting result not found at times.
Looking at logs it seems we are making many api calls.
We won't face this situation in the release build, right??
lucifer
yellowhatpro: without looking at logs, cannot say what the issue is, but i wouldn't expect any difference between debug and release builds, at least not in connecting with the MB api.
yellowhatpro
Is it like when I am using the debug version, my api calls are restricted to a certain limit?
But since the musicbrainz app is associated with Metabrainz, the services then won't have such restrictions
in the release version?
lucifer
no, both the debug and release version are ratelimited.
I searched once, and it showed exceeding requests (at line 313), and also the recycler view got only one response
lucifer
oh that looks like an error in offset, count calculation.
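For illustration, the offset/count relationship for a paged search API usually looks like the following; the page size and parameter names here are hypothetical, not taken from the actual app code:

```python
PAGE_SIZE = 25  # hypothetical page size

def page_params(page, page_size=PAGE_SIZE):
    """Translate a 1-based page number into the offset/limit parameters a
    paged search API expects. Computing the offset from the wrong base
    (or reusing offset 0 on every request) is a classic way a pager ends
    up showing only one page of results."""
    return {"offset": (page - 1) * page_size, "limit": page_size}
```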
yellowhatpro
Is it something we can fix or is it a normal thing?
lucifer
that's definitely a bug
is it on the master branch or after your changes?
yellowhatpro
lemme check from master branch wait sir
lucifer
no need for sir :)
yes, if it's on the master branch then it's probably a bug in how the paging library is set up.
MRiddickW joined the channel
yellowhatpro
Yup its in master
alastairp
lucifer: nice, that didn't look like a large change
Clint_ is now known as Clint
mayhem
ok, canonical_recording table is being populated on gaga now. that was literally an hour of work, lol.
lucifer
alastairp: the dump changes are pending, i'll push that too. but yes smaller change than i expected.
yellowhatpro: i see, yes a bug in the app then. feel free to look into fixing it if you want. i'll try looking into it too later.
yellowhatpro
I'll try fixing it yoshh
mayhem
meh.
lucifer: I didn't quite make the connection, but given the way that I was calculating recording_similarity, it was already doing it on only the canonical_recordings. boo.
oh well, at least the canonical recordings table allows me to ensure that all recording_mbids will come up with similar recordings.
lucifer
oh. how so?
canonical recordings is a MB concept but the index is built from listens?
mayhem
I was joining listens to the mapping table, which only contains canonical recordings.
which then conclusively means that this cannot be done with python/PG. well, it can, but it would be a lot more work and it would just make sense to move this to spark.
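The join mayhem describes can be sketched in pandas terms; the column names here are hypothetical. Since the mapping table only contains canonical recordings, an inner join silently drops every listen for a non-canonical MBID, which is why the earlier runs were already effectively canonical-only:

```python
import pandas as pd

def canonicalize(listens, mapping):
    """Join listens to the canonical mapping table (hypothetical schema:
    both frames carry a recording_mbid column). Because the mapping only
    contains canonical recordings, listens for non-canonical MBIDs drop
    out of the result here."""
    return listens.merge(mapping, on="recording_mbid", how="inner")
```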
lucifer
i see, makes sense.
the listens spark already has are similar to the sql query you were using, so we just need to add the algo part on spark.
mayhem
and the algo part is what I am most unclear about.