yvanzo: the weblate fellow has acked the deal and will submit invoice to me and come back to you to continue setup.
lucifer: alastairp monkey : https://bono.metabrainz.org/recording-similarity updated with 5 years of data. it is producing results that are listenable, I think. I'm going to work up something to submit playlists soon.
lucifer
sounds good, i tried a few recordings and the similar ones looked nice to me.
mayhem
great.
I have to say, I think I can see my own influences on this data.
outsidecontext
mayhem: oh yes, I did test it yesterday for one track and basically got back my playlist of tracks :D Still this approach looks very promising
outsidecontext: exactly that. but yesterday I ran only 3 years of data. today I managed 5. and working on getting it all in, but apparently 64GB ram is not enough for my python script.
yes, the more years I include the better it gets.
lucifer
we can try running it on the spark cluster if the entire data cannot fit in memory.
mayhem
next: make playlists. after that: calculate canonical recordings. then: rec similarity on canonical recordings. I expect to be able to process many more years and to get much better results.
lucifer: that is a good point.
I think it needs to live there, but there is a lot of logic in the python code.
not sure how well this translates to the spark world -- the bulk of the processing is done in ram.
lucifer
yeah will need to look into that. python code is going to be slower than builtin stuff/sql.
mayhem
speed is not the greatest worry. just getting the task done is.
I suspect that we need to run this alg once a week when we're ready to deploy.
lucifer
we should probably be fine then.
mayhem
let me iron out the kinks of the core alg -- it can do that rather quickly with the current setup.
once we're happy, let's port this to spark.
lucifer
another thing is if you can modify this code to use pandas dataframes, we could directly run that on the spark cluster. that way you can test out stuff in memory on bono and deploy on spark.
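Lucifer's pandas suggestion can be sketched roughly as below. The schema (a listens frame with `user_id` and `recording_mbid` columns) and the self-join counting approach are illustrative assumptions, not the actual ListenBrainz code:

```python
import pandas as pd
# On the Spark cluster, the same function body can (in recent Spark versions)
# run via the pandas API on Spark by swapping this import for:
#   import pyspark.pandas as pd

def pair_counts(df):
    """df: listens with user_id and recording_mbid columns (hypothetical
    schema). Count, for each pair of recordings, how many listen pairs
    share a user, via a self-join on user_id."""
    pairs = df.merge(df, on="user_id", suffixes=("_a", "_b"))
    # keep each unordered pair once and drop self-pairs
    pairs = pairs[pairs["recording_mbid_a"] < pairs["recording_mbid_b"]]
    return (pairs.groupby(["recording_mbid_a", "recording_mbid_b"])
                 .size()
                 .reset_index(name="score"))
```

Writing the algorithm against this DataFrame API is what would let the in-memory version on bono and the Spark version share code, since `pyspark.pandas` mirrors the pandas interface.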
👍
mayhem
ok, that sounds like something useful to learn to do.
lucifer: want a test playlist made? gimme a recording_mbid if so
The results of the recording similarity tool are interesting. Some look very good and make for a good discovery tool. Other results are puzzling, and suggest that few users are listening to the seed track; at that point you're just getting someone else's listening history (which is still a good discovery tool, just not necessarily "similar")
once I do this on canonical recordings, the picture ought to change a lot.
I'll start work on that now.
monkey
Sure, let me think of two good cases, one obscure and one more mainstream
mayhem
yeah, some of the matches are a bit WTF. :)
monkey
And concurring with what yourself and outsidecontext were saying, I also get some great results on obscure stuff I've listened to, which looked heavily influenced by my own listening history (which is good for the similarity part, not great for discovery)
mayhem nods
mayhem
the good thing about this alg is that this data gets better the more users we have.
while the WTF tracks have been "this doesn't fit" they are not "this is horrible!".
oh, and there are other improvements I have not made yet.
namely, that in order for two tracks to be considered similar, they need to not have been played more than... 30 minutes apart, methinks.
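A minimal sketch of the time-window rule mayhem describes, assuming one user's listens arrive as `(timestamp, recording_mbid)` tuples; the 30-minute cutoff and the pair-counting shape are illustrative guesses, not the real implementation:

```python
from collections import Counter

MAX_GAP = 30 * 60  # hypothetical 30-minute cutoff, in seconds

def similarity_counts(listens):
    """listens: (timestamp, recording_mbid) tuples for one user.
    Count a pair as co-listened when the two plays are at most
    MAX_GAP seconds apart."""
    listens = sorted(listens)
    counts = Counter()
    for i, (t1, mbid1) in enumerate(listens):
        for t2, mbid2 in listens[i + 1:]:
            if t2 - t1 > MAX_GAP:
                break  # sorted by time, so all later listens are further away
            if mbid1 != mbid2:
                counts[frozenset((mbid1, mbid2))] += 1
    return counts
```

Because the listens are sorted, the inner loop can stop at the first listen outside the window, so the cost stays proportional to the number of pairs actually inside it.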
monkey
Hm. Good question.
And I agree, even if the similarity is questionable you do probably end up with recordings that someone with similar taste to {original recording} listened to
mayhem
yes, and remember that this is one element that goes into recommending a playlist that is similar to this track. we're listening to raw data results and not things groomed by troi.
for raw data, this early in the game? eggcited!
monkey
Indeed! Here's two recordings, I can haz playlists plz?
[listenbrainz-server] alastair opened pull request #1832 (master…spotify-read-metadata): Update listen additional_info metadata added to spotify listens https://github.com/metabrainz/listenbrainz-serv...
reosarevok
Yeah, so for very obscure stuff (well, obscure MeB-wise) it won't work well, but it is probably fairly good for other stuff
So we just need to get more Spanish people listening :p
mayhem
yep
lucifer: there is an ld process owned by you that has been spinning on bono for days.
any idea what that might be?
lucifer
oh no idea. let me look
oh. its the remote development stuff i use sometimes to directly develop on bono from my IDE.
i kill'ed a process so now those should be gone. i'll see if i can set up some auto shutdown after N time of inactivity
mayhem
cool, thank!
+s
lucifer
alastairp: hi! i went through the feedback on listen user id, i applied the changes at places but at others those methods are changing in #1700 so I think we should leave them as is for now.
lucifer: I saw your comments, thanks. yes, I agree that it's not worth changing the other places that are updated in new counts pr
reosarevok: archived
reosarevok
Thanks
alastairp
reosarevok: there's one more thing in lucifer's work on this renaming PR that needs to be done, so I'll rename this week's request manually after lunch. is there another one?
reosarevok
Nope
But let me know when you're ready and I can do the MB renaming around the same time then
alastairp
ok. I'll ping you when I'm back from lunch
reosarevok
alastairp: I'm going for a walk, I'll just do the change now and hopefully it won't be a problem to change the other in an hour or so :)
yellowhatpro
Helloo guys!!
In the musicbrainz app, while searching musicbrainz data, I am getting result not found at times.
Looking at logs it seems we are making many api calls.
We won't face this situation in the release build, right??
lucifer
yellowhatpro: without looking at logs, cannot say what the issue is, but i wouldn't expect any difference between debug and release builds, at least not in connecting with the MB api.
yellowhatpro
Is it like when I am using the debug version, my api calls are restricted to a certain limit?
But since the musicbrainz app is associated with Metabrainz, the services then won't have such restrictions
in the release version?
lucifer
no, both the debug and release version are ratelimited.
I searched once, and it showed exceeding requests (at line 313), and also the recycler view got only one response
lucifer
oh that looks like an error in offset, count calculation.
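For illustration, the offset/count relationship for a paged search API usually looks like the following; the page size and parameter names here are hypothetical, not taken from the actual app code:

```python
PAGE_SIZE = 25  # hypothetical page size

def page_params(page, page_size=PAGE_SIZE):
    """Translate a 1-based page number into the offset/limit parameters a
    paged search API expects. Computing the offset from the wrong base
    (or reusing offset 0 on every request) is a classic way a pager ends
    up showing only one page of results."""
    return {"offset": (page - 1) * page_size, "limit": page_size}
```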
yellowhatpro
Is it something we can fix or is it a normal thing?
lucifer
that's definitely a bug
is it on the master branch or after your changes?
yellowhatpro
lemme check from master branch wait sir
lucifer
no need for sir :)
yes, if it's on the master branch then it's probably a bug in how the paging library is set up.
MRiddickW joined the channel
yellowhatpro
Yup its in master
alastairp
lucifer: nice, that didn't look like a large change
Clint_ is now known as Clint
mayhem
ok, canonical_recording table is being populated on gaga now. that was literally an hour of work, lol.
lucifer
alastairp: the dump changes are pending, i'll push that too. but yes smaller change than i expected.
yellowhatpro: i see, yes a bug in the app then. feel free to look into fixing it if you want. i'll try looking into it too later.
yellowhatpro
I'll try fixing it yoshh
mayhem
meh.
lucifer: I didn't quite make the connection, but given the way that I was calculating recording_similarity, it was already doing it on only the canonical_recordings. boo.
oh well, at least the canonical recordings table allows me to ensure that all recording_mbids will come up with similar recordings.
lucifer
oh. how so?
canonical recordings is a MB concept but the index is built from listens?
mayhem
I was joining listens to the mapping table, which only contains canonical recordings.
which then conclusively means that this cannot be done with python/PG. well, it can, but it would be a lot more work and it would just make sense to move this to spark.
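The join mayhem describes can be sketched in pandas terms; the column names here are hypothetical. Since the mapping table only contains canonical recordings, an inner join silently drops every listen for a non-canonical MBID, which is why the earlier runs were already effectively canonical-only:

```python
import pandas as pd

def canonicalize(listens, mapping):
    """Join listens to the canonical mapping table (hypothetical schema:
    both frames carry a recording_mbid column). Because the mapping only
    contains canonical recordings, listens for non-canonical MBIDs drop
    out of the result here."""
    return listens.merge(mapping, on="recording_mbid", how="inner")
```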
lucifer
i see, makes sense.
the listens spark already has are similar to the sql query you were using, so we just need to add the algo part on spark.
mayhem
and the algo part is what I am most unclear about.