#metabrainz

      • reosarevok
        alastairp: please archive https://github.com/metabrainz/docker-rsyncd :)
      • lucifer: I was told something about how it's likely the user renamer will be ready by the end of the week :) Is that still possible?
      • (for LB)
      • lucifer
        reosarevok: i think it'll need one more week :p, a couple of PRs pending and dumps need to be updated. but closer than ever before!
      • reosarevok
        !recall oh no.
      • BrainzBot
      • reosarevok
        Well, it'll take what it needs to take :)
      • BrainzGit
        [design-system] akshaaatt opened pull request #51 (master…fix-storybook-workflow): Run workflow only for the master branch https://github.com/metabrainz/design-system/pul...
      • [design-system] akshaaatt merged pull request #51 (master…fix-storybook-workflow): Run workflow only for the master branch https://github.com/metabrainz/design-system/pul...
      • [critiquebrainz] amCap1712 merged pull request #389 (master…master): CB-421: Show user ratings on the profile page https://github.com/metabrainz/critiquebrainz/pu...
      • [critiquebrainz] amCap1712 merged pull request #388 (master…akshat/readme-updates): Enhance README.md https://github.com/metabrainz/critiquebrainz/pu...
      • reosarevok
        bitmap, yvanzo: https://github.com/metabrainz/musicbrainz-serve... would be good to fix before releasing beta into prod :) Whether this way or differently I don't mind
      • mayhem
        yvanzo: the weblate fellow has acked the deal and will submit invoice to me and come back to you to continue setup.
      • lucifer: alastairp monkey : https://bono.metabrainz.org/recording-similarity updated with 5 years of data. it is producing results that are listenable, I think. I'm going to work up something to submit playlists soon.
      • lucifer
        sounds good, i tried a few recordings and the similar ones looked nice to me.
      • mayhem
        great.
      • I have to say, I think I can see my own influences on this data.
      • outsidecontext
        mayhem: oh yes, I did test it yesterday for one track and basically got back my playlist of tracks :D Still this approach looks very promising
      • lucifer
        indeed, when i saw it yesterday at https://bono.metabrainz.org/recording-similarit... it looked like a list of tracks i had played. today with more data it's a bit mixed though, probably a good sign?
      • mayhem
        outsidecontext: exactly that. but yesterday I ran only 3 years of data. today I managed 5. and working on getting it all in, but apparently 64GB ram is not enough for my python script.
      • yes, the more years I include the better it gets.
      • lucifer
        we can try running it on the spark cluster if the entire data cannot fit in memory.
      • mayhem
        next: make playlists. after that: calculate canonical recordings. then: rec similarity on canonical recordings. I expect to be able to process many more years and to get much better results.
      • lucifer: that is a good point.
      • I think it needs to live there, but there is a lot of logic in the python code.
      • not sure how well this translates to the spark world -- the bulk of the processing is done in ram.
      • lucifer
        yeah will need to look into that. python code is going to be slower than builtin stuff/sql.
      • mayhem
        speed is not the greatest worry. just getting the task done is.
      • I suspect that we need to run this alg once a week when we're ready to deploy.
      • lucifer
        we should probably be fine then.
      • mayhem
        let me iron out the kinks of the core alg -- it can do that rather quickly with the current setup.
      • once we're happy, let's port this to spark.
      • lucifer
        another thing is if you can modify this code to use pandas dataframes, we could directly run that on the spark cluster. that way you can test out stuff in memory on bono and deploy on spark.
      • 👍
      • mayhem
        ok, that sounds like something useful to learn to do.
      • lucifer: want a test playlist made? gimme a recording_mbid if so
      • lucifer
        mayhem: sure, 9541592c-0102-4b94-93cc-ee0f3cf83d64
      • monkey
        The results of the recording similarity tool are interesting. Some look very good and make for a good discovery tool. Other results are puzzling, and suggest that few users are listening to the seed track; at that point you're just getting someone else's listening history (which is still a good discovery tool, just not necessarily "similar")
      • mayhem
      • monkey: want a playlist?
      • monkey: agreed
      • once I do this on canonical recordings, the picture ought to change a lot.
      • I'll start work on that now.
      • monkey
        Sure, let me think of two good cases, one obscure and one more mainstream
      • mayhem
        yeah, some of the matches are a bit WTF. :)
      • monkey
        And concurring with what yourself and outsidecontext were saying, I also get some great results on obscure stuff I've listened to, which looked heavily influenced by my own listening history (which is good for the similarity part, not great for discovery)
      • mayhem nods
      • mayhem
        the good thing about this alg is that this data gets better the more users we have.
      • while the WTF tracks have been "this doesn't fit" they are not "this is horrible!".
      • oh, and there are other improvements I have not made yet.
      • namely, that in order for two tracks to be considered similar, they need to not have been played more than... 30 minutes apart, methinks.
      • monkey
        Hm. Good question.
      • And I agree, even if the similarity is questionable you do probably end up with recordings that someone with similar taste to {original recording} listened to
      • mayhem
        yes, and remember that this is one element that goes into recommending a playlist that is similar to this track. we're listening to raw data results and not things groomed by troi.
      • for raw data, this early in the game? eggcited!
      • monkey
        Indeed! Here's two recordings, I can haz playlists plz?
      • 1e22b22b-92fd-4879-b4d9-28ca0fc27f94 — e2c5d227-90aa-4cc4-b1cc-ba7d412f05fe
      • mayhem
      • I don't know the music of the second one, but it looks dodgy.
      • reosarevok
        mayhem: fa0de794-cde1-4565-83e7-42fc7e4dd999 ? Curious if it's all literally my own listening :p
      • mayhem
      • BrainzGit
        [listenbrainz-server] alastair opened pull request #1832 (master…spotify-read-metadata): Update listen additional_info metadata added to spotify listens https://github.com/metabrainz/listenbrainz-serv...
      • reosarevok
        Yeah, so for very obscure stuff (well, obscure MeB-wise) it won't work well, but it is probably fairly good for other stuff
      • So we just need to get more Spanish people listening :p
      • mayhem
        yep
      • lucifer: there is an ld process owned by you that has been spinning on bono for days.
      • any idea what that might be?
      • lucifer
        oh no idea. let me look
      • oh. its the remote development stuff i use sometimes to directly develop on bono from my IDE.
      • i kill'ed a process so now those should be gone. i'll see if i can setup some auto shutdown after N time of inactivity
      • mayhem
        cool, thank!
      • +s
      • lucifer
        alastairp: hi! i went through the feedback on listen user id, i applied the changes at places but at others those methods are changing in #1700 so I think we should leave them as is for now.
      • BrainzGit
        [design-system] akshaaatt opened pull request #52 (master…add-components-1): Add components Phase One https://github.com/metabrainz/design-system/pul...
      • alastairp
        lucifer: I saw your comments, thanks. yes, I agree that it's not worth changing the other places that are updated in new counts pr
      • reosarevok: archived
      • reosarevok
        Thanks
      • alastairp
        reosarevok: there's one more thing in lucifer's work on this renaming PR that needs to be done, so I'll rename this week's request manually after lunch. is there another one?
      • reosarevok
        Nope
      • But let me know when you're ready and I can do the MB renaming around the same time then
      • alastairp
        ok. I'll ping you when I'm back from lunch
      • reosarevok
        alastairp: I'm going for a walk, I'll just do the change now and hopefully it won't be a problem to change the other in an hour or so :)
      • yellowhatpro
        Helloo guys!!
      • In the musicbrainz app, while searching musicbrainz data, I am getting "result not found" at times.
      • Looking at the logs, it seems we are making many api calls.
      • We won't face this situation in the release build, right?
      • lucifer
        yellowhatpro: without looking at logs, cannot say what the issue is but i wouldn't expect any difference between debug and release builds at least not in connecting with MB api.
      • yellowhatpro
        Is it that when I am using the debug version, my api calls are restricted to a certain limit?
      • But since the musicbrainz app is associated with Metabrainz, the services won't have such restrictions then?
      • in the release version
      • lucifer
        no, both the debug and release version are ratelimited.
      • yellowhatpro
        Can I send the logs?
      • lucifer
        sure
      • yellowhatpro
      • I truncated a bit
      • BrainzGit
        [listenbrainz-server] amCap1712 opened pull request #1833 (pg-listen-count…remove-user-name-usage): Do not read user_name from timescale https://github.com/metabrainz/listenbrainz-serv...
      • yellowhatpro
        I searched once, and it showed exceeding requests (at line 313), and also the recycler view got only one response
      • lucifer
        oh that looks like an error in offset, count calculation.
      • yellowhatpro
        Is it something we can fix or is it a normal thing?
      • lucifer
        that's definitely a bug
      • is it on the master branch or after your changes?
      • yellowhatpro
        lemme check from master branch wait sir
      • lucifer
        no need for sir :)
      • yes if it's on master branch then probably a bug in how the paging library is set up.
      • yellowhatpro
        Yup its in master
      • alastairp
        lucifer: nice, that didn't look like a large change
      • mayhem
        ok, canonical_recording table is being populated on gaga now. that was literally an hour of work, lol.
      • lucifer
        alastairp: the dump changes are pending, i'll push that too. but yes smaller change than i expected.
      • yellowhatpro: i see, yes a bug in the app then. feel free to look into fixing it if you want. i'll try looking into it too later.
      • yellowhatpro
        I'll try fixing it yoshh
      • mayhem
        meh.
        lucifer: I didn't quite make the connection, but given the way that I was calculating recording_similarity, it was already doing it on only the canonical_recordings. boo.
      • oh well, at least the canonical recordings table allows me to ensure that all recording_mbids will come up with similar recordings.
      • lucifer
        oh. how so?
      • canonical recordings is a MB concept but the index is built from listens?
      • mayhem
        I was joining listens to the mapping table, which only contains canonical recordings.
      • lucifer
        ah!
      • mayhem
      • gives no results.
      • gives results.
      • which then conclusively means that this cannot be done with python/PG. well, it can, but it would be a lot more work and it would just make sense to move this to spark.
      • lucifer
        i see makes sense.
      • the listens spark already has are similar to the sql query you were using so just need to add the algo part on spark.
      • mayhem
        and the algo part is what I am most unclear about.
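
For readers following along, the co-listening idea discussed in this log (two recordings count as similar when the same user played them within roughly 30 minutes of each other) could be sketched as below. This is only an illustration in plain Python, not mayhem's actual script and not a Spark job: the function name `similar_recordings`, the tuple layout `(user_id, recording_mbid, listened_at)`, the placeholder recording ids and the scoring by simple pair counts are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Assumption: the "30 minutes apart" window floated in the chat, in seconds.
MAX_GAP_SECONDS = 30 * 60


def similar_recordings(listens, max_gap=MAX_GAP_SECONDS):
    """Count co-listened recording pairs.

    listens: iterable of (user_id, recording_mbid, listened_at) tuples,
             with listened_at as a Unix timestamp in seconds (hypothetical layout).
    Returns a Counter mapping a sorted (mbid_a, mbid_b) pair to the number of
    times one user played both recordings within max_gap seconds.
    """
    # Group each user's listens as (timestamp, recording) so they sort by time.
    by_user = defaultdict(list)
    for user_id, mbid, listened_at in listens:
        by_user[user_id].append((listened_at, mbid))

    scores = Counter()
    for user_listens in by_user.values():
        user_listens.sort()
        # Compare every pair of listens by the same user (quadratic per user).
        for (ts_a, mbid_a), (ts_b, mbid_b) in combinations(user_listens, 2):
            if mbid_a != mbid_b and ts_b - ts_a <= max_gap:
                scores[tuple(sorted((mbid_a, mbid_b)))] += 1
    return scores


# Made-up listens with placeholder ids instead of real recording MBIDs.
listens = [
    (1, "a", 0),
    (1, "b", 600),    # 10 minutes after "a": counts as a pair
    (1, "c", 7200),   # 2 hours later: outside the window
    (2, "a", 100),
    (2, "b", 400),    # 5 minutes after "a": counts as a pair
]
scores = similar_recordings(listens)
print(scores.most_common(5))
```

The pairwise comparison per user is quadratic, which is acceptable for a sketch but is the kind of step that would likely need a windowed join or similar if the work moves to the Spark cluster as discussed.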