aerozol: huh, I hadn't heard anyone mention a scaling issue with the artist dialog yet, but I can see what you mean from your screenshot. you are on windows right? I can boot into windows and try to reproduce it later
2022-12-10 34450, 2022
bitmap
also let me know what browser you're using
2022-12-10 34453, 2022
aerozol
Bitmap: yup windows + chrome, I'm not home at the moment but I can send versions later if needed
2022-12-10 34450, 2022
vibhoo_24 joined the channel
2022-12-10 34433, 2022
vibhoo_24 has quit
2022-12-10 34443, 2022
vibhoo_24 joined the channel
2022-12-10 34446, 2022
vibhoo_24 has quit
2022-12-10 34430, 2022
vibhoo_24 joined the channel
2022-12-10 34432, 2022
vibhoo_24 has quit
2022-12-10 34434, 2022
kidd_73 joined the channel
2022-12-10 34403, 2022
kidd_73 has quit
2022-12-10 34442, 2022
aerozol
mayhem: the question is for a manual scrobbler - if someone is submitting a vinyl release from MB that doesn't have track times on it, how should the plugin calculate timestamps?
2022-12-10 34407, 2022
aerozol
I guess we could just put a generic time like assume the song is 1:50 or something
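A minimal sketch of the fallback idea above, assuming the plugin knows when the record finished playing and works backwards from there. The function name, the 110-second default (the "1:50 or something" guess) and the input shape are all hypothetical, not taken from any existing plugin.

```python
from datetime import datetime, timedelta, timezone

# Assumed fallback length for tracks without a known duration (1:50).
DEFAULT_TRACK_SECONDS = 110


def listen_timestamps(durations, finished_at, default_seconds=DEFAULT_TRACK_SECONDS):
    """Return one Unix start timestamp per track, ending at finished_at.

    durations: track lengths in seconds, with None where MB has no length.
    finished_at: timezone-aware datetime when the last track ended.
    """
    # Substitute the generic length wherever the real one is missing.
    lengths = [d if d is not None else default_seconds for d in durations]
    # Walk forward from the computed start of the first track.
    start = finished_at - timedelta(seconds=sum(lengths))
    stamps = []
    for length in lengths:
        stamps.append(int(start.timestamp()))
        start += timedelta(seconds=length)
    return stamps


if __name__ == "__main__":
    end = datetime(2022, 12, 10, 12, 0, 0, tzinfo=timezone.utc)
    print(listen_timestamps([200, None, 100], end))
```

Anchoring on the end time keeps the last listen from landing in the future; anchoring on a start time instead would work the same way with the loop unchanged.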
2022-12-10 34457, 2022
aerozol
lucifer: in these cases they usually wouldn't have the files I think, they'd be loading a release from the db into Picard. But they may have files
2022-12-10 34458, 2022
jivte joined the channel
2022-12-10 34451, 2022
jivte has quit
2022-12-10 34409, 2022
jivte joined the channel
2022-12-10 34449, 2022
vibhoo_24 joined the channel
2022-12-10 34452, 2022
jivte has quit
2022-12-10 34416, 2022
jivte joined the channel
2022-12-10 34448, 2022
vibhoo_24 has quit
2022-12-10 34458, 2022
vibhoo_24 joined the channel
2022-12-10 34439, 2022
jivte has quit
2022-12-10 34411, 2022
vibhoo_24 has quit
2022-12-10 34415, 2022
lucifer
mayhem: i checked your recent listens and didn't find Running Up That Hill in the last 10 days.
2022-12-10 34440, 2022
lucifer
maybe an issue in submitting listens or something related?
schickling[m]: we have a tool called `mbid_mapping_writer` that matches listens to recordings in MB. it has 2 ways to do this. the first is an exact match of the artist and track name. the second is a matrix of fuzzy searches, which is where detuned comes into play: detuned means that we modified the original track and artist name submitted by the user in an attempt to find a match.
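To illustrate the two-step idea (exact match first, then "detuned" variants), here is a rough sketch. It is not the `mbid_mapping_writer` code: the dict-based index, the `normalize` key, and the specific detuning rules (dropping a ` - ...` suffix, dropping parentheticals, keeping only the first credited artist) are all assumptions made for the example.

```python
import re


def normalize(text):
    """Lowercase and drop everything except word characters (assumed key format)."""
    return re.sub(r"[^\w]+", "", text.lower())


def detune_variants(artist, track):
    """Yield the original pair first, then progressively looser variants."""
    yield artist, track
    # Hypothetical detuning rules, chosen only for illustration.
    no_suffix = re.sub(r"\s+-\s+.*$", "", track)
    no_parens = re.sub(r"\s*[\(\[].*?[\)\]]", "", no_suffix).strip()
    first_artist = re.split(r"\s+(?:feat\.?|ft\.?|&)\s+", artist, flags=re.IGNORECASE)[0]
    for a in (artist, first_artist):
        for t in (no_suffix, no_parens):
            if (a, t) != (artist, track):
                yield a, t


def lookup(index, artist, track):
    """Return (recording_id, match_type) from a dict standing in for the real index."""
    for i, (a, t) in enumerate(detune_variants(artist, track)):
        key = normalize(a) + normalize(t)
        if key in index:
            return index[key], ("exact" if i == 0 else "detuned")
    return None, "no_match"


if __name__ == "__main__":
    # "rec-1" is a placeholder id, not a real MBID.
    index = {normalize("Kate Bush") + normalize("Running Up That Hill (A Deal With God)"): "rec-1"}
    print(lookup(index, "Kate Bush", "Running Up That Hill (A Deal With God) - 2018 Remaster"))
```

Trying the unmodified names first means a detuned match is only used when nothing closer exists.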
2022-12-10 34407, 2022
vibhoo_24 joined the channel
2022-12-10 34417, 2022
vibhoo_24
lucifer: I have made a new folder named utils inside listenbrainz-server/listenbrainz_spark/hdfs, created an __init__.py file inside it, and moved all the hdfs-related functions from that file into this file. Please correct me if I am wrong.
2022-12-10 34435, 2022
lucifer
vibhoo_24: you can open a PR with the changes. i'll try to review it soon. if something needs to be changed i'll let you know on the PR.
2022-12-10 34408, 2022
mayhem
lucifer: it seems that me listening to my daily jams is totally absent from my listens. WTF?
2022-12-10 34436, 2022
lucifer
mayhem: uhh. weird... are other spotify listens present there?
2022-12-10 34438, 2022
mayhem
Yes, my album listens for when I am at my computer. But me listening on mobile doesn't seem to send listens.
2022-12-10 34427, 2022
lucifer
huh. can you try playing daily jams now? i'll query spotify api to check if the listens start to show up there or not.
2022-12-10 34439, 2022
mayhem
Playing now. Private listening is off, I checked.
2022-12-10 34433, 2022
lucifer
track playing is Röyksopp Forever?
2022-12-10 34441, 2022
schickling[m]
<lucifer> "schickling: we have a tool..." <- Got it. Thanks a lot for your explanation. Is the source code for this available somewhere? Curious to learn more!
2022-12-10 34409, 2022
Sophist-UK has quit
2022-12-10 34417, 2022
lucifer
schickling[m]: yes, but it's spread across a lot of places. if you check back in some days, we have an open PR to document it.
the lookups are done against a typesense index. the index itself is keyed by `artist_name + track_name` of recordings.
2022-12-10 34459, 2022
schickling[m]
lucifer: Awesome! Looking forward to that!
2022-12-10 34459, 2022
schickling[m]
I assume for the string comparison you use some kind of "distance" calculation? Curious which approaches you leverage for that.
2022-12-10 34423, 2022
lucifer
yes, Levenshtein distance to evaluate the hits returned by the index lookup
2022-12-10 34444, 2022
lucifer
we look up in the typesense search index; the hits returned by the search index are then compared against the original search term using Levenshtein distance.
2022-12-10 34409, 2022
schickling[m]
Got it! Thanks a lot for explaining. Will try to learn more about it :)
2022-12-10 34409, 2022
lucifer
based on the distance we assign the match a `quality`, high, medium, low.
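A small sketch of the scoring step described above: compute the Levenshtein distance between the search term and a hit, then bucket it into a quality label. The distance function is the standard dynamic-programming version; the thresholds (2 and 5) and the `match_quality` name are made up for the example and are not the values LB uses.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def match_quality(query, hit, high=2, medium=5):
    """Map a distance to 'high', 'medium' or 'low' (thresholds are assumptions)."""
    distance = levenshtein(query.lower(), hit.lower())
    if distance <= high:
        return "high"
    if distance <= medium:
        return "medium"
    return "low"


if __name__ == "__main__":
    print(match_quality("Royksopp Forever", "Röyksopp Forever"))
```

Fixed thresholds penalize long titles more than short ones; dividing the distance by the longer string's length would be one way to even that out.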
2022-12-10 34449, 2022
lucifer
cool, feel free to ping again if you want to ask anything else.
2022-12-10 34415, 2022
lucifer
mayhem: 2 new listens just showed up for you.
2022-12-10 34433, 2022
schickling[m]
Thanks a lot lucifer! Appreciate it!
2022-12-10 34433, 2022
schickling[m]
Together with a friend I've been exploring track matching approaches as well. In case we have any new learnings, I'll share them if you're interested :)
2022-12-10 34453, 2022
lucifer
schickling[m]: mayhem designed the current system we use in LB. we also use a few other tricks for matching. you probably want to discuss your approach with him for insights.
2022-12-10 34430, 2022
lucifer
as you may know, MB has multiple versions of the same recording/release. multiple release events of a recording being one factor in it. in that case you need a tie breaker to ensure that a given name always matches to a given recording for consistency.
2022-12-10 34446, 2022
Rishabh has quit
2022-12-10 34409, 2022
lucifer
for that purpose we have the concept of canonical recordings/releases in LB.
2022-12-10 34415, 2022
schickling[m]
lucifer: Very interesting. How do you "pick" the canonical recording/release?
2022-12-10 34429, 2022
schickling[m] uploaded an image: (58KiB) < https://libera.ems.host/_matrix/media/v3/download/matrix.org/GqGIdmXwvDoJOkhQshACEFbn/CleanShot%202022-12-10%20at%2015.01.05%402x.png >
2022-12-10 34446, 2022
schickling[m]
btw there's a funny bug (?) where the scale goes beyond 10/10
for similarity matching we only use latest year iirc so i would probably look at the this_year or last_year stats.
2022-12-10 34417, 2022
lucifer
mayhem: the listens now do appear but Running Up That Hill will still be in daily jams :(. the version played on spotify is different from the one recommended by CF, so filtering would not match.
2022-12-10 34452, 2022
lucifer
i think we need to work on the idea to build another abstraction over canonical recordings for recommendation use cases.
2022-12-10 34442, 2022
lucifer
fwiw, CF recommends `Running Up That Hill (A Deal With God)` whereas spotify plays `Running Up That Hill (A Deal With God) - 2018 Remaster`