#metabrainz

      • texke has quit
      • texke joined the channel
      • minimal has quit
      • aerozol[m]
        reosarevok: yvanzo: I don’t think ‘usually vertical’ is that helpful in identifying something that doesn’t have to be vertical, even if it is the most common. I understand that it might help differentiate it from banners (which really are all horizontal, I think?), but unfortunately a lot of posters can be landscape or square
      • If we need the description to be longer I would just take one of the dictionary definitions, since there’s people at Merriam-Webster or whatever being paid to argue about these things (I assume)
      • Something like “Usually a large printed or digital sheet, that often contains pictures, and is posted publicly.” (slightly modified Merriam-Webster definition)
      • atj: and/or yvanzo : woohoo, Solr upgrade!! Just double checking that I am okay to announce “we have upgraded our Solr cluster to 9.6.1” (or something along those lines) on our socials?
      • Time to show off your hard work :D
      • From what I gather from the message history it is no longer just on beta?
      • pite has quit
      • Kladky joined the channel
      • serene-arc[m] joined the channel
      • serene-arc[m]
        Hi all! I'm an app developer, writing a tool to upload playlists to ListenBrainz. I'm running into a little problem with the API and how to resolve songs to MBIDs. Would I be able to pick anyone's brain about that?
      • lucifer
        serene-arc[m]: sure what's the issue?
      • d4rk has quit
      • d4rk joined the channel
      • rimskii[m]: i have created the necessary tables on wolf. try again now.
      • serene-arc[m]
        So my project is the one linked below for reference. It searches the file tags to get the metadata to send to the ListenBrainz API. However, for whatever reason, it isn't the best at matching songs that don't have regular artist fields, so up to 15%-ish of songs aren't matched.
      • I notice that other tools such as mpdscribble don't seem to have this problem, so I was wondering if there's a way I'm using the API wrongly or something. Any help would be great!
      • aerozol[m]
        Jade: I meant to fully sign off a couple of email templates today, but ran out of time. I will try to get round to it this weekend! The design won’t change I don’t think, I just want to adjust the text and maybe the links, and get some more to the community for feedback. So hopefully you won’t have to redo anything
      • aerozol[m]
        Jade: If the screenshot you showed bitmap is of an email you’ve devved, that looks awesome!! No problem re. making the font size a bit larger. FYI the emails I get the most are subscription emails and they can contain a *lot* of items. So it can be nice to have some oversight/not make them too big. But I imagine this will be easy to tweak later
      • lucifer
        serene-arc[m]: i am not sure what mpdscribble does but we have another matching API in ListenBrainz itself that you can try. https://listenbrainz.readthedocs.io/en/latest/u...
      • serene-arc[m]
        lucifer (IRC): we currently use that one! Unfortunately, it doesn't always work, at least as expected. Below are a couple of the curl commands that show the problem. I'm not entirely sure if it's several problems that are having the same effect, or the same one with different cases.... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
      • lucifer
        serene-arc[m]: i see, unfortunately the current mapping system doesn't take aliases into account.
      • the selena gomez x marshmello one resolves if the x is removed. that should be simple to fix i think, the mapping is just missing x as a possible join phrase.
      • serene-arc[m]
        lucifer (IRC): that's unfortunate that it doesn't take it into account. is there any way for me to use the resolution system that the last.fm proxy uses, or is that not publicly accessible?
      • lucifer
        serene-arc[m]: do you mean the LFM compatible API that ListenBrainz supports?
      • if so, it goes through the same system as the API endpoint you are using.
      • i wonder if mpdscribble depends on people tagging their collections with Picard first?
      • serene-arc[m]
        I'm not sure honestly, because it does manage to resolve the Selena Gomez song above when I play it. beets, of which I'm a maintainer, does the same thing as Picard if you haven't heard of it. My backup solution was to try and pull the MBID directly from the file, at the cost of giving up on these files for those that don't use beets/picard
      • lucifer
        sorry i am a bit confused. to be clear, Picard uses a different way to match stuff than the LB endpoint i mentioned.
      • when does the selena gomez song resolve and when does it not?
      • outsidecontext[m]
        serene-arc: does the listen get submitted to LB without MBIDs and LB does the resolving server side, or do you try to resolve the MBID first locally and then submit with the found MBID?
      • rimskii[m]
        <lucifer> "rimskii: i have created the..." <- just tested it, it works !
      • thank you :)
      • serene-arc[m]
        lucifer (IRC): so my library is already organised with picard, and all of the metadata is consistent with that on MusicBrainz. My program to upload the playlists to ListenBrainz takes the file and reads the artist and title fields of the songs, and tries to find a match with the LB API. The correct MBID for that song is ff67fcb7-365a-4164-87e9-ef7768767528, but ListenBrainz fails to get that, even though the data is the same, nominally.
      • outsidecontext: to make a playlist, the MBID must be included.
      • My song in my library is tagged as this.
      • but the results don't give that when searched with the same data through the LB API
      • lucifer
        i see.
      • so there are multiple issues here, first not getting a match and second is getting a different match.
      • the second one is intentional, we have a concept of canonical data - a recording can be present on multiple releases, so we have a list of custom sorts to choose the "appropriate" one from MB.
      • we are working on improving this by letting users specify a release name as well during the search/mapping.
      • serene-arc[m]
        So far I haven't been worried about getting alternate matches for the song. It's mostly about getting errors when no songs are returned
      • outsidecontext[m]
        serene-arc: sorry, I missed that it is about playlists. I thought it is about submission because of the mpdscribble comparison
      • lucifer
        yes makes sense, that is something we need to fix.
      • outsidecontext[m]
        The majority of LB submission clients will first try to get the MBID from the file and use this for submission (if they support MBIDs in the first place), and if there is none leave the resolving to LB. If you need to resolve the MBIDs locally I'd still suggest to use the already found MBID first and do the lookup only if it is missing. That avoids false matches for MB tagged files
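The order outsidecontext describes (use the MBID already in the file, and only fall back to a lookup when it is missing) could be sketched roughly as below. This is only an illustration: the `recording_mbid` tag key and the `lookup` callable are hypothetical placeholders, not names from any ListenBrainz client or API.

```python
import uuid
from typing import Callable, Mapping, Optional


def resolve_recording_mbid(
    tags: Mapping[str, str],
    lookup: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Return a recording MBID, preferring the one already stored in the tags.

    `tags` is assumed to be a plain dict of file tags; the key names used
    here are hypothetical. `lookup` stands in for whatever remote
    artist/title matching the tool already performs.
    """
    raw = (tags.get("recording_mbid") or "").strip()
    if raw:
        try:
            # Accept the tagged value only if it parses as a UUID.
            return str(uuid.UUID(raw))
        except ValueError:
            pass  # Malformed tag: fall through to the lookup.

    artist = tags.get("artist")
    title = tags.get("title")
    if not artist or not title:
        return None  # Not enough metadata to attempt a lookup.
    return lookup(artist, title)
```

With this ordering the lookup is never called for files that Picard or beets have already tagged, which is what avoids false matches for those files.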
      • lucifer
        can you give me a list of all the songs that you tried?
      • so far i see two issues - one because we don't check for artist alias and second because x is not treated as punctuation/join phrase.
      • i want to see if there are any low hanging fruits or obvious bugs that can be fixed soon-ish.
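To illustrate the join phrase issue lucifer mentions, here is a rough client-side sketch that treats a standalone "x" like other join phrases when splitting an artist credit. It is not how the ListenBrainz mapping works internally; the phrase list and the function name are assumptions made for the example.

```python
import re

# Hypothetical list of join phrases. A standalone "x" surrounded by
# whitespace is included, so "Charli XCX" is left alone. Words such as
# "and" are deliberately omitted because they appear inside artist names.
JOIN_PHRASES = re.compile(
    r"\s+(?:x|&|feat\.?|ft\.?|vs\.?)\s+|\s*[,;]\s*",
    re.IGNORECASE,
)


def split_artist_credit(credit: str) -> list[str]:
    """Split an artist credit into lowercased artist names."""
    parts = JOIN_PHRASES.split(credit)
    return [part.strip().lower() for part in parts if part.strip()]


print(split_artist_credit("Selena Gomez x Marshmello"))
```

Comparing the resulting name lists, rather than the raw credit strings, would make "Selena Gomez x Marshmello" and "Selena Gomez & Marshmello" look the same to a matcher.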
      • serene-arc[m]
        I can give several!... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
      • I'm not sure what the last one is. That's from the Cyberpunk soundtrack and some of the songs resolve, some don't, and all have that odd alias system
      • I'll switch to using the local MBID if it exists though, thank you.
      • atj[m]
      • yvanzo: rrsync configuration has been deployed, please test if you can: https://github.com/metabrainz/metabrainz-ansibl...
      • (rrsync is installed with rsync on Debian & Ubuntu) so no deployment needed, just the SSH key
      • yvanzo[m]
        Hi aerozol: Yes, it is already in use by the main servers, and we keep doing adjustments, but it won’t be immediately available to mirrors.
      • aerozol: About posters, yes we need the descriptions to be longer, I would say descriptive enough, that is to make sure that even newcomers (ignoring the context) are getting the whole (more or less broad) meaning we are trying to capture through this type. From that perspective, your proposal does the job.
      • aerozol: Also, yes I did look into dictionaries at first and took inspiration from these, but last week's discussion about this term has had no echo.
      • Hi atj, how can we keep `files/var/lib/solr/solr.xml` from the Ansible role in sync with `solr.xml` from `mbsssss`?
      • aerozol: When you have time please also look at the continuation of the propositions (for just two other terms) 😄 https://chatlogs.metabrainz.org/libera/musicbra...
      • kepstin has quit
      • kepstin joined the channel
      • BrainzGit
        [listenbrainz-server] anshg1214 merged pull request #2912 (master…manually-submit-album): Submit album: Allow searching by LB album URL https://github.com/metabrainz/listenbrainz-serv...
      • akshaaatt[m] has quit
      • reosarevok[m]
        aerozol, yvanzo : ok, I did "Usually a large printed or digital sheet, that often contains pictures, and is generally posted publicly to promote the event." combining what we had and what you suggested
      • lucifer
        mayhem: hi! can you push your nmslib search prototypes to github and share the link?
      • yvanzo[m]
        reosarevok: Thank you, that works too! 📯
      • BrainzGit
        [listenbrainz-server] anshg1214 merged pull request #2914 (master…submit-multiple-listens): LB-1448: Manually submit a queue listens https://github.com/metabrainz/listenbrainz-serv...
      • pranav[m] has quit
      • rimskii[m]
        <lucifer> "rimskii: i have created the..." <- lucifer: do tables contain data for spotify id tracks?
      • trying to test here, but it doesn't give any data for it
      • rimskii[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/pyoGVESZZnJMbDeGMAGoOart
      • lucifer
        rimskii[m]: yes but only a limited number of artists/tracks/albums
      • rimskii[m]
        okay
      • mayhem[m]
        <lucifer> "mayhem: hi! can you push your..." <- will do in a bit. I should really add some comments because the code does some weird shit right now.
      • Sophist-UK joined the channel
      • lucifer
        rimskii[m]: in the terminal on wolf, run `psql URL_TO_MB_DATABASE_HERE` and then you can query something like `select * from spotify_cache.artist`, `spotify_cache.track` etc.
      • to see what data is there. i copied ~25000 tracks for ~100 popular artists.
      • mayhem[m]: 👍
      • rimskii[m]
        okay, thanks !!
      • mayhem[m]
        lucifer (IRC): did you deploy rimskii 's work to prod yesterday? is that the first gsoc work to go into prod this year?
      • lucifer
        mayhem[m]: not yet deployed, a LB PR that depends on it also needs to be merged first. hopefully today.
      • rimskii[m]
        lucifer: yay
      • mayhem[m]
        impressive.
      • mayhem[m] checks clock. june. well done rimskii!
      • ansh[m] has quit
      • ahvalmissaamine
        !recall applause!
      • BrainzBot
        I'm sorry, I don't remember "applause!", are you sure I should know about it?
      • ahvalmissaamine
        !recall applause
      • BrainzBot
      • BrainzGit
        [listenbrainz-server] amCap1712 opened pull request #2915 (master…fix-numpu): Pin numpy to latest v1 https://github.com/metabrainz/listenbrainz-serv...
      • [listenbrainz-server] amCap1712 merged pull request #2907 (master…dependabot/npm_and_yarn/braces-3.0.3): build(deps-dev): bump braces from 3.0.2 to 3.0.3 https://github.com/metabrainz/listenbrainz-serv...
      • [listenbrainz-server] dependabot[bot] opened pull request #2916 (master…dependabot/npm_and_yarn/multi-1729a3ee87): build(deps): bump ws and engine.io-client https://github.com/metabrainz/listenbrainz-serv...
      • [listenbrainz-server] amCap1712 merged pull request #2915 (master…fix-numpu): Pin numpy to latest v1 https://github.com/metabrainz/listenbrainz-serv...
      • d4rk has quit
      • d4rk joined the channel
      • [listenbrainz-server] amCap1712 merged pull request #2896 (master…import): Fixed import playlist from Spotify https://github.com/metabrainz/listenbrainz-serv...
      • [listenbrainz-server] release v-2024-06-21.0 has been published by github-actions[bot]: https://github.com/metabrainz/listenbrainz-serv...
      • mayhem[m]
      • some noteworthy points in the code: https://github.com/mayhem/fast-fuzzy/blob/main/...
      • the CSV file is on wolf: wolf:~/metabrainz/fast_fuzzy
      • since everything is in ram, once build is done, a crude search prompt appears: "u2,where the streets have no name" is a valid query.
      • I am confused about the slow down in indexing speed -- something odd is happening.
      • lucifer
        makes sense.
      • mayhem[m]
        but if we can work that out, then we can create indexes via multiple cores. pretty easy.
      • lucifer
        for testing did you use another script?
      • mayhem[m]
        that should drastically reduce the indexing time.
      • all in one, wrt testing script.
      • lucifer
        i see
      • mayhem[m] heads to the office
      • mayhem[m]
        curious to see what you think and if you can spot a stooopid mistake. :)
      • lucifer
        i mean did you do any stress testing or batch of queries to arrive at the latency number?
      • mayhem[m]
        no serious testing as of yet.
      • lucifer
        makes sense
      • mayhem[m]
        but ESB and VA should be the most extreme search cases.
      • lucifer
        i think solr supports vector search too, so you could create vectors using tf-idf and hand them off to solr, and then for search generate the query vector and query solr by that.
      • wrt the disk serialization comment.
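As a rough illustration of the tf-idf idea lucifer raises, the sketch below builds character-trigram tf-idf vectors in memory and ranks documents by cosine similarity against a query vector. It is neither mayhem's fast-fuzzy code nor a Solr integration; the trigram tokenisation, the class name, and the sample strings are all assumptions made for the example.

```python
import math
from collections import Counter


def trigrams(text: str) -> list[str]:
    """Character trigrams of the lowercased text, padded at both ends."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TfidfIndex:
    """Tiny in-memory tf-idf index over a list of strings."""

    def __init__(self, docs):
        self.docs = list(docs)
        counts = [Counter(trigrams(doc)) for doc in self.docs]
        # Document frequency: in how many documents each trigram occurs.
        df = Counter(gram for c in counts for gram in c)
        n = len(self.docs)
        # Smoothed idf so every known trigram gets a positive weight.
        self.idf = {g: math.log((1 + n) / (1 + f)) + 1.0 for g, f in df.items()}
        self.vectors = [self._normalize(c) for c in counts]

    def _normalize(self, counts):
        # Weight term counts by idf, then scale the vector to unit length.
        vec = {g: tf * self.idf[g] for g, tf in counts.items() if g in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {g: w / norm for g, w in vec.items()} if norm else {}

    def search(self, query: str, limit: int = 3):
        """Return up to `limit` (score, document) pairs, best first."""
        q = self._normalize(Counter(trigrams(query)))
        scored = [
            (sum(w * vec.get(g, 0.0) for g, w in q.items()), doc)
            for doc, vec in zip(self.docs, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]


index = TfidfIndex([
    "u2,where the streets have no name",
    "u2,with or without you",
    "selena gomez,marshmello",
])
print(index.search("u2,where the street have no name", limit=1))
```

Because the vectors are unit length, the dot product is the cosine similarity, so a slightly misspelled query still ranks its intended document first. A real index would hand the vectors to a vector store instead of scanning every document.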
      • Maxr1998 has quit
      • Maxr1998 joined the channel
      • rimskii[m]
        <lucifer> "rimskii: in the terminal on wolf..." <- does musicbrainz_db contain spotify_cache tables?
      • rimskii[m] uploaded an image: (326KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/BIVGFbIKaXqGSWxolTLnGdNV/Screenshot%202024-06-21%20at%2015.04.16.png >
      • can't find anything
      • no spotify_cache table
      • checking other tables, but there is no data either
      • ok there is data for artists tb
      • lucifer
        rimskii[m]: yes spotify related tables are in spotify_cache schema
      • rimskii[m]
        no spotify_cache schema (?)... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
      • mayhem[m]
        <lucifer> "i think solr supports vector..." <- something we should explore, for sure.
      • lucifer[m]
        [@rimskii](https://matrix.to/#/@kubrimskii:matrix.org) Checking
      • yvanzo[m]
        atj, zas: Just drafted a blog post: https://musicbrainz.wordpress.com/?p=11547&...
      • atj[m]
        I don't think I have access to WP
      • yvanzo[m]
        atj: Just invited you.
      • atj[m]
        yvanzo[m]: Seems like I don't have permissions to view the drafts
      • but I've created an account now
      • yvanzo[m]
        atj: And now?
      • atj[m]
        yep, all good now. thanks!
      • mayhem[m]
        lucifer (IRC): did you get a chance to read the fuzzy index code? any thoughts?
      • lucifer
        mayhem[m]: just took a cursory look so far, will read in detail and let you know in a while
      • mayhem[m]
        k
      • I'll adapt it to run on +1 threads so it can finally finish building for a full test.
      • atj[m]
        <yvanzo[m]> "atj, zas: Just drafted a blog..." <- I reworded it quite a bit. Hope that's OK! It's a bad habit - once I start tweaking things I get a bit carried away.
      • yvanzo[m]
        atj: Neat. Do you want to mention Ansible too?