reosarevok: yvanzo: I don’t think ‘usually vertical’ is that helpful in identifying something that doesn’t have to be vertical, even if it is the most common. I understand that it might help differentiate it from banners (which really are all horizontal, I think?), but unfortunately a lot of posters can be landscape or square
If we need the description to be longer I would just take one of the dictionary definitions, since there’s people at Merriam-Webster or whatever being paid to argue about these things (I assume)
Something like “Usually a large printed or digital sheet, that often contains pictures, and is posted publicly.” (slightly modified Merriam-Webster definition)
atj: and/or yvanzo : woohoo, Solr upgrade!! Just double checking that I am okay to announce “we have upgraded our Solr cluster to 9.6.1” (or something along those lines) on our socials?
Time to show off your hard work :D
From what I gather from the message history it is no longer just on beta?
pite has quit
Kladky joined the channel
serene-arc[m] joined the channel
serene-arc[m]
Hi all! I'm an app developer, writing a tool to upload playlists to ListenBrainz. I'm running into a little problem with the API and how to resolve songs to MBIDs. Would I be able to pick anyone's brain about that?
lucifer
serene-arc[m]: sure what's the issue?
d4rk has quit
d4rk joined the channel
rimskii[m]: i have created the necessary tables on wolf. try again now.
serene-arc[m]
So my project is the one linked below for reference. It searches the file tags to get the metadata to send to the ListenBrainz API. However, for whatever reason, it isn't the best at matching songs that don't have regular artist fields, so up to 15%-ish of songs aren't matched.
I notice that other tools such as mpdscribble don't seem to have this problem, so I was wondering if there's a way I'm using the API wrongly or something. Any help would be great!
Jade: I meant to fully sign off a couple of email templates today, but ran out of time. I will try to get round to it this weekend! The design won’t change I don’t think, I just want to adjust the text and maybe the links, and get some more to the community for feedback. So hopefully you won’t have to redo anything
aerozol[m]
Jade: If the screenshot you showed bitmap is of an email you’ve devved, that looks awesome!! No problem re. making the font size a bit larger. FYI the emails I get the most are subscription emails and they can contain a *lot* of items. So it can be nice to have some oversight/not make them too big. But I imagine this will be easy to tweak later
lucifer (IRC): we currently use that one! Unfortunately, it doesn't always work, at least as expected. Below are a couple of curl commands that show the problem. I'm not entirely sure if it's several problems that are having the same effect, or the same one with different cases.... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
lucifer
serene-arc[m]: i see, unfortunately the current mapping system doesn't take aliases into account.
the selena gomez x marshmello one resolves if the x is removed. that should be simple to fix i think, the mapping is just missing x as a possible join phrase.
serene-arc[m]
lucifer (IRC): that's unfortunate that it doesn't take it into account. is there any way for me to use the resolution system that the last.fm proxy uses, or is that not publicly accessible?
lucifer
serene-arc[m]: do you mean the LFM compatible API that ListenBrainz supports?
if so, it goes through the same system as the API endpoint you are using.
i wonder if mpdscribble depends on people tagging their collections with Picard first?
serene-arc[m]
I'm not sure honestly, because it does manage to resolve the Selena Gomez song above when I play it. beets, of which I'm a maintainer, does the same thing as Picard if you haven't heard of it. My backup solution was to try and pull the MBID directly from the file, at the cost of giving up on these files for those that don't use beets/picard
lucifer
sorry i am a bit confused. to be clear, Picard uses a different way to match stuff than the LB endpoint i mentioned.
when does the selena gomez song resolve and when does it not?
outsidecontext[m
serene-arc: does the listen get submitted to LB without MBIDs and LB does the resolving server side, or do you try to resolve the MBID locally first and then submit with the found MBID?
rimskii[m]
<lucifer> "rimskii: i have created the..." <- just tested it, it works !
thank you :)
serene-arc[m]
lucifer (IRC): so my library is already organised with picard, and all of the metadata is consistent with that on MusicBrainz. My program to upload the playlists to ListenBrainz takes the file and reads the artist and title fields of the songs, and tries to find a match with the LB API. The correct MBID for that song is ff67fcb7-365a-4164-87e9-ef7768767528, but ListenBrainz fails to get that, even though the data is the same, nominally.
outsidecontext: to make a playlist, the MBID must be included.
but the results don't give that when searched with the same data through the LB API
lucifer
i see.
so there are multiple issues here, first not getting a match and second is getting a different match.
the second one is intentional: we have a concept of canonical data. a recording can be present on multiple releases, so we have a list of custom sorts to choose the "appropriate" one from MB.
we are working on improving this by letting users specify a release name as well during the search/mapping.
serene-arc[m]
So far I haven't been worried about getting alternate matches for the song. It's mostly about getting errors when no songs are returned
outsidecontext[m
serene-arc: sorry, I missed that it is about playlists. I thought it was about submission because of the mpdscribble comparison
lucifer
yes makes sense, that is something we need to fix.
outsidecontext[m
The majority of LB submission clients will first try to get the MBID from the file and use this for submission (if they support MBIDs in the first place), and if there is none leave the resolving to LB. If you need to resolve the MBIDs locally I'd still suggest to use the already found MBID first and do the lookup only if it is missing. That avoids false matches for MB tagged files
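The "embedded MBID first, lookup only if missing" order described above could be sketched roughly as follows. This is only an illustration: the `musicbrainz_trackid` tag key is an assumption about how the tagger stored the recording MBID, and `lookup` is a hypothetical callable standing in for whatever remote metadata lookup the client uses.

```python
from typing import Callable, Mapping, Optional


def resolve_mbid(
    tags: Mapping[str, str],
    lookup: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Return a recording MBID, preferring the one already stored in the file.

    `tags` is a plain mapping of tag names to values read from the file.
    `lookup` is a hypothetical stand-in for a remote artist/title lookup.
    """
    # Assumed tag key; real files may expose it under a different name.
    embedded = tags.get("musicbrainz_trackid")
    if embedded:
        # Trust the tagged MBID and skip the lookup entirely,
        # which avoids false matches for MB-tagged files.
        return embedded

    artist = tags.get("artist")
    title = tags.get("title")
    if not artist or not title:
        # Not enough metadata to attempt a lookup.
        return None

    # Fall back to the remote lookup only when no MBID was embedded.
    return lookup(artist, title)
```

With this shape, files tagged by Picard or beets never hit the lookup, and only untagged files depend on how well the artist and title strings match.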
lucifer
can you give me a list of all the songs that you tried?
so far i see two issues - one because we don't check for artist aliases and second because x is not treated as punctuation/join phrase.
i want to see if there is any low-hanging fruit or obvious bugs that can be fixed soon-ish.
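To illustrate what treating "x" as a join phrase might look like, here is a minimal sketch. It is not the ListenBrainz mapping code; the join phrase list and the function name `split_artist_credit` are made up for the example.

```python
import re

# Illustrative join phrases only. Each must be surrounded by whitespace,
# so "Charli XCX" or a leading "X Ambassadors" is not split on the "x".
_JOIN_PHRASE = re.compile(
    r"\s+(?:feat\.?|ft\.?|featuring|&|x)\s+|\s*,\s*",
    flags=re.IGNORECASE,
)


def split_artist_credit(credit: str) -> list[str]:
    """Split an artist credit string into individual artist names."""
    parts = _JOIN_PHRASE.split(credit)
    # Drop empty fragments and surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]


print(split_artist_credit("Selena Gomez x Marshmello"))
```

A naive split like this would also break up names that legitimately contain a join phrase, such as a band with "&" in its name, so a real mapper would presumably need to check the unsplit string against known artists first.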
(rrsync is installed with rsync on Debian & Ubuntu) so no deployment needed, just the SSH key
yvanzo[m]
Hi aerozol: Yes, it is already in use by the main servers, and we keep making adjustments, but it won’t be immediately available to mirrors.
aerozol: About posters, yes we need the descriptions to be longer, I would say descriptive enough, that is to make sure that even newcomers (ignoring the context) are getting the whole (more or less broad) meaning we are trying to capture through this type. From that perspective, your proposal does the job.
aerozol: Also, yes I did look into dictionaries at first and took inspiration from these, but last week’s discussion about this term got no response.
Hi atj, how can we keep `files/var/lib/solr/solr.xml` from the Ansible role in sync with `solr.xml` from `mbsssss`?
aerozol, yvanzo : ok, I did "Usually a large printed or digital sheet, that often contains pictures, and is generally posted publicly to promote the event." combining what we had and what you suggested
lucifer
mayhem: hi! can you push your nmslib search prototypes to github and share the link?
<lucifer> "rimskii: i have created the..." <- lucifer: do tables contain data for spotify id tracks?
trying to test here, but it doesn't give any data for it
rimskii[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/pyoGVESZZnJMbDeGMAGoOart
lucifer
rimskii[m]: yes but only a limited number of artists/tracks/albums
rimskii[m]
okay
mayhem[m]
<lucifer> "mayhem: hi! can you push your..." <- will do in a bit. I should really add some comments because the code does some weird shit right now.
Sophist-UK joined the channel
lucifer
rimskii[m]: in the terminal on wolf, run `psql URL_TO_MB_DATABASE_HERE` and then you can query something like `select * from spotify_cache.artist`, `spotify_cache.track` etc.
to see what data is there. i copied ~25000 tracks for ~100 popular artists.
mayhem[m]: 👍
rimskii[m]
okay, thanks !!
mayhem[m]
lucifer (IRC): did you deploy rimskii's work to prod yesterday? is that the first gsoc work to go into prod this year?
lucifer
mayhem[m]: not yet deployed, a LB PR that depends on it also needs to be merged first. hopefully today.
rimskii[m]
lucifer: yay
mayhem[m]
impressive.
mayhem[m] checks clock. june. well done rimskii!
ansh[m] has quit
ahvalmissaamine
!recall applause!
BrainzBot
I'm sorry, I don't remember "applause!", are you sure I should know about it?
the CSV file is on wolf: wolf:~/metabrainz/fast_fuzzy
since everything is in ram, once build is done, a crude search prompt appears: "u2,where the streets have no name" is a valid query.
I am confused about the slow down in indexing speed -- something odd is happening.
lucifer
makes sense.
mayhem[m]
but if we can work that out, then we can create indexes via multiple cores. pretty easy.
lucifer
for testing did you use another script?
mayhem[m]
that should drastically reduce the indexing time.
mayhem[m]: all in one. wrt testing script
lucifer
i see
mayhem[m] heads to the office
mayhem[m]
curious to see what you think and if you can spot a stooopid mistake. :)
lucifer
i mean did you do any stress testing or batch of queries to arrive at the latency number?
mayhem[m]
no serious testing as of yet.
lucifer
makes sense
mayhem[m]
but ESB and VA should be the most extreme search cases.
lucifer
i think solr supports vector search too, so you could create vectors using tf-idf and hand them off to solr, and then for search generate the query vector and query solr by that.
wrt the disk serialization comment.
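As a rough illustration of the tf-idf idea, the sketch below builds sparse tf-idf vectors over character trigrams and does a brute-force cosine search in pure Python. It is not the fast_fuzzy prototype and does not touch Solr or nmslib; the trigram tokenisation, the smoothed idf formula, and the sample strings are all assumptions chosen for the example.

```python
import math
from collections import Counter


def trigrams(text: str) -> list[str]:
    """Character trigrams of the lowercased, padded text."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def normalize(vec: dict[str, float]) -> dict[str, float]:
    """Scale a sparse vector to unit length (no-op for an empty vector)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def build_index(docs: list[str]):
    """Return (idf, vectors): idf weights and one unit tf-idf vector per doc."""
    counts = [Counter(trigrams(d)) for d in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    n = len(docs)
    # Smoothed idf, one common variant; other formulas would work too.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [normalize({t: tf * idf[t] for t, tf in c.items()}) for c in counts]
    return idf, vectors


def search(query: str, docs: list[str], idf, vectors):
    """Return (best_doc, cosine_score) for the query by brute force."""
    q_counts = Counter(trigrams(query))
    # Trigrams never seen at index time carry no weight.
    q = normalize({t: tf * idf[t] for t, tf in q_counts.items() if t in idf})
    scores = [sum(w * v.get(t, 0.0) for t, w in q.items()) for v in vectors]
    best = max(range(len(docs)), key=scores.__getitem__)
    return docs[best], scores[best]


# Made-up sample entries in the "artist,title" style of the search prompt.
catalog = [
    "u2,where the streets have no name",
    "u2,with or without you",
    "selena gomez,wolves",
]
idf, vectors = build_index(catalog)
print(search("u2,were the street have no name", catalog, idf, vectors))
```

In a setup like the one suggested above, the query vector would be handed to a vector-capable search backend instead of the brute-force loop in `search`.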
Maxr1998 has quit
Maxr1998 joined the channel
rimskii[m]
<lucifer> "rimskii: in the terminal on wolf..." <- does musicbrainz_db contain spotify_cache tables?
rimskii[m] uploaded an image: (326KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/BIVGFbIKaXqGSWxolTLnGdNV/Screenshot%202024-06-21%20at%2015.04.16.png >
can't find anything
no spotify_cache table
checking other tables, but there are no data either
ok there is data for artists tb
lucifer
rimskii[m]: yes spotify related tables are in spotify_cache schema
yvanzo[m]: Seems like I don't have permissions to view the drafts
but I've created an account now
yvanzo[m]
atj: And now?
atj[m]
yep, all good now. thanks!
mayhem[m]
lucifer (IRC): did you get a chance to read the fuzzy index code? any thoughts?
lucifer
mayhem[m]: just took a cursory look so far, will read in detail and let you know in a while
mayhem[m]
k
I'll adapt it to run on +1 threads so it can finally finish building for a full test.
atj[m]
<yvanzo[m]> "atj, zas: Just drafted a blog..." <- I reworded it quite a bit. Hope that's OK! It's a bad habit - once I start tweaking things I get a bit carried away.