<mayhem[m]> "i think the 2 character limit is..." <- i had run typesense with 2 character limit as well for this comparison.
BrainzGit
[musicbrainz-server] 14reosarevok merged pull request #3315 (03master…useless-hangul-filler): MBS-13528 / MBS-13696: Calculate invalid edit notes in more places and with more invisible characters https://github.com/metabrainz/musicbrainz-serve...
Kladky joined the channel
relaxo[m] joined the channel
relaxo[m]
reosarevok How does this edit note thing apply to seeds? What happens when there are forbidden chars in the seeded edit note?
reosarevok[m]
Nothing should happen as long as they're not the only characters :)
This is meant to stop notes which are only spaces, invisible chars and so on
If you have a normal note with some of them they will just work, in theory
(if you find a bug, let us know!)
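A minimal sketch of the rule described above, not the actual musicbrainz-server code: a note is rejected only when every character is whitespace or invisible. The function name and the exact character set are assumptions for illustration; the server's real list may differ.

```python
import unicodedata

# Assumption: a few letter-category code points that render as blank
# (Hangul fillers). The real server-side list may be longer or different.
BLANK_LETTERS = {"\u115F", "\u1160", "\u3164", "\uFFA0"}


def is_effectively_empty(note: str) -> bool:
    """Return True if the note has no visible character at all."""
    for ch in note:
        if ch in BLANK_LETTERS:
            continue
        category = unicodedata.category(ch)
        # Z* = separators (spaces), Cc = control, Cf = format (zero-width etc.)
        if category.startswith("Z") or category in ("Cc", "Cf"):
            continue
        return False  # found something visible, so the note is fine
    return True


print(is_effectively_empty(" \u200b\u3164\n"))     # only invisible characters
print(is_effectively_empty("see \u200b discogs"))  # normal note with one of them
```

Under this sketch a seeded note that mixes normal text with a few of these characters still passes, which matches the behaviour described above.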
Kladky has quit
Kladky joined the channel
relaxo[m]
Okay. No bug found yet, but I want to get to another point. Sometimes my seeder does not clearly identify an entity, so it should be looked up manually. I want to make sure that the editor will notice it. My idea is to add a forbidden char to the edit note with a hint to point the editor to it.
reosarevok[m]
Hmm
If you seed a relationship with name only, it shouldn't let the user submit until they either select the entity or remove the relationship, I think?
yvanzo: minor, but in https://github.com/metabrainz/musicbrainz-serve... - we have the prepare jira step first as 1, talking about updating tickets, but the link to the tickets and the transitioning etc is only in the update jira step (7)
If we want to update the descriptions and whatnot in step 1 we should move that stuff to step 1, and if not we only need step 7 and step 1 should be empty, no? :)
Vile_Vulture joined the channel
Vile_Vulture has left the channel
relaxo[m]
reosarevok Thanks, will try. I am using derat/yambs command line. Maybe he is around and can answer if this is possible. derat
kellnerd[m]
Why don't you simply try it yourself? I am pretty sure the form can't be submitted if you only seed names instead of MBIDs.
relaxo[m]
Not at home rn. Will try it later for sure.
reosarevok[m]
yvanzo, bitmap: did some Spanish translating, will release beta in the afternoon / evening - feel free to review or ask for review on stuff if you want me to put them out today
yvanzo: what was the requisite to put the Spanish translation out in prod? :)
(it's probably close to that, so I'd like to know what to prioritize)
reosarevok[m] goes for a dog walk and some food after
reosarevok: Yes, it is redundant because it has been too often overlooked. Tickets should actually be updated when transitioning to the development branch.
reosarevok: For prod languages, it should be nearly complete. For Spanish, relationship types seem to be the only last big chunk.
relaxo yeah, my experience matches others'. i think that edit forms generally won't let you submit if you've only set a name in a field that requires an MBID. pretty much every seeder relies on this functionality if it e.g. can't find an MBID for an artist.
1. beyoncé - dreaming: doing a simple artist and recording search, it should be an exact match, which works fine. but note the diacritic on the e should be present.
solr by default doesn't remove diacritics from the text field. but we can create an extra field to store the unidecoded output
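A rough sketch of the extra-field idea, assuming the folding is done in Python before documents are sent to the index. It uses the standard library instead of the unidecode package, so it only strips combining marks (it will not turn "ß" into "ss"). The `*_folded` field names are hypothetical.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Decompose characters, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_document(artist: str, recording: str) -> dict:
    """Keep the original text and add folded copies for lenient matching."""
    return {
        "artist": artist,
        "recording": recording,
        # Hypothetical extra fields holding the diacritic-free text.
        "artist_folded": fold_diacritics(artist).lower(),
        "recording_folded": fold_diacritics(recording).lower(),
    }


print(build_document("Beyoncé", "Dreaming"))
```

Exact searches can then keep using the original fields, while a query typed without the diacritic can fall back to the folded ones.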
which is worth listening to, if you don't know it.
right zas ?
zas[m]
Definitively
mayhem[m]
1daf6f11-cbec-4503-b0b4-6b38716062ef "Metropolitan Opera Orchestra, Erich Leinsdorf", "Die Walküre", "Die Walküre: Act III, Scene I. Vorspiel "Walkürenritt: Hojotoho! Heiaha!" (Gerhilde, Helmwige, Waltraute, Schwertleite, Ortlinde, Siegrune, Grimgerge, Roßweiße)"
e4c8c9b3-38f2-41be-a2d8-1ad23d8b7d48 "peedranch ^ Jansky Noise", "Mi^grate", "Love, Exciting and New Come Aboard, We're Expecting You. Love, Life's Sweetest Reward. Let It Flow, It Floats Back to You. The Love Boat Soon Will Be Making Another Run, the Love Boat Promises Something for Everyone, Set a Course for Adventure, Your Mind on a New Romance. Love Won't Hurt Anymore, It's an Open Smile on a Friendly Shore. Yes Love! It's Love!
The Love Boat Soon Will Be Making Another Run. The Love Boat Promises Something for Everyone, Set a Course for Adventure, Your Mind on a New Romance. Love Won't Hurt Anymore, It's an Open Smile on a Friendly Shore. It's Love! It's Love! It's Love! It's the Love Boat-Ah! It's the Love Boat-Ah! (Recorded Onboard the Love Boat With the Kitchen Staff)"
seriously, hard to tell if spam or not. lol
do you think we need more lucifer ?
lucifer[m]
mayhem: not sure we might need some of different kinds but the mapping ticket should have them
mayhem[m] goes to look
reosarevok[m] uploaded an image: (184KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/DmDOeExuBqqpTcaCNarXLuZp/Screenshot%20from%202024-07-23%2013-08-34.png >
reosarevok[m]
yvanzo: ^
If that's what you had in mind, seems to work for me (see Addddddd or whatnot on the setlist info that I added to be 100% sure this was the JS version)
yvanzo[m]
reosarevok: Yes, thank you.
mayhem[m]
d0b09116-8cf1-4b5a-baf8-3db8d9fc5116 , "tripleS", "LOVElution <ↀ>", "Speed Love" (open the page to get the correct release name)
reosarevok: Found why some strings are not translated anymore: rebases on master missing intermediate changes.
There might be more than just string changes.
lusciouslover has quit
lusciouslover joined the channel
lucifer[m]
mayhem: recording:`rising from the ashes` artist:`erosion 89` can't get solr to match this so far.
mayhem[m]
odd. that seems pretty simple.
lucifer[m]
mayhem: okay got it to match using a fuzzy search. the right artist name is erosion89 (no space) but another observation.
recording name - exact search and artist name - fuzzy search. matches.
recording name - fuzzy search and artist name - fuzzy search. doesn't match.
so we need to test all combinations separately in worst cases.
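To make "test all combinations" concrete, here is a small sketch that generates the four exact/fuzzy query strings for one recording and artist pair. It assumes the standard Lucene query syntax (quoted phrase for exact, `term~` for fuzzy) and the `recording` / `artist` field names used above; the helper names are made up, and special characters in the input are not escaped.

```python
from itertools import product


def clause(field: str, text: str, fuzzy: bool) -> str:
    """Build one field clause: a quoted phrase, or every term with '~'."""
    if fuzzy:
        terms = " ".join(f"{term}~" for term in text.split())
        return f"{field}:({terms})"
    return f'{field}:"{text}"'


def query_matrix(recording: str, artist: str) -> dict:
    """Return all exact/fuzzy combinations keyed by (recording, artist) mode."""
    queries = {}
    for rec_fuzzy, art_fuzzy in product((False, True), repeat=2):
        key = ("fuzzy" if rec_fuzzy else "exact",
               "fuzzy" if art_fuzzy else "exact")
        queries[key] = (clause("recording", recording, rec_fuzzy)
                        + " AND "
                        + clause("artist", artist, art_fuzzy))
    return queries


for modes, query in query_matrix("rising from the ashes", "erosion 89").items():
    print(modes, query)
```

Each generated string can be sent as a separate query, so a case like this one can be checked per combination rather than guessed at.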
mayhem[m]
huh???
aren't the two searches for artist name independent?
lucifer[m]
sorry not sure what you mean
mayhem[m]
"artist name - fuzzy search. matches." and "artist name - fuzzy search. doesn't match." I would expect that these give the same result.
because we're looking up artist names separately from recording names, right?
lucifer[m]
ah no
recording name AND artist name at the same time.
mayhem[m]
ah, but shouldn't we be testing the separated lookups?
lucifer[m]
it's one index like we have for typesense.
mayhem[m]
because we agreed that each should be looked up separately, right?
lucifer[m]
right but i don't think we can implement that easily with solr.
mayhem[m]
ugh.
not sure I like this.
lucifer[m]
it's 50ms for one field fuzzy search. less than 10ms for exact search.
we can shard the index in solr based on artist names - but it wouldn't improve the performance.
mayhem[m]
that seems like a deal breaker for solr, no?
lucifer[m]
the overall performance would be better.
than what we have with typesense.
mayhem[m]
problem is that 10% better is not solving our problem.
2-3 times faster starts getting there.
lucifer[m]
makes sense. but i don't think there is an equally performant alternative
mayhem[m]
the testing I did with nmslib was clocking in around 5ms per lookup.
lucifer[m]
the testing we did with individual indexes also came out around the same
i see
can you remind me if it was exact match only or supports fuzzy too?
mayhem[m]
and clearly we'd need to do more than 1 lookup, but I'd expect 2-4 lookups per track. so 20ms or so.
fuzzy and also more than just 2 chars.
lucifer[m]
cool, let's load test on the vm and if it's the same performance, we can go ahead with that
mayhem[m]
so, finish load testing your solution, then load test mine?
lucifer[m]
yes solr is done for now.
mayhem[m]
ok, cool.
the biggest problem that I am still facing with nmslib is the building of indexes.
but I can jump back into that with a fresh perspective if you want.
lucifer[m]
i think we can fix that building part later.
once we are satisfied with the querying part.
mayhem[m]
let me look at the code again.
ahhh, I think I see what is going on now. I am seeing a lot of disk I/O that I hadn't looked at before.
IO contention is limiting CPU time.
constrained on Write. eh???
oh, I wonder if scikit learn is being too smart for us. might have a memory use limit and this writes to disk.
SKLEARN_WORKING_MEMORY
working_memory : int, default=None
If set, scikit-learn will attempt to limit the size of temporary arrays to this number of MiB (per job when parallelised), often saving both computation time and memory on expensive operations that can be performed in chunks. Global default: 1024.
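For checking which limit is actually in effect, a small sketch follows. It asks scikit-learn via `sklearn.get_config()` when the package is importable and otherwise falls back to reading the `SKLEARN_WORKING_MEMORY` environment variable with the documented global default of 1024. The helper name is made up, and whether this setting explains the disk writes is still only a hunch.

```python
import os


def effective_working_memory_mib() -> int:
    """Report the working_memory limit (MiB) that scikit-learn would use."""
    try:
        import sklearn
    except ImportError:
        # Fallback when scikit-learn is not installed: read the environment
        # variable, using the documented global default of 1024 MiB.
        return int(os.environ.get("SKLEARN_WORKING_MEMORY", 1024))
    return sklearn.get_config()["working_memory"]


print("working_memory (MiB):", effective_working_memory_mib())
```

To experiment with a different limit, the variable can be exported before the process starts, or `sklearn.set_config(working_memory=...)` can be called at the top of the script.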
mayhem
tykling: ping
bttf joined the channel
ursa-major has quit
ajhalili2006 has quit
outsidecontext has quit
RetroPunk has quit
djl has quit
serra has quit
irimi1 has quit
outsidecontext joined the channel
djl joined the channel
irimi1 joined the channel
serra joined the channel
RetroPunk joined the channel
ursa-major joined the channel
ajhalili2006 joined the channel
minimal joined the channel
Jade[m]
<yvanzo[m]> "bitmap, Jade, reosarevok: I just..." <- Aah only just saw this, oops!
On it now
Thank you :)
bitmap[m]
oops x2, I forgot to accept the invitation on Friday and it expired over the weekend. could you please send me one again yvanzo?
Jade[m] uploaded an image: (36KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/lXholpJauvAeuipYZGiKxHYd/image.png >
Jade[m]
Yeah I might have done the same if I didn't see it