#metabrainz


      • yosijo joined the channel
      • ruaok
        alastairp: after some playing around with that (thanks for the tips!) I have more insight, but I doubt it will help me much.
      • I am trying to work out how to match partial albums and to differentiate a good partial match from a half-way decent incorrect match.
      • alastairp
        👍 I don't have any experience with writing custom solr/lucene code, so could be interesting to see how that works
      • ruaok
        meaning, that I could have 6 out of 12 tracks matched and get something like a 50% score.
      • or I could match 12 out of 12 tracks from an unrelated release that also gives 50%.
      • the former is acceptable, the latter is not.
      • alastairp
        what does a document look like for you? album name and array of 12 tracks in 1 document? or a doc is just 1 track?
      • ruaok
      • alastairp
        what about post-processing? Or is the problem getting the good matches to show up in the first page of results before processing?
      • ruaok
        recording_names is the key here.
      • track numbers with track names seems to do wonders for finding the right match.
      • post processing is what I was thinking about.
      • fuzzy compare like fields and weight them, sum the whole lot and see what the result is.
      • I think I can get good results on the first page, because of the peculiar formatting of recording_names.
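The post-processing idea above (fuzzy-compare like fields, weight them, sum the lot) could look roughly like this minimal sketch. The field names and weights are hypothetical, and `difflib.SequenceMatcher` is just one stand-in for "fuzzy compare":

```python
from difflib import SequenceMatcher

# Hypothetical field weights (they sum to 1.0); nothing in the chat fixes these.
WEIGHTS = {"artist": 0.3, "release": 0.3, "recordings": 0.4}


def similarity(a: str, b: str) -> float:
    """Case-insensitive fuzzy similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_score(query: dict, candidate: dict) -> float:
    """Fuzzy-compare like fields and sum the weighted similarities."""
    total = 0.0
    for field, weight in WEIGHTS.items():
        q = query.get(field, "")
        c = candidate.get(field, "")
        # Multi-valued fields (lists of track names) are compared as one string.
        if isinstance(q, list):
            q = " ".join(q)
        if isinstance(c, list):
            c = " ".join(c)
        total += weight * similarity(q, c)
    return total
```

A candidate identical to the query scores 1.0; anything else scores lower in proportion to how far each weighted field drifts.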
      • alastairp
        and does your search query include all the track names too?
      • ?q=recording_names:I care I miss you Best thing....
      • ruaok
        `title:4 AND count:13 AND recording_names:1\ 1\+1\ \ 2\ I\ Care\ \ 3\ I\ Miss\ You\ \ 4\ Best\ Thing\ I\ Never\ Had\ \ 5\ Party\ \ 6\ Rather\ Die\ Young\ \ 7\ Start\ Over\ \ 8\ Love\ on\ Top\ \ 9\ Countdown\ \ 10\ End\ of\ Time\ \ 11\ I\ Was\ Here\ \ 12\ Run\ the\ World\ \(Girls\)\ \ 13\ Dreaming`
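A query like the one above can be generated rather than typed. This sketch assumes the `recording_names` value is "<track number> <title>" per track with tracks separated by two spaces (inferred from the query, not confirmed), and backslash-escapes Lucene's special characters plus spaces, as the query does:

```python
# Characters special to Lucene's classic query parser, plus the space, which the
# query above also escapes so the whole value stays a single term.
SPECIAL = set('+-&|!(){}[]^"~*?:\\/ ')


def lucene_escape(text: str) -> str:
    """Backslash-escape every special character in text."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def recording_names_value(titles):
    # Assumed format: "<number> <title>", tracks joined by two spaces.
    return "  ".join(f"{i} {t}" for i, t in enumerate(titles, start=1))


def build_query(release_title, titles):
    """Build a title/count/recording_names query for one release."""
    return (
        f"title:{lucene_escape(release_title)} AND count:{len(titles)} "
        f"AND recording_names:{lucene_escape(recording_names_value(titles))}"
    )
```

For example, `build_query("4", ["1+1", "I Care"])` yields `title:4 AND count:2 AND recording_names:1\ 1\+1\ \ 2\ I\ Care`, the same shape as the start of the query above.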
      • alastairp
        right
      • ruaok
        not friendly for human searchers, but fuck humans.
      • alastairp
        two things that come to mind which I'd try next:
      • maybe something with query time boosts - to say that some things are more important than others
      • and just throwing out an idea - did you try making recording names multi valued? it's possible that there are some solr functions that can tell you how many values in a field matched your search query, you could use that + a boost
      • ruaok
        I haven't done either of those -- just starting to tune them.
      • let me check out multi-valued fields, that sounds much more useful than just boosts
      • alastairp
        in fact, all of your fields appear to be multi-valued already - given the [] surrounding them :)
      • ruaok
        yes, indeed.
      • but I am not certain this is going to help.
      • I think I need to build some serious test cases and expected results before I can work towards a solution.
      • alastairp
        yeah, I was just hypothesising that there was a percentage_fields_match(recording_names, 75) function that you could call to boost certain results
      • ruaok
        but I get the feeling that solr alone won't cut it.
      • alastairp
        agreed, I'm throwing out ideas without really knowing your requirements and test cases
      • ruaok
        let me whip up some test cases to shore up the ideas.
      • kepstin joined the channel
      • legoktm[m] joined the channel
      • just1602 has quit
      • just1602 joined the channel
      • I think that about sums up the requirements I have
      • ruaok goes to fetch essentials
      • BrainzGit
        [musicbrainz-server] reosarevok opened pull request #2205 (master…MBS-11861): MBS-11861: improve loopParity classes for tablesorter https://github.com/metabrainz/musicbrainz-serve...
      • [listenbrainz-server] amCap1712 reopened pull request #1525 (master…typesense): Do not block startup if typesense is down https://github.com/metabrainz/listenbrainz-serv...
      • alastairp
        zas: hi. what tools are we going to replace in our service mesh? git2consul yes. registrator too?
      • yosijo has quit
      • CatQuest has quit
      • CatQuest joined the channel
      • CatQuest has quit
      • CatQuest joined the channel
      • yvanzo
        yyoung[m]: Finally reviewed the other PR too; there might also be some conflicts with PRs reosarevok just merged.
      • reosarevok
        Sorry if so!
      • yyoung
        yvanzo: Thanks! Will have a look immediately
      • reosarevok: That's a common case, I'll fix it later. :)
      • ruaok
        reosarevok: do you know of a release group that contains releases with the same number of tracks, but different track orders?
      • reosarevok
        ruaok: but otherwise the same tracks?
      • ruaok
        yes
      • reosarevok
        (recordings)
      • Ok
      • yvanzo
      • (The recordings need to be merged.)
      • alastairp
        this one has a different title depending on what country it was released in https://musicbrainz.org/release-group/6d0c7a70-...
      • reosarevok
        That'll take time to merge for the test though
      • ruaok
        yvanzo: that's perfect, thanks.
      • reosarevok
        https://musicbrainz.org/release/63e4ec64-34ec-4... and https://musicbrainz.org/release/16885cb3-5ac7-4... seem to have the same issue and the recordings are already the same
      • (funnily enough, that needs *splitting* instead, since one is mono :D )
      • ruaok
        alastairp: interesting, but I don't think that is a problem at all.
      • alastairp
        yeah, sure. I was pretty sure that this band had the case you asked for in an album released in NZ and AU, but I can't find it
      • hmm. https://musicbrainz.org/release-group/3ef22e1d-... should have one - last track of side A on the vinyl is swapped with the next track compared to the CD, but I see no vinyl release on MB
      • yvanzo
        Actually recordings are the same already, but because the "Recording artist" was specified, I thought it was a specific relationship, so different recordings.
      • ruaok
        the taste of honey works as a good test case. my results already are:
      • the last one is out of order with the first two, so the results are good.
      • it seems that partial matches are the only thing that is not really supported out of the box.
      • alastairp: I think those cases in the doc illustrate the most critical examples.
      • given that, any more ideas about getting solr to do this right?
      • yyoung
        yvanzo: Replied to some of the comments; now investigating/fixing the rest.
      • alastairp
        ruaok: do you have some quick scripts for building the index?
      • I can have a poke at it tomorrow, but other than what I've already said today I'm not sure what the next possible step could be
      • ruaok
      • I think I will try my hand at rolling my own evaluator -- I think solr does a good enough job of fetching the candidates I am interested in.
      • for instance, here are the results for the taste of honey test:
      • 89 to 22 is quite the drop in score. I know I don't need to examine the hits scoring 22 and below.
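One way to exploit a drop like 89 to 22 is to keep only the hits before the largest relative drop between consecutive Solr scores. This is a hypothetical heuristic, not something decided in the channel; only the 89 and 22 come from the chat, the other scores in the usage note are made up:

```python
def cut_at_largest_drop(scores):
    """Return how many leading hits to keep.

    `scores` must be sorted in descending order. The cut is placed just before
    the largest relative drop between consecutive scores; if no score drops,
    every hit is kept.
    """
    if len(scores) < 2:
        return len(scores)
    best_i, best_drop = len(scores), 0.0
    for i in range(1, len(scores)):
        prev, cur = scores[i - 1], scores[i]
        drop = (prev - cur) / prev if prev > 0 else 0.0
        if drop > best_drop:
            best_i, best_drop = i, drop
    return best_i
```

With scores `[95, 91, 89, 22, 20]` this keeps the first three hits, so only those would go on to the more expensive evaluator.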
      • alastairp
        right. then you can filter based on number of tracks, track order, release type, etc?
      • ruaok
        yes, and compare only the tracks that are present in the given query and be more forgiving for missing tracks.
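A minimal sketch of such an evaluator, assuming the query is a list of track titles and each candidate release is its ordered track list. Each query track is matched to its most similar release track and counts only above a hypothetical similarity threshold. That way six strong matches out of twelve beat twelve weak matches from an unrelated release, release tracks absent from the query cost nothing, and matched positions are checked for order:

```python
from difflib import SequenceMatcher


def sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def evaluate(query_tracks, release_tracks, threshold=0.8):
    """Score a candidate release against only the tracks in the query.

    Returns (fraction_of_query_tracks_matched, matched_tracks_in_order).
    The 0.8 threshold is a hypothetical value, not a tuned one.
    """
    positions = []
    for title in query_tracks:
        best_pos, best_sim = None, 0.0
        for pos, candidate in enumerate(release_tracks):
            s = sim(title, candidate)
            if s > best_sim:
                best_pos, best_sim = pos, s
        # Weak matches are ignored instead of being averaged in.
        if best_pos is not None and best_sim >= threshold:
            positions.append(best_pos)
    fraction = len(positions) / len(query_tracks) if query_tracks else 0.0
    in_order = all(a < b for a, b in zip(positions, positions[1:]))
    return fraction, in_order
```

Filters on track count or release type, as suggested above, would sit on top of this per-track score.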
      • yvanzo
        yyoung: Can you please make your code with the reduced label available (on a temporary branch if needed) so we can have a look at it too?
      • yyoung
        yvanzo: I already updated that in the PR
      • yvanzo
        Sorry, I missed https://github.com/metabrainz/musicbrainz-serve... during my review.
      • yyoung
        Could you be more specific about "restricting options up to two possible relationship types is not the same as auto-selecting two relationship types"?
      • Now I'm thinking if it's better to split the logic
      • yvanzo
        yyoung: a <select> element with two options vs two <select> elements with freezed options
      • "frozen"
      • yyoung
        Are we auto-selecting those 2 types all the time?
      • It did show 2 options in the select
      • Etua joined the channel
      • But that's why I'm thinking about splitting :)
      • yvanzo
        Not all the time: mainly Norfolk & Jamendo have auto-select, Internet Archive doesn’t.
      • Etua has quit
      • yyoung
        Let's focus on jamendo here
      • yvanzo
        Yes, Jamendo always provides both download and streaming options.
      • So both relationship types should be auto-selected.
      • yyoung
        How is this issue represented in the UI?
      • yvanzo
        It would be better represented using the other PR.
      • Since relationships are grouped by URL, auto-selected relationship types (download and streaming) cannot be removed separately, but the URL can.
      • yyoung
        Hmmm...
      • Actually I was thinking of using 'types' to validate relationship type combinations
      • yvanzo
        It’s the same for only 1 auto-selected relationship type.
      • yyoung
        I haven't thought about how to lock the 2 rels together in the UI
      • yvanzo
        You probably need a flag 'autoSelected' for each relationship.
      • yyoung
        So you're suggesting changing to '[LINK_TYPES.downloadfree, LINK_TYPES.streamingfree]' ?
      • My initial thought was to use 'types' to limit possible type and type combinations, if there's only a single type when validating and it's not in the array, then we could show the error
      • So here it implies that Jamendo links can only have the 2 types together, not any single type
      • yvanzo
        Yes
      • yyoung
        Therefore after splitting this mechanism no longer works.
      • yvanzo
        No, you have to check relationship types from the UI against auto-selected relationship types.
      • ... at first (then against 'types' if no rel type has been auto-selected).
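The order yvanzo describes (check the UI selection against the auto-selected types first, and only fall back to 'types' when nothing was auto-selected) could be sketched as below. The real code is in musicbrainz-server's JavaScript. This Python function, its name, the error strings, and the "must equal the auto-selected set" rule are all hypothetical; 'types' is treated as allowing any non-empty subset, as noted later in the discussion:

```python
def validate_url_relationships(selected, auto_selected, allowed_types):
    """Return an error message, or None if the selection is valid.

    selected:      set of relationship types chosen in the UI for one URL
    auto_selected: set of types auto-selected for that URL (may be empty)
    allowed_types: the URL's 'types' set; any non-empty subset is accepted
    """
    if auto_selected:
        # Auto-selected types are checked first (assumed: must match exactly).
        if set(selected) != set(auto_selected):
            return "Selection must match the auto-selected relationship types."
        return None
    # Nothing auto-selected: fall back to 'types'.
    if not selected:
        return "Select at least one relationship type."
    if not set(selected) <= set(allowed_types):
        return "Unsupported relationship type for this URL."
    return None
```

For a Jamendo-style URL, both `downloadfree` and `streamingfree` would be auto-selected, so a selection containing only one of them is rejected.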
      • reosarevok
        bitmap: for when you're around, I'm looking into MBS-11862
      • BrainzBot
        MBS-11862: Do not show deprecated relationship types with 0 uses in selectors https://tickets.metabrainz.org/browse/MBS-11862
      • reosarevok
        What places do you think we should still show the deprecated types?
      • Obviously they need to be shown in the /relationships page to be able to edit them at all
      • yyoung
        reosarevok: FYI, my PR also affects deprecated types
      • reosarevok
        And we don't want them in the release editor, relationship editors
      • yyoung: deprecated types will still be available to you in the URL editor, unless they have 0 uses as well :)
      • bitmap: what about statistics? thinking not? edit search - thinking yes?
      • bitmap
        reosarevok: if they still have uses I'd probably keep them in stats and docs, otherwise I'd hide them everywhere?
      • yyoung
        reosarevok: It seems one of the features has already eliminated some of them :)
      • reosarevok
        bitmap: Without uses, I mean :)
      • Having them in the edit search lets us search for edits that used them, which might be useful? Dunno
      • yyoung
        yvanzo: So how is this different from directly checking against 'types'?
      • bitmap
        reosarevok: edit search is the only place I could see them being useful
      • reosarevok
        In any case, we can't stop showing them under /relationships or we will be unable to make any change if we ever need to :)
      • So I would still show them there
      • yyoung
        Currently I haven't hidden the type selectors when auto-selecting a type combination
      • bitmap
        you could hide them from non rel editors if it matters
      • reosarevok
        I thought about that
      • yvanzo
        yyoung: 'types' allows selecting any number of relationship types in a set; it doesn’t make it mandatory to select them all.
      • reosarevok
        But it seems... harder :) (in that I'd need to pass a specific extra value to see if they have 0 uses)
      • And I think it might be of some use to have them there, at least so that if someone sees them in the edit search, they can figure them out
      • yvanzo
        yyoung: It would probably make more sense if based on the other PR.
      • yyoung
        Yes I think so
      • reosarevok
        We *probably* should have something there like "This type has been removed and is only kept for historical data" or something?
      • If seeing the specific page of the type
      • yyoung
        Will we allow users to select more types if it's already auto-selected?
      • IIUC the current code would force users to select both 2 types
      • yvanzo
        yyoung: No, just the same as it is currenly with 1 auto-selected type.
      • It's auto-selected, they don’t have to select them.
      • yyoung
        Let me explain my ideas: Based on the other PR the links are already grouped, then I'll take all relationships under a URL to check it against 'types', whether it's a single type or multiple types
      • 'multiple(rel1, rel2)' would enforce selecting both of them at the same time
      • While 'rel1, rel2' allows only one of them, excluding the combination
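yyoung's proposed semantics can be modelled as a list of allowed combinations: a bare type allows that type alone, while `multiple(...)` allows only that exact combination. This is a hypothetical Python model of the idea, not the actual musicbrainz-server code, and `multiple` here is just an illustrative helper:

```python
from typing import FrozenSet, Iterable, List, Set


def multiple(*types: str) -> FrozenSet[str]:
    """Hypothetical marker: these types must be selected together."""
    return frozenset(types)


def allowed_combinations(rules: Iterable) -> List[FrozenSet[str]]:
    """A bare string allows that type on its own; a multiple(...) entry
    allows exactly that combination."""
    return [frozenset([r]) if isinstance(r, str) else r for r in rules]


def is_valid(selected: Set[str], rules: Iterable) -> bool:
    """True if the URL's selected types form one allowed combination."""
    return frozenset(selected) in allowed_combinations(rules)
```

With `[multiple("downloadfree", "streamingfree")]`, only the pair is valid; with `["rel1", "rel2"]`, either type alone is valid but the pair is not.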
      • bitmap
        reosarevok: sounds okay to me. I'd probably display (deprecated) next to them in the edit search options too
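The behaviour converged on for MBS-11862 can be summarised as a small filter: show everything under /relationships, keep deprecated types in edit search with a "(deprecated)" label, and hide deprecated types with 0 uses from the other selectors. This sketch is hypothetical; the context names and the `LinkType` record are illustrative, not musicbrainz-server code:

```python
from dataclasses import dataclass


@dataclass
class LinkType:
    name: str
    deprecated: bool
    uses: int


def selector_options(types, context):
    """Return the option labels to show in the given context."""
    options = []
    for t in types:
        if context == "relationships":
            # The /relationships pages must keep every type editable.
            options.append(t.name)
        elif context == "edit_search":
            # Still searchable, but marked as deprecated.
            options.append(f"{t.name} (deprecated)" if t.deprecated else t.name)
        elif not (t.deprecated and t.uses == 0):
            # Editors and other selectors hide unused deprecated types.
            options.append(t.name)
    return options
```

Whether unused deprecated types should also drop out of statistics and docs, as bitmap suggests, would be a further condition on the same flag.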