#metabrainz

      • ruaok
        samj1912: that approach makes sense to me.
      • 2018-05-15 13511, 2018

      • samj1912
        basically we have normal boosts, based on field values
      • 2018-05-15 13514, 2018

      • ruaok
        boosting has gone towards query time boosts, so this is the way to go, right?
      • 2018-05-15 13517, 2018

      • samj1912
        and then we have these functions - https://wiki.apache.org/solr/FunctionQuery
      • 2018-05-15 13531, 2018

      • samj1912
        ruaok: yup, solr 7 is query boost exclusive
      • 2018-05-15 13535, 2018

      • samj1912
        index time boosts are gone
      • 2018-05-15 13547, 2018

      • ruaok
        ok, then it's the clear path ahead.
      • 2018-05-15 13528, 2018

      • Leo__Verto has quit
      • 2018-05-15 13531, 2018

      • samj1912
        ruaok: Annotation and areas are done, I came up on artists, this is how it is boosted currently - https://github.com/metabrainz/search-server/blob/…
      • 2018-05-15 13542, 2018

      • samj1912
        it basically does it for a select few artists
      • 2018-05-15 13551, 2018

      • ruaok
        artists is an utter freaking mess when it comes to boosting.
      • 2018-05-15 13512, 2018

      • samj1912
        how do you want to do it then?
      • 2018-05-15 13546, 2018

      • samj1912
        I was thinking about that db method we talked about
      • 2018-05-15 13557, 2018

      • ruaok
        that wasn't a suggestion to do it differently -- just commiserating.
      • 2018-05-15 13510, 2018

      • ruaok
        remind me of the DB method.
      • 2018-05-15 13533, 2018

      • samj1912
        we were thinking of dumping all the popular artists from LB and then assigning a score and storing them somewhere
      • 2018-05-15 13545, 2018

      • samj1912
        sir will then pull up these scores and store them in solr docs
      • 2018-05-15 13557, 2018

      • samj1912
        then while querying we will simply boost docs with more score
      • 2018-05-15 13559, 2018
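
Editor's sketch of the idea described above (a per-artist score stored alongside each doc, applied as a boost while querying). This is only an illustration in plain Python, not the sir or Solr implementation; the `relevance` and `popularity` fields, the sample docs, and the log-damped formula are all assumptions made for the example.

```python
import math

def boosted_score(relevance: float, popularity: int) -> float:
    """Multiply the text relevance by a log-damped popularity factor.

    The factor is 1.0 for popularity 0, so docs without a score keep
    their plain relevance instead of being zeroed out.
    """
    return relevance * (1.0 + math.log10(1 + popularity))

def rank(docs):
    """Return docs ordered by boosted score, highest first."""
    return sorted(
        docs,
        key=lambda d: boosted_score(d["relevance"], d["popularity"]),
        reverse=True,
    )

# Hypothetical search hits: the obscure artist matches the text slightly
# better, but the stored popularity score lifts the well-known one.
docs = [
    {"name": "Obscure Tribute Band", "relevance": 1.2, "popularity": 0},
    {"name": "Well-Known Artist", "relevance": 1.0, "popularity": 999},
]
ranked = rank(docs)
```

The log damping is one possible design choice: it keeps very large counts from completely drowning out text relevance.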

      • ruaok
        ah yes.
      • 2018-05-15 13504, 2018

      • samj1912
        the main problem is updating those scores
      • 2018-05-15 13513, 2018

      • ruaok
        so, that solves only a portion of the problem though.
      • 2018-05-15 13524, 2018

      • ruaok
        at least from what I understand.
      • 2018-05-15 13528, 2018

      • samj1912
        yup
      • 2018-05-15 13550, 2018

      • ruaok
        what if we did a test -- it might work, might not. but it should give us better clues about what might work....
      • 2018-05-15 13503, 2018

      • samj1912
        how do you want to test it?
      • 2018-05-15 13508, 2018

      • ruaok
        so, rather than waiting for me to escape banking hell to set up the Spark cluster, we fake the data.
      • 2018-05-15 13540, 2018

      • ruaok
        if you count the number of tracks by an artist or number of releases and use that as a "popularity" score, then it should give us an idea of how well this might work.
      • 2018-05-15 13557, 2018

      • samj1912
        hmmm
      • 2018-05-15 13507, 2018

      • ruaok
        not perfect, but might be a useful test and the data is easy to come by.
      • 2018-05-15 13515, 2018

      • samj1912
        count of listens right?
      • 2018-05-15 13517, 2018

      • ruaok
        then shove that data in and score accordingly.
      • 2018-05-15 13523, 2018

      • samj1912
        or simply the mbdb count?
      • 2018-05-15 13525, 2018

      • ruaok
        no, leave LB data be for now.
      • 2018-05-15 13527, 2018

      • ruaok
        MBDB count.
      • 2018-05-15 13536, 2018

      • ruaok
        it's a super quick test that will not be perfect.
      • 2018-05-15 13540, 2018

      • samj1912
        okay
      • 2018-05-15 13555, 2018

      • samj1912
        so basically more releases + recordings = more popular
      • 2018-05-15 13500, 2018

      • ruaok
        actually, it might be better than the popularity one -- not all artists would get a score using the LB method.
      • 2018-05-15 13505, 2018

      • ruaok
        but this way every MB artist gets one.
      • 2018-05-15 13510, 2018

      • ruaok
        yes.
      • 2018-05-15 13526, 2018

      • ruaok
        could be deeply flawed, might work great. I just don't know.
      • 2018-05-15 13533, 2018

      • samj1912
        1 main prob - this is basically artist ref count right?
      • 2018-05-15 13538, 2018

      • ruaok
        but it is an easy thing to try and to get to the next step.
      • 2018-05-15 13506, 2018

      • ruaok
        I don't recall how the artist refcount is incremented/decremented.
      • 2018-05-15 13518, 2018

      • ruaok
        but I don't think this is as comprehensive.
      • 2018-05-15 13530, 2018

      • samj1912
        do you remember we removed the ref count from solr because of the shit ton of updates
      • 2018-05-15 13544, 2018

      • ruaok
        but if you go with releases as a measure, then it should be a simple aggregate query.
      • 2018-05-15 13556, 2018
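
Editor's sketch of what such an aggregate query might look like, run against an in-memory SQLite database so it is self-contained. The `demo_artist` and `demo_release` tables are a toy schema invented for this example and do not reflect the real MusicBrainz schema, which links releases to artists differently.

```python
import sqlite3

# Toy, hypothetical schema: one row per artist, one row per release.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo_artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE demo_release (id INTEGER PRIMARY KEY, artist_id INTEGER, title TEXT);
    INSERT INTO demo_artist VALUES (1, 'Artist A'), (2, 'Artist B'), (3, 'Artist C');
    INSERT INTO demo_release VALUES (1, 1, 'R1'), (2, 1, 'R2'), (3, 2, 'R3');
""")

def release_counts(conn):
    """Count releases per artist.

    The LEFT JOIN keeps artists with no releases, so every artist
    gets a (possibly zero) score.
    """
    rows = conn.execute("""
        SELECT a.id, a.name, COUNT(r.id) AS num_releases
        FROM demo_artist a
        LEFT JOIN demo_release r ON r.artist_id = a.id
        GROUP BY a.id, a.name
        ORDER BY a.id
    """)
    return rows.fetchall()

counts = release_counts(conn)
```

Because the count is computed on demand, whoever runs the query decides when the scores are refreshed.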

      • ruaok
        yes.
      • 2018-05-15 13509, 2018

      • ruaok
        and if we don't use refcount we control when this gets updated.
      • 2018-05-15 13520, 2018

      • samj1912
        hmm
      • 2018-05-15 13533, 2018

      • ruaok
        that is a very good insight, but I think we can disregard it for the test I am proposing.
      • 2018-05-15 13542, 2018

      • ruaok
        and remember, it's just a test.
      • 2018-05-15 13532, 2018

      • ruaok
        and if we use it, then we're incrementing the count for every new album release. totally fine as far as updates are concerned.
      • 2018-05-15 13556, 2018

      • Leo__Verto joined the channel
      • 2018-05-15 13505, 2018

      • samj1912
        well here's the thing, the problem is cascading updates - let's say we update bach, 1 more release added, this will trigger an update for all the releases and recordings related to him since his doc got updated
      • 2018-05-15 13513, 2018

      • samj1912
        this was the reason we removed ref count
      • 2018-05-15 13538, 2018

      • samj1912
        1 release -> artist -> all related releases
      • 2018-05-15 13556, 2018

      • samj1912
        which just causes a shitstorm when it's one of the popular artists or VA
      • 2018-05-15 13523, 2018

      • samj1912
        we can stop this however if we end triggers at artists
      • 2018-05-15 13528, 2018

      • samj1912
        when a ref count is updated
      • 2018-05-15 13531, 2018
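
Editor's sketch of the cascade being described, and of the effect of ending the propagation at artists. The dependency graph, the entity names, and the `stop_at_prefixes` parameter are hypothetical and exist only to illustrate the reasoning; sir's real trigger handling is not shown here.

```python
from collections import deque

# Hypothetical graph: when the key entity changes, the listed entities
# are queued for reindexing as well.
DEPENDENTS = {
    "release:new": ["artist:bach"],
    "artist:bach": ["release:1", "release:2", "recording:1", "recording:2"],
}

def reindex_set(start, stop_at_prefixes=()):
    """Return every entity reindexed when `start` changes.

    Entities whose id starts with one of `stop_at_prefixes` are still
    reindexed themselves, but the update does not propagate past them.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        if entity != start and entity.startswith(tuple(stop_at_prefixes)):
            continue  # reindex this entity, but do not cascade further
        for dep in DEPENDENTS.get(entity, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

full_cascade = reindex_set("release:new")
stopped_at_artist = reindex_set("release:new", stop_at_prefixes=("artist:",))
```

In this toy graph the unrestricted cascade touches six entities, while stopping at the artist touches two; for a prolific artist the gap would be far larger.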

      • ruaok
        don't we have more granular control over what cascades an update?
      • 2018-05-15 13553, 2018

      • samj1912
        we do, I am just wondering how we will score releases/recordings then
      • 2018-05-15 13504, 2018

      • ruaok
        ah, different context.
      • 2018-05-15 13510, 2018

      • samj1912
        yup
      • 2018-05-15 13521, 2018

      • ruaok
        so, you agree that for artists, this is viable?
      • 2018-05-15 13525, 2018

      • samj1912
        yup
      • 2018-05-15 13533, 2018

      • ruaok
        ok, let's go for releases then.
      • 2018-05-15 13535, 2018

      • samj1912
        I'll feed the data
      • 2018-05-15 13541, 2018

      • samj1912
        okay
      • 2018-05-15 13552, 2018

      • ruaok
        how much of a problem is this on a release level?
      • 2018-05-15 13511, 2018

      • ruaok
        and does the release analyzer actively change the boosting right now
      • 2018-05-15 13511, 2018

      • ruaok
        ?
      • 2018-05-15 13524, 2018

      • samj1912
        actively as in?
      • 2018-05-15 13534, 2018

      • ruaok
        are boosts done on the release level?
      • 2018-05-15 13505, 2018

      • ivnat has quit
      • 2018-05-15 13536, 2018

      • samj1912
        I don't see any release boost file in the current search server
      • 2018-05-15 13551, 2018

      • samj1912
        current = lucene one
      • 2018-05-15 13546, 2018

      • ruaok
        and on recordings?
      • 2018-05-15 13501, 2018

      • ruaok
        or, better question, what other searches do we boost that we need to address?
      • 2018-05-15 13502, 2018

      • samj1912
        let me check
      • 2018-05-15 13516, 2018

      • ruaok
        IIRC artist is the most important one and we just agreed that we can use this approach.
      • 2018-05-15 13531, 2018

      • samj1912
        cool
      • 2018-05-15 13500, 2018

      • samj1912
        I'll integrate the ref count then, and stop the updates at artist
      • 2018-05-15 13509, 2018

      • samj1912
        and boost accordingly :p
      • 2018-05-15 13533, 2018

      • ruaok
        great. I'm curious to see how effective that will be.
      • 2018-05-15 13529, 2018

      • rdswift
        I know you're just testing right now, but how about artists that are not performers, such as engineers or mixers? Will this pose a problem long term?
      • 2018-05-15 13522, 2018

      • Leo__Verto has quit
      • 2018-05-15 13514, 2018

      • samj1912
        hmm, ruaok so we don't have fine-tuned control, I am wondering how to add it? just as a special case for ref_count?
      • 2018-05-15 13515, 2018

      • bukwurm joined the channel
      • 2018-05-15 13538, 2018

      • samj1912
      • 2018-05-15 13548, 2018

      • samj1912
        all the entities that are updated when an artist is currently updated
      • 2018-05-15 13501, 2018

      • samj1912
        update for an artist = if any of its indexed fields changes
      • 2018-05-15 13522, 2018

      • ruaok
        for now make it a stored field, or if it needs to be an indexed field, use that.
      • 2018-05-15 13541, 2018

      • ruaok
        don't make it special like a refcount -- forget about the concept of refcount for the time being
      • 2018-05-15 13510, 2018

      • samj1912
        then?
      • 2018-05-15 13550, 2018

      • ruaok
        store a new field. num-releases. then use that to do query time result ordering.
      • 2018-05-15 13558, 2018
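
Editor's sketch of how a stored count could drive query-time ordering, by building request parameters for Solr's Extended DisMax parser. The field name `num_releases` (written with an underscore here so it can be used directly inside a function query), the `artist` query field, and the exact boost formula are assumptions for illustration, not what sir or the MusicBrainz Solr configuration actually uses.

```python
from urllib.parse import urlencode

def artist_query_params(user_query, boost_field="num_releases"):
    """Build edismax parameters with a multiplicative query-time boost.

    sum(1,log(sum(field,1))) evaluates to 1 when the field is 0, so
    artists with no releases keep their plain relevance score.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": "artist",  # hypothetical field to search in
        "boost": f"sum(1,log(sum({boost_field},1)))",
        "fl": "*,score",
    }

params = artist_query_params("bach")
query_string = urlencode(params)  # ready to append to a /select URL
```

If a strict ordering by count is wanted instead of a blended score, a `sort` parameter on the same field would be the simpler alternative.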

      • ruaok
        I guess I don't understand what you're asking.
      • 2018-05-15 13522, 2018

      • samj1912
        okay, how's this - let's add ref count, but not add a trigger for artist when refcount is encountered?
      • 2018-05-15 13544, 2018

      • Leo__Verto joined the channel
      • 2018-05-15 13556, 2018

      • samj1912
        so basically the only time refcount will be updated is when any of the other things trigger an artist update
      • 2018-05-15 13521, 2018

      • ruaok
        add no triggers, nothing.
      • 2018-05-15 13524, 2018

      • ruaok
        just a dumb old field.
      • 2018-05-15 13545, 2018

      • ruaok
        just don't even think about updating that field right now.
      • 2018-05-15 13502, 2018

      • samj1912
        cool
      • 2018-05-15 13555, 2018

      • bukwurm
        LordSputnik: Hi, I will not be present for the next couple of hours. 😅
      • 2018-05-15 13519, 2018

      • bukwurm
        I'll send the relevant material later today.
      • 2018-05-15 13508, 2018

      • Leo__Verto has quit
      • 2018-05-15 13514, 2018

      • samj1912
        bukwurm: I am available if you wanna talk about solr
      • 2018-05-15 13512, 2018

      • outsidecontext has quit
      • 2018-05-15 13529, 2018

      • samj1912
        ping me when you are available
      • 2018-05-15 13503, 2018

      • github joined the channel
      • 2018-05-15 13503, 2018

      • github
        [sir] samj1912 opened pull request #76: Add refcount to artist index (master...refcount) https://git.io/vp5Gd
      • 2018-05-15 13503, 2018

      • github has left the channel
      • 2018-05-15 13529, 2018

      • flamingspinach is now known as fs
      • 2018-05-15 13536, 2018

      • samj1912 just noticed the GH checks tab
      • 2018-05-15 13540, 2018

      • samj1912
        it is fancy :P
      • 2018-05-15 13544, 2018

      • Leo__Verto joined the channel
      • 2018-05-15 13517, 2018

      • madmouser1 joined the channel
      • 2018-05-15 13514, 2018

      • Dr-Flay joined the channel
      • 2018-05-15 13524, 2018

      • LordSputnik
        bukwurm: ok
      • 2018-05-15 13540, 2018

      • LordSputnik
        Please also send an update of what you've done today :)
      • 2018-05-15 13510, 2018

      • Leo__Verto has quit
      • 2018-05-15 13550, 2018

      • Leo__Verto joined the channel
      • 2018-05-15 13522, 2018

      • madmouser1 has quit
      • 2018-05-15 13502, 2018

      • UmkaDK
        Guys, where do I get your public GPG key from, to verify replication packets?
      • 2018-05-15 13525, 2018

      • UmkaDK
        ... I've been going through the docs but can't see it anywhere. :(
      • 2018-05-15 13535, 2018

      • CatQuest just noticed that the T-shirt logo on the front outlines a heart
      • 2018-05-15 13541, 2018

      • CatQuest
        it's pretty awesome
      • 2018-05-15 13551, 2018

      • CatQuest
        too bad I can't have it.
      • 2018-05-15 13503, 2018

      • CatQuest
        (or it wouldn't be true if i did)
      • 2018-05-15 13522, 2018

      • CatQuest
        it's actually ridiculous.
      • 2018-05-15 13540, 2018

      • CatQuest
        chhavi_: tell your sister the design is awesome
      • 2018-05-15 13540, 2018

      • chhavi_
        Why can't you have it? Redbubble doesn't deliver?
      • 2018-05-15 13529, 2018

      • travis-ci joined the channel
      • 2018-05-15 13530, 2018

      • travis-ci
        metabrainz/picard#3412 (master - 848745d : Sambhav Kothari): The build passed.
      • 2018-05-15 13530, 2018

      • travis-ci has left the channel
      • 2018-05-15 13553, 2018

      • nawcom joined the channel
      • 2018-05-15 13524, 2018

      • HSOWA joined the channel
      • 2018-05-15 13524, 2018

      • HSOWA has quit
      • 2018-05-15 13524, 2018

      • HSOWA joined the channel
      • 2018-05-15 13555, 2018

      • KassOtsimine has quit
      • 2018-05-15 13507, 2018

      • yvanzo
        UmkaDK: gpg --recv-keys C777580F
      • 2018-05-15 13522, 2018

      • yvanzo