artists is an utter freaking mess when it comes to boosting.
samj1912
how do you want to do it then?
I was thinking about that db method we talked about
ruaok
that wasn't a suggestion to do it differently -- just commiserating.
remind me of the DB method.
samj1912
we were thinking of dumping all the popular artists from LB and then assigning a score and storing them somewhere
sir will then pull up these scores and store them in solr docs
then while querying we will simply boost docs with more score
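(A minimal sketch of the query-time boosting idea described above, not the actual search server code. It assumes each doc already carries a stored popularity score; the field names `relevance` and `popularity` and the log damping are illustrative assumptions, not something taken from sir or Solr.)

```python
import math


def boosted_score(relevance: float, popularity: int) -> float:
    """Multiply the text relevance by a damped popularity factor.

    log1p keeps very popular artists from drowning out everything else;
    a popularity of 0 leaves the relevance unchanged.
    """
    return relevance * (1.0 + math.log1p(popularity))


def rank(docs):
    """Return docs sorted by boosted score, best match first."""
    return sorted(
        docs,
        key=lambda d: boosted_score(d["relevance"], d["popularity"]),
        reverse=True,
    )


# Hypothetical search hits: both match the query text about equally well,
# but one has a much higher stored popularity score.
docs = [
    {"name": "Obscure Bach tribute band", "relevance": 1.2, "popularity": 3},
    {"name": "Johann Sebastian Bach", "relevance": 1.0, "popularity": 5000},
]
ranked = rank(docs)
```

With these made-up numbers the more popular artist ranks first even though its raw text relevance is slightly lower.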
ruaok
ah yes.
samj1912
the main problem is updating those scores
ruaok
so, that solves only a portion of the problem though.
at least from what I understand.
samj1912
yup
ruaok
what if we did a test -- it might work, might not. but it should give us better clues about what might work....
samj1912
how do you want to test it?
ruaok
so, rather than waiting for me to escape banking hell to set up the spark cluster, we fake the data.
if you count the number of tracks by an artist or number of releases and use that as a "popularity" score, then it should give us an idea of how well this might work.
samj1912
hmmm
ruaok
not perfect, but might be a useful test and the data is easy to come by.
samj1912
count of listens right?
ruaok
then shove that data in and score accordingly.
samj1912
or simply the mbdb count?
ruaok
no, leave LB data be for now.
MBDB count.
it's a super quick test that will not be perfect.
samj1912
okay
so basically more releases + recordings = more popular
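(A rough sketch of the "releases + recordings = popularity" count as a simple aggregate query. The tables here are a deliberately simplified, hypothetical stand-in built in an in-memory SQLite database; the real MusicBrainz schema is more involved, so treat the table and column names as assumptions.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified tables for illustration only.
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE release (id INTEGER PRIMARY KEY, artist_id INTEGER);
    CREATE TABLE recording (id INTEGER PRIMARY KEY, artist_id INTEGER);

    INSERT INTO artist VALUES (1, 'Bach'), (2, 'Garage Band'), (3, 'No Output');
    INSERT INTO release (artist_id) VALUES (1), (1), (1), (2);
    INSERT INTO recording (artist_id) VALUES (1), (1), (2);
    """
)

# One row per artist: number of releases plus number of recordings.
# Correlated subqueries avoid the double counting a two-way JOIN would cause,
# and every artist gets a score, even if it is 0.
rows = conn.execute(
    """
    SELECT a.name,
           (SELECT COUNT(*) FROM release r WHERE r.artist_id = a.id)
         + (SELECT COUNT(*) FROM recording rec WHERE rec.artist_id = a.id)
           AS popularity
    FROM artist a
    ORDER BY a.id
    """
).fetchall()

scores = dict(rows)
```

Because the count is computed by a query we run ourselves, we decide when it gets refreshed, rather than reacting to every single edit.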
ruaok
actually, it might be better than the popularity one -- not all artists would get a score using the LB method.
but this way every MB artist gets one.
yes.
could be deeply flawed, might work great. I just don't know.
samj1912
1 main prob - this is basically artist ref count right?
ruaok
but it is an easy thing to try and to get to the next step.
I don't recall how the artist refcount is incremented/decremented.
but I don't think this is as comprehensive.
samj1912
do you remember we removed the ref count from solr because of the shit ton of updates?
ruaok
but if you go with releases as a measure, then it should be a simple aggregate query.
yes.
and if we don't use refcount we control when this gets updated.
samj1912
hmm
ruaok
that is a very good insight, but I think we can disregard it for the test I am proposing.
and remember, it's just a test.
and if we use it, then we're incrementing the count for every new album release. totally fine as far as updates are concerned.
samj1912
well here's the thing, the problem is cascading updates. let's say bach gets 1 more release added: that updates his artist doc, which triggers an update for all the releases and recordings related to him
this was the reason we removed ref count
1 release -> artist -> all related releases
which just causes a shitstorm when it's one of the popular artists or VA
we can stop this however if we end triggers at artists
when a ref count is updated
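(A toy sketch of "ending triggers at artists": a change that only touches the ref count reindexes the artist doc and stops there, while any other change still cascades to related docs. The `RELATED` mapping, the doc ids and the `docs_to_reindex` helper are all hypothetical and are not how sir is actually wired up.)

```python
# Hypothetical map from an artist doc to the docs that embed its data.
RELATED = {
    "artist:bach": ["release:1", "release:2", "recording:1"],
}

# Fields whose changes should NOT cascade to related docs (an assumption).
NO_CASCADE_FIELDS = frozenset({"ref_count"})


def docs_to_reindex(entity, changed_fields):
    """Return the docs that need reindexing after `entity` changed.

    The entity itself is always reindexed. Related docs are only added
    if at least one changed field is outside NO_CASCADE_FIELDS.
    """
    to_reindex = [entity]
    if set(changed_fields) - NO_CASCADE_FIELDS:
        to_reindex.extend(RELATED.get(entity, []))
    return to_reindex


only_count = docs_to_reindex("artist:bach", {"ref_count"})
renamed = docs_to_reindex("artist:bach", {"name", "ref_count"})
```

Here `only_count` contains just the artist doc, while `renamed` also pulls in the related releases and recordings.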
ruaok
don't we have more granular control over what cascades an update?
samj1912
we do, I am just wondering how we will score releases/recordings then
ruaok
ah, different context.
samj1912
yup
ruaok
so, you agree that for artists, this is viable?
samj1912
yup
ruaok
ok, let's go for releases then.
samj1912
I'll feed the data
okay
ruaok
how much of a problem is this on a release level?
and does the release analyzer actively change the boosting right now?
samj1912
actively as in?
ruaok
are boosts done on the release level?
samj1912
I don't see any release boost file in the current search server
current = lucene one
ruaok
and on recordings?
or, better question, what other searches do we boost that we need to address?
samj1912
let me check
ruaok
IIRC artist is the most important one and we just agreed that we can use this approach.
samj1912
cool
I'll integrate the ref count then, and stop the updates at artist
and boost accordingly :p
ruaok
great. I'm curious to see how effective that will be.
rdswift
I know you're just testing right now, but how about artists that are not performers, such as engineers or mixers? Will this pose a problem long term?
samj1912
hmm, ruaok so we don't have fine-tuned control, I am wondering how to add it? just as a special case for ref_count?