artists is an utter freaking mess when it comes to boosting.
2018-05-15 13512, 2018
samj1912
how do you want to do it then?
2018-05-15 13546, 2018
samj1912
I was thinking about that db method we talked about
2018-05-15 13557, 2018
ruaok
that wasn't a suggestion to do it differently -- just commiserating.
2018-05-15 13510, 2018
ruaok
remind me of the DB method.
2018-05-15 13533, 2018
samj1912
we were thinking of dumping all the popular artists from LB and then assigning a score and storing them somewhere
2018-05-15 13545, 2018
samj1912
sir will then pull up these scores and store them in solr docs
2018-05-15 13557, 2018
samj1912
then while querying we will simply boost docs with more score
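[editor's sketch] The query-time boost described above could look roughly like this. It is only an illustration: the field name `popularity` and the helper `build_boosted_query` are hypothetical, and it assumes the search is served through Solr's eDisMax parser, whose `boost` parameter multiplies the relevance score by a function of a stored field.

```python
from urllib.parse import urlencode


def build_boosted_query(user_query, score_field="popularity"):
    """Build Solr request parameters that scale relevance by a stored score.

    `score_field` is a hypothetical numeric field holding the per-artist score.
    """
    return {
        "q": user_query,
        "defType": "edismax",
        # log(sum(field,1)) dampens very large scores; the outer sum(1, ...)
        # keeps the multiplier at 1 for documents whose score is 0.
        "boost": f"sum(1,log(sum({score_field},1)))",
    }


params = build_boosted_query("bach")
print(urlencode(params))
```

The log damping is an assumption made here so that a handful of huge artists do not swamp text relevance; the chat does not say which function would be used.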
2018-05-15 13559, 2018
ruaok
ah yes.
2018-05-15 13504, 2018
samj1912
the main problem is updating those scores
2018-05-15 13513, 2018
ruaok
so, that solves only a portion of the problem though.
2018-05-15 13524, 2018
ruaok
at least from what I understand.
2018-05-15 13528, 2018
samj1912
yup
2018-05-15 13550, 2018
ruaok
what if we did a test -- it might work, might not. but it should give us better clues about what might work....
2018-05-15 13503, 2018
samj1912
how do you want to test it?
2018-05-15 13508, 2018
ruaok
so, rather than waiting for me to escape banking hell to set up the spark cluster, we fake the data.
2018-05-15 13540, 2018
ruaok
if you count the number of tracks by an artist or number of releases and use that as a "popularity" score, then it should give us an idea of how well this might work.
2018-05-15 13557, 2018
samj1912
hmmm
2018-05-15 13507, 2018
ruaok
not perfect, but might be a useful test and the data is easy to come by.
2018-05-15 13515, 2018
samj1912
count of listens right?
2018-05-15 13517, 2018
ruaok
then shove that data in and score accordingly.
2018-05-15 13523, 2018
samj1912
or simply the mbdb count?
2018-05-15 13525, 2018
ruaok
no, leave LB data be for now.
2018-05-15 13527, 2018
ruaok
MBDB count.
2018-05-15 13536, 2018
ruaok
it's a super quick test that will not be perfect.
2018-05-15 13540, 2018
samj1912
okay
2018-05-15 13555, 2018
samj1912
so basically more releases + recordings = more popular
2018-05-15 13500, 2018
ruaok
actually, it might be better than the popularity one -- not all artists would get a score using the LB method.
2018-05-15 13505, 2018
ruaok
but this way every MB artist gets one.
2018-05-15 13510, 2018
ruaok
yes.
2018-05-15 13526, 2018
ruaok
could be deeply flawed, might work great. I just don't know.
2018-05-15 13533, 2018
samj1912
1 main prob - this is basically artist ref count right?
2018-05-15 13538, 2018
ruaok
but it is an easy thing to try and to get the next step.
2018-05-15 13506, 2018
ruaok
I don't recall how the artist refcount is incremented/decremented.
2018-05-15 13518, 2018
ruaok
but I don't think this is as comprehensive.
2018-05-15 13530, 2018
samj1912
do you remember we removed the ref count from solr because of the shit ton of updates
2018-05-15 13544, 2018
ruaok
but if you go with releases as a measure, then it should be a simple aggregate query.
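[editor's sketch] A minimal version of such an aggregate query, run against an in-memory SQLite database so it is self-contained. The tables `artist` and `artist_release` are simplified, hypothetical stand-ins and not the real MusicBrainz schema, where releases link to artists through artist credits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE artist_release (id INTEGER PRIMARY KEY, artist_id INTEGER, title TEXT);
    INSERT INTO artist VALUES (1, 'Bach'), (2, 'Obscure Band'), (3, 'No Releases Yet');
    INSERT INTO artist_release VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 1, 'C'), (4, 2, 'D');
""")


def release_counts(conn):
    """Return (artist id, name, release count) for every artist, busiest first."""
    rows = conn.execute("""
        SELECT a.id, a.name, COUNT(r.id) AS release_count
        FROM artist a
        LEFT JOIN artist_release r ON r.artist_id = a.id
        GROUP BY a.id, a.name
        ORDER BY release_count DESC, a.id
    """)
    return rows.fetchall()


for row in release_counts(conn):
    print(row)
```

The LEFT JOIN is what gives every artist a score, including those with zero releases, which matches the point made earlier that every MB artist gets one.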
2018-05-15 13556, 2018
ruaok
yes.
2018-05-15 13509, 2018
ruaok
and if we don't use refcount we control when this gets updated.
2018-05-15 13520, 2018
samj1912
hmm
2018-05-15 13533, 2018
ruaok
that is a very good insight, but I think we can disregard it for the test I am proposing.
2018-05-15 13542, 2018
ruaok
and remember, it's just a test.
2018-05-15 13532, 2018
ruaok
and if we use it, then we're incrementing the count for every new album release. totally fine as far as updates are concerned.
2018-05-15 13556, 2018
Leo__Verto joined the channel
2018-05-15 13505, 2018
samj1912
well here's the thing, the problem is cascading updates - let's say we update bach, 1 more release added, this will trigger an update for all the releases and recordings related to him since his doc got updated
2018-05-15 13513, 2018
samj1912
this was the reason we removed ref count
2018-05-15 13538, 2018
samj1912
1 release -> artist -> all related releases
2018-05-15 13556, 2018
samj1912
which just causes a shitstorm when it's one of the popular artists or VA (Various Artists)
2018-05-15 13523, 2018
samj1912
we can stop this however if we end triggers at artists
2018-05-15 13528, 2018
samj1912
when a ref count is updated
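[editor's sketch] The cascade and the proposed cut-off can be modelled as a walk over a trigger graph. Everything here is hypothetical: the `DEPENDENTS` graph, the `type:id` document keys and the `stop_types` parameter are illustrative and not how the indexer is actually configured.

```python
from collections import deque

# Hypothetical trigger graph: updating a key causes its values to be reindexed.
DEPENDENTS = {
    "release:new": ["artist:bach"],
    "artist:bach": ["release:1", "release:2", "recording:1"],
}


def docs_to_reindex(start, stop_types=frozenset()):
    """Breadth-first walk of the trigger graph starting from `start`.

    Documents whose type is in `stop_types` are still reindexed themselves,
    but they do not propagate the update any further.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        doc_type = doc.split(":", 1)[0]
        if doc != start and doc_type in stop_types:
            continue  # end the trigger chain here
        for dep in DEPENDENTS.get(doc, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


print(sorted(docs_to_reindex("release:new")))
print(sorted(docs_to_reindex("release:new", stop_types={"artist"})))
```

Without a stop set, one new release touches the artist and everything attached to it; with `stop_types={"artist"}` only the new release and the artist document are reindexed.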
2018-05-15 13531, 2018
ruaok
don't we have more granular control over what cascades an update?
2018-05-15 13553, 2018
samj1912
we do, I am just wondering how we will score releases/recordings then
2018-05-15 13504, 2018
ruaok
ah, different context.
2018-05-15 13510, 2018
samj1912
yup
2018-05-15 13521, 2018
ruaok
so, you agree that for artists, this is viable?
2018-05-15 13525, 2018
samj1912
yup
2018-05-15 13533, 2018
ruaok
ok, let's go for releases then.
2018-05-15 13535, 2018
samj1912
I'll feed the data
2018-05-15 13541, 2018
samj1912
okay
2018-05-15 13552, 2018
ruaok
how much of a problem is this on a release level?
2018-05-15 13511, 2018
ruaok
and does the release analyzer actively change the boosting right now
2018-05-15 13511, 2018
ruaok
?
2018-05-15 13524, 2018
samj1912
actively as in?
2018-05-15 13534, 2018
ruaok
are boosts done on the release level?
2018-05-15 13505, 2018
ivnat has quit
2018-05-15 13536, 2018
samj1912
I don't see any release boost file in the current search server
2018-05-15 13551, 2018
samj1912
current = lucene one
2018-05-15 13546, 2018
ruaok
and on recordings?
2018-05-15 13501, 2018
ruaok
or, better question, what other searches do we boost that we need to address?
2018-05-15 13502, 2018
samj1912
let me check
2018-05-15 13516, 2018
ruaok
IIRC artist is the most important one and we just agreed that we can use this approach.
2018-05-15 13531, 2018
samj1912
cool
2018-05-15 13500, 2018
samj1912
I'll integrate the ref count then, and stop the updates at artist
2018-05-15 13509, 2018
samj1912
and boost accordingly :p
2018-05-15 13533, 2018
ruaok
great. I'm curious to see how effective that will be.
2018-05-15 13529, 2018
rdswift
I know you're just testing right now, but how about artists that are not performers, such as engineers or mixers? Will this pose a problem long term?
2018-05-15 13522, 2018
Leo__Verto has quit
2018-05-15 13514, 2018
samj1912
hmm, ruaok so we don't have fine-tuned control, I am wondering how to add it? just as a special case for ref_count?