yvanzo: do you know who wrote the MOXy adapters for the search server? I had some questions about them.
2020-09-22 26643, 2020
_lucifer
I was thinking of changing the current adapter for status and primary type to the way I implemented it for packaging. That would fix SEARCH-608 and SEARCH-611 and some other inconsistent JSON issues. But I wanted to know if this could have some other side effects I am unaware of.
are similar changes in place for the tickets I referred to above?
2020-09-22 26636, 2020
_lucifer
because i can probably fix the search server side issues but i am not familiar with the musicbrainz server
2020-09-22 26627, 2020
d4rkie joined the channel
2020-09-22 26622, 2020
Nyanko-sensei has quit
2020-09-22 26628, 2020
ruaok
pristine___: I spent quite a lot of time thinking about our next steps. do you have a minute?
2020-09-22 26648, 2020
pristine___
ruaok: hey
2020-09-22 26644, 2020
ruaok
first point is kinda easy: candidate sets are nothing more than lists of recordings right? do they need anything else before we run them through the model?
2020-09-22 26644, 2020
pristine___
Not really.
2020-09-22 26610, 2020
pristine___
> first point is kinda easy: candidate sets are nothing more than lists of recordings right?
2020-09-22 26613, 2020
pristine___
Yes
2020-09-22 26627, 2020
pristine___
>do they need anything else before we run them through the model?
2020-09-22 26634, 2020
pristine___
No, not really :)
2020-09-22 26636, 2020
ruaok
ok, because in the community post we're a bit too focused on the two candidate sets (top, similar). we should open it up and let others make candidate sets.
2020-09-22 26651, 2020
pristine___
Right.
2020-09-22 26656, 2020
ruaok
how much work would it be for you to read a candidate set from an API call?
2020-09-22 26620, 2020
ruaok
because we could... take troi and slap an API on it.
2020-09-22 26629, 2020
pristine___
(But the recordings in the candidate set should be in training data, otherwise model will discard them)
2020-09-22 26632, 2020
ruaok
and then a troi patch becomes a candidate set.
2020-09-22 26647, 2020
ruaok
> (But the recordings in the candidate set should be in training data, otherwise model will discard them)
2020-09-22 26601, 2020
pristine___
> how much work would it be for you to read a candidate set from an API call?
2020-09-22 26605, 2020
ruaok
ah! that is the critical step I needed to know.
2020-09-22 26628, 2020
pristine___
Like a list of [user id/name, recording_id]
2020-09-22 26634, 2020
pristine___
Is that what you mean?
2020-09-22 26646, 2020
ruaok
no, simpler.
2020-09-22 26615, 2020
Zastai joined the channel
2020-09-22 26619, 2020
v6lur_ joined the channel
2020-09-22 26620, 2020
ruaok
I want to make "all recordings in MB tagged with 'punk' " into a candidate set.
2020-09-22 26604, 2020
ruaok
and then feed it into CF, so we can make a "recommend me the tracks I would like out of all tracks tagged with 'punk'" alg
LB-383: Allow updating usernames when they're changed in MusicBrainz
2020-09-22 26616, 2020
pristine___
Oh shit. I read something else. I have to read the Candidate sets, cool. I will need the recordings and the user they are associated with.
2020-09-22 26646, 2020
pristine___
> and then feed it into CF, so we can make a "recommend me the tracks I would like out of all tracks tagged with 'punk'" alg
2020-09-22 26649, 2020
pristine___
Sounds wow!
2020-09-22 26619, 2020
ruaok
exactly. how can we make that work?
2020-09-22 26620, 2020
Zastai
the user that did it seems to have done other bogus stuff too (like creating a "Boogie" instrument request and then adding it to existing Epics)
2020-09-22 26615, 2020
_lucifer
that sounds more like content based filtering instead of collaborative filtering
2020-09-22 26623, 2020
pristine___
Umm... But there should be some criteria for these candidate sets, then we can write an API using troi. For example, user x listened to a lot of the sufi genre, so make a candidate set of all sufi tracks in MB
2020-09-22 26628, 2020
pristine___
Something like this?
2020-09-22 26629, 2020
ruaok
Zastai: deleted.
2020-09-22 26630, 2020
pristine___
Maybe
2020-09-22 26622, 2020
ruaok
pristine___: let me read the docs you wrote on the CF spark stuff and refresh my mind
2020-09-22 26653, 2020
pristine___
Okay. Ping me.
2020-09-22 26638, 2020
yvanzo
_lucifer: I don’t think similar changes to MBS have been made for other tickets, but I can handle them.
2020-09-22 26645, 2020
_lucifer
yvanzo: great! I'll make the changes and submit a PR soon and try to have a discussion with Mineo about it if possible
2020-09-22 26653, 2020
_lucifer
yvanzo: on a side note, are there any long term plans to move the server from perl to some other language?
2020-09-22 26653, 2020
Zastai
ruaok: there seems to be no option in Jira to report a user for bad behaviour; might be useful for cases like this
2020-09-22 26609, 2020
Gore joined the channel
2020-09-22 26616, 2020
yvanzo
_lucifer: long long term, too long term to make it a solid plan.
2020-09-22 26646, 2020
ruaok
pristine___: I think I will try and make a data flow graph of our system soon. a high level view is really needed for someone to come up to speed.
2020-09-22 26647, 2020
_lucifer
oh ok, 👍
2020-09-22 26635, 2020
ruaok
pristine___: at the most fundamental level a candidate set is: (recording, listen_count, user) yes?
2020-09-22 26601, 2020
Gazooo794 has quit
2020-09-22 26649, 2020
Gazooo794 joined the channel
2020-09-22 26616, 2020
pristine___
ruaok: we need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user)
2020-09-22 26620, 2020
Mineo
_lucifer: it's easier to answer your questions if you not only ping me, but also ask the questions :-) I might not be able to answer them immediately, but I'll try to answer them when I find the time
2020-09-22 26620, 2020
ruaok
> we need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user)
this comment right here needs to be at the very top level of the docs someplace. this one encapsulates everything we need.
2020-09-22 26636, 2020
ruaok
but there is something missing, no?
2020-09-22 26643, 2020
pristine___
?
2020-09-22 26615, 2020
ruaok
all tracks in the candidate set must be in the data set that trains the model, no?
2020-09-22 26645, 2020
pristine___
Yeah
2020-09-22 26603, 2020
ruaok
ok, can you re-write the comment to include this restriction?
2020-09-22 26619, 2020
pristine___
Which comment?
2020-09-22 26635, 2020
ruaok
> we need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user)
We need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user). Note that all tracks and users in the candidate set must be in the training set otherwise they will be discarded by the model.
pristine___: ok, thanks. that needs to go into the docs someplace.
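A minimal sketch of the restriction described above, assuming plain Python tuples instead of the real Spark dataframes. Training rows are (recording, listen_count, user), candidate rows are (recording, user), and any candidate pair whose recording or user never appears in the training data is dropped. The function name and the sample data are hypothetical, not taken from the ListenBrainz code.

```python
def filter_candidate_set(training_rows, candidate_set):
    """Keep only candidate (recording, user) pairs the model could score.

    training_rows: iterable of (recording, listen_count, user)
    candidate_set: iterable of (recording, user)
    """
    # Everything the model has seen during training.
    known_recordings = {rec for rec, _count, _user in training_rows}
    known_users = {user for _rec, _count, user in training_rows}

    # Pairs with an unseen recording or user would be discarded anyway.
    return [
        (rec, user)
        for rec, user in candidate_set
        if rec in known_recordings and user in known_users
    ]


# Hypothetical sample data.
training_rows = [
    ("rec_a", 12, "alice"),
    ("rec_b", 3, "alice"),
    ("rec_a", 5, "bob"),
]
candidate_set = [
    ("rec_b", "bob"),    # both known: kept
    ("rec_c", "alice"),  # recording not in training data: dropped
    ("rec_a", "carol"),  # user not in training data: dropped
]

usable = filter_candidate_set(training_rows, candidate_set)
```

Only the ("rec_b", "bob") pair survives here, since it is the only one where both the recording and the user appear in the training rows.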
2020-09-22 26604, 2020
Mineo
_lucifer: no problem at all :)
2020-09-22 26604, 2020
_lucifer
Mineo: but otherwise do you have any suggestions on how to approach this? My main issue with doing it using an adapter was how to add two fields
2020-09-22 26616, 2020
ruaok
pristine___: so we could build this: Given a giant list of recordings that are tagged with "punk", we can iterate over all the listens in LB and discard recordings not tagged with punk. we then build a training set and a candidate set from this set of punk recordings. and then we have a punk music collaborative filter.
2020-09-22 26619, 2020
ruaok
is that right?
2020-09-22 26652, 2020
pristine___
Yeah, sgtm.
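The filtering step proposed above could be sketched roughly like this, again in plain Python rather than Spark. It assumes listens arrive as (user, recording) pairs and that the tagged recordings are available as a set. How the candidate set is derived from the surviving listens is not spelled out in the discussion, so pairing every remaining user with every remaining recording is purely an assumption, and all names and data are hypothetical.

```python
from collections import Counter


def build_tagged_sets(listens, tagged_recordings):
    """Build a training set and a candidate set restricted to tagged recordings.

    listens: iterable of (user, recording), one entry per listen
    tagged_recordings: set of recordings carrying the tag (e.g. 'punk')
    """
    # Discard listens of recordings that do not carry the tag,
    # then count how often each user listened to each recording.
    counts = Counter(
        (rec, user) for user, rec in listens if rec in tagged_recordings
    )

    # Training rows in the (recording, listen_count, user) shape.
    training = sorted((rec, count, user) for (rec, user), count in counts.items())

    # Assumption: the candidate set is every (recording, user) combination
    # of the users and recordings that survived the filter.
    users = sorted({user for _rec, _count, user in training})
    recordings = sorted({rec for rec, _count, _user in training})
    candidates = [(rec, user) for user in users for rec in recordings]

    return training, candidates


# Hypothetical sample data.
listens = [
    ("alice", "punk_1"),
    ("alice", "punk_1"),
    ("alice", "jazz_1"),  # not tagged: discarded
    ("bob", "punk_2"),
]
punk_recordings = {"punk_1", "punk_2", "punk_3"}

training, candidates = build_tagged_sets(listens, punk_recordings)
```

Because the candidate set is built only from recordings and users that are in the training rows, it satisfies the restriction that the model would otherwise discard unseen tracks and users.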
2020-09-22 26651, 2020
ruaok
because we can crowd source these kinds of data sets pretty easily.
2020-09-22 26659, 2020
ruaok
we just need to build this extra filtering step.
2020-09-22 26614, 2020
pristine___
Hmm
2020-09-22 26645, 2020
pristine___
> because we can crowd source these kinds of data sets pretty easily.
2020-09-22 26649, 2020
Mineo
_lucifer: unfortunately not. you'll need to read the eclipselink moxy documentation for that
2020-09-22 26621, 2020
pristine___
ruaok: this is kinda a different project from *tracks you might like*?
2020-09-22 26635, 2020
_lucifer
Mineo: ok np. one last question, was there any particular reason for using jaxb over something jackson?
2020-09-22 26646, 2020
_lucifer
*like jackson
2020-09-22 26614, 2020
ruaok
pristine___: it's more of the same, no?
2020-09-22 26643, 2020
ruaok
right now I feel that the candidate sets that we are creating are not allowing for enough diversity.
ruaok: that's because we don't have enough data/recordings in LB for top/similar artists
2020-09-22 26656, 2020
pristine___
For example
2020-09-22 26606, 2020
ruaok
exactly. the weakness is the similar/top artists data, not the CF or listens
2020-09-22 26646, 2020
ruaok
we should work to improve those, for sure, but at the same time being able to easily create more models from crowd sourced data sets would be very very useful.
2020-09-22 26603, 2020
ruaok
could this filtering of listens be done easily in spark?
2020-09-22 26605, 2020
pristine___
But we have to have some artists or some metric to choose the recordings, that was the idea behind candidate set back in 2019.
2020-09-22 26619, 2020
pristine___
> we should work to improve those, for sure, but at the same time being able to easily create more models from crowd sourced data sets would be very very useful.
2020-09-22 26637, 2020
pristine___
Right, that's why I said they are kinda two different projects/things
2020-09-22 26603, 2020
pristine___
Improved mapping and a big data dump of listens will really improve the playlists
2020-09-22 26622, 2020
pristine___
> could this filtering of listens be done easily in spark?
2020-09-22 26625, 2020
pristine___
I think so
2020-09-22 26627, 2020
ruaok
lets talk about the mapping next.
2020-09-22 26649, 2020
pristine___
And then I want to talk about some similar comments on the post
2020-09-22 26650, 2020
pristine___
Cool
2020-09-22 26640, 2020
ruaok
I think I had a import realization yesterday.
2020-09-22 26647, 2020
ruaok
important.
2020-09-22 26648, 2020
pristine___
what you are talking about is like giving users playlist of specific genre or something like that, no?
2020-09-22 26659, 2020
ruaok
pristine___: yes, exactly.
2020-09-22 26607, 2020
pristine___
I understand that!
2020-09-22 26610, 2020
pristine___
I think
2020-09-22 26618, 2020
ruaok
it may be a different project, but it's using the same back-end.
2020-09-22 26646, 2020
ruaok
ok, to the mapping now.
2020-09-22 26601, 2020
pristine___
It should be a separate feature altogether, like a separate tab next to top artist, similar artist. I mean I don't want it to be an alternative to top/similar artist
2020-09-22 26626, 2020
ruaok
yes, sure.
2020-09-22 26635, 2020
pristine___
The playlist doesn't look so good for two reasons: one is the mapping, the other is, ummm.... You remember how we train the model?