yvanzo: do you know who wrote the MOXy adapters for the search server? I have some questions about them.
I was thinking of changing the current adapter for status and primary type to the way I implemented it for packaging. That would fix SEARCH-608 and SEARCH-611 and some other inconsistent JSON issues. But I wanted to know if this could have some other side effects I am unaware of.
are similar changes in place for the tickets I referred to above?
because I can probably fix the search server side issues but I am not familiar with the musicbrainz server
d4rkie joined the channel
Nyanko-sensei has quit
ruaok
pristine___: I spent quite a lot of time thinking about our next steps. do you have a minute?
pristine___
ruaok: hey
ruaok
first point is kinda easy: candidate sets are nothing more than lists of recordings right? do they need anything else before we run them through the model?
pristine___
Not really.
> first point is kinda easy: candidate sets are nothing more than lists of recordings right?
Yes
>do they need anything else before we run them through the model?
No, not really :)
ruaok
ok, because in the community post we're a bit too focused on the two candidate sets (top, similar). we should open it up and let others make candidate sets.
pristine___
Right.
ruaok
how much work would it be for you to read a candidate set from an API call?
because we could... take troi and slap an API on it.
pristine___
(But the recordings in the candidate set should be in training data, otherwise model will discard them)
ruaok
and then a troi patch becomes a candidate set.
> (But the recordings in the candidate set should be in training data, otherwise model will discard them)
pristine___
> how much work would it be for you to read a candidate set from an API call?
ruaok
ah! that is the critical step I needed to know.
pristine___
Like a list of [user id/name, recording_id]
Is that what you mean?
ruaok
no, simpler.
Zastai joined the channel
v6lur_ joined the channel
I want to make "all recordings in MB tagged with 'punk' " into a candidate set.
and then feed it into CF, so we can make a "recommend me the tracks I would like out of all tracks tagged with 'punk'" alg
LB-383: Allow updating usernames when they're changed in MusicBrainz
pristine___
Oh shit. I read something else. I have to read the Candidate sets, cool. I will need the recordings and the user they are associated with.
> and then feed it into CF, so we can make a "recommend me the tracks I would like out of all tracks tagged with 'punk'" alg
Sounds wow!
ruaok
exactly. how can we make that work?
Zastai
the user that did it seems to have done other bogus stuff too (like creating a "Boogie" instrument request and then adding it to existing Epics)
_lucifer
that sounds more like content based filtering instead of collaborative filtering
pristine___
Umm... But there should be some criteria for these candidate sets, then we can write an API using troi. For example, user x listened to a lot of the sufi genre, so make a candidate set of all sufi tracks in MB
Something like this?
ruaok
Zastai: deleted.
pristine___
Maybe
ruaok
pristine___: let me read the docs you wrote on the CF spark stuff and refresh my mind
pristine___
Okay. Ping me.
yvanzo
_lucifer: I don’t think similar changes to MBS have been made for other tickets, but I can handle them.
_lucifer
yvanzo: great! I'll make the changes and submit a PR soon and try to have a discussion with Mineo on the same if possible
yvanzo: on a side note, are there any long term plans to move the server from perl to some other language?
Zastai
ruaok: there seems to be no option in Jira to report a user for bad behaviour; might be useful for cases like this
Gore joined the channel
yvanzo
_lucifer: long long term, too long term to make it a solid plan.
ruaok
pristine___: I think I will try and make a data flow graph of our system soon. a high level view is really needed for someone to come up to speed.
_lucifer
oh ok, 👍
ruaok
pristine___: at the most fundamental level a candidate set is: (recording, listen_count, user) yes?
Gazooo794 has quit
Gazooo794 joined the channel
pristine___
ruaok: we need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user)
Mineo
_lucifer: it's easier to answer your questions if you not only ping me, but also ask the questions :-) I might not be able to answer them immediately, but I'll try to answer them when I find the time
ruaok
> we need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user)
this comment right here needs to be at the very top level of the docs someplace. this one encapsulates everything we need.
but there is something missing, no?
pristine___
?
ruaok
all tracks in the candidate set must be in the data set that trains the model, no?
pristine___
Yeah
ruaok
ok, can you re-write the comment to include this restriction?
pristine___
Which comment?
ruaok
> we need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user)
We need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e. (recording, user). Note that all tracks and users in the candidate set must be in the training set, otherwise they will be discarded by the model.
pristine___: ok, thanks. that needs to go into the docs someplace.
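[Editor's sketch, not part of the conversation: one way to picture the restriction described above, in plain Python. The function name and the sample rows are hypothetical and are not taken from the ListenBrainz codebase; the real model would apply this discarding itself.]

```python
# Hypothetical illustration of the restriction: a candidate (recording, user)
# pair survives only if both the recording and the user occur in training data.
def filter_candidate_set(training_rows, candidate_pairs):
    """training_rows: (recording, listen_count, user) triples.
    candidate_pairs: (recording, user) pairs.
    Returns the pairs the model would be able to score."""
    known_recordings = {rec for rec, _count, _user in training_rows}
    known_users = {user for _rec, _count, user in training_rows}
    return [
        (rec, user)
        for rec, user in candidate_pairs
        if rec in known_recordings and user in known_users
    ]


# Made-up sample data, for illustration only.
training_rows = [("rec-a", 12, "alice"), ("rec-b", 3, "alice"), ("rec-a", 5, "bob")]
candidate_pairs = [
    ("rec-a", "alice"),
    ("rec-c", "alice"),  # recording never seen in training
    ("rec-b", "carol"),  # user never seen in training
    ("rec-b", "bob"),
]
kept = filter_candidate_set(training_rows, candidate_pairs)
```

[Here `kept` holds only the two pairs whose recording and user both appear in `training_rows`.]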
Mineo
_lucifer: no problem at all :)
_lucifer
Mineo: but otherwise do you have any suggestions on how to approach this? My main issue with doing it using an adapter was how to add two fields
ruaok
pristine___: so we could build this: Given a giant list of recordings that are tagged with "punk", we can iterate over all the listens in LB and discard recordings not tagged with punk. we then build a training set and a candidate set from this set of punk recordings. and then we have a punk music collaborative filter.
is that right?
pristine___
Yeah, sgtm.
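[Editor's sketch, not part of the conversation: a rough plain-Python picture of the extra filtering step proposed above. All names and data are hypothetical. Pairing every remaining user with every remaining recording is an assumption made for illustration, not something the participants specified.]

```python
from collections import Counter


def build_tagged_sets(listens, tagged_recordings):
    """listens: iterable of (user, recording), one entry per listen.
    tagged_recordings: recordings carrying the tag of interest (e.g. 'punk').
    Returns (training_rows, candidate_set)."""
    tagged = set(tagged_recordings)
    # Discard listens of recordings without the tag, then count the rest.
    counts = Counter((rec, user) for user, rec in listens if rec in tagged)
    # Training rows have the shape (recording, listen_count, user).
    training_rows = sorted((rec, count, user) for (rec, user), count in counts.items())
    # Assumption: the candidate set pairs every user with every recording that
    # survived the filter, so every candidate is also present in training data.
    users = sorted({user for _rec, user in counts})
    recordings = sorted({rec for rec, _user in counts})
    candidate_set = [(rec, user) for user in users for rec in recordings]
    return training_rows, candidate_set


# Made-up sample data, for illustration only.
listens = [
    ("alice", "punk-1"), ("alice", "punk-1"), ("alice", "jazz-1"),
    ("bob", "punk-2"), ("bob", "punk-1"),
]
punk = ["punk-1", "punk-2", "punk-3"]
training_rows, candidate_set = build_tagged_sets(listens, punk)
```

[Note that "punk-3" never makes it into either set because nobody listened to it, which is the same training-data restriction at work.]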
ruaok
because we can crowd source these kinds of data sets pretty easily.
we just need to build this extra filtering step.
pristine___
Hmm
> because we can crowd source these kinds of data sets pretty easily.
Mineo
_lucifer: unfortunately not. you'll need to read the eclipselink moxy documentation for that
pristine___
ruaok: this is kinda a different project from *tracks you might like*?
_lucifer
Mineo: ok np. one last question, was there any particular reason for using jaxb over something jackson?
*like jackson
ruaok
pristine___: it's more of the same, no?
right now I feel that the candidate sets that we are creating are not allowing for enough diversity.
ruaok: that's because we don't have enough data/recordings in LB for top/similar artists
For example
ruaok
exactly. the weakness is the similar/top artists data, not the CF or listens
we should work to improve those, for sure, but at the same time being able to easily create more models from crowd sourced data sets would be very very useful.
could this filtering of listens be done easily in spark?
pristine___
But we have to have some artists or some metric to choose the recordings, that was the idea behind candidate set back in 2019.
> we should work to improve those, for sure, but at the same time being able to easily create more models from crowd sourced data sets would be very very useful.
Right, that's why I said they are kinda two different projects/things
Improved mapping and a big data dump of listens will really improve the playlists
> could this filtering of listens be done easily in spark?
I think so
ruaok
lets talk about the mapping next.
pristine___
And then I want to talk about some similar comments on the post
Cool
ruaok
I think I had a import realization yesterday.
important.
pristine___
what you are talking about is like giving users playlist of specific genre or something like that, no?
ruaok
pristine___: yes, exactly.
pristine___
I understand that!
I think
ruaok
it may be a different project, but it's using the same back-end.
ok, to the mapping now.
pristine___
It should be a separate feature altogether, like a separate tab next to top artist, similar artist. I mean I don't want it to be an alternative to top/similar artist
ruaok
yes, sure.
pristine___
The playlist doesn't look so good for two reasons, one is mapping, the other is, ummm.... You remember how we train data?