in #metabrainz

0:55 AM
tn5421 joined the channel
1:21 AM
supersandro2000 has quit
1:22 AM
supersandro2000 joined the channel
2:42 AM
ishaanshah

Morning!
3:00 AM
rdswift_ joined the channel
3:01 AM
rdswift has quit
3:02 AM
rdswift_ is now known as rdswift
4:04 AM
thomasross has quit
4:19 AM
BrainzGit

[musicbrainz-server] reosarevok opened pull request #1706 (master…eslint-max-len): Eslint fixes: max-len https://github.com/metabrainz/musicbrainz-serve...
4:21 AM
leonardo has quit
4:49 AM
leonardo joined the channel
4:56 AM
diru1100

Morning 🌄🌞
5:01 AM
shivam-kapila

morning
5:03 AM
d4rkie joined the channel
5:04 AM
Nyanko-sensei has quit
5:18 AM
supersandro2000 has quit
5:19 AM
supersandro2000 joined the channel
5:52 AM
BrainzGit

[listenbrainz-server] shivam-kapila opened pull request #1105 (master…patch-1): Fix a typo in recommendation docs https://github.com/metabrainz/listenbrainz-serv...
6:01 AM
[critiquebrainz] amCap1712 opened pull request #303 (master…recording-entity-support): CB-270: Implemented the reviewal of recordings https://github.com/metabrainz/critiquebrainz/pu...
6:01 AM
BrainzBot

CB-270: Add support for reviewing (or atleast rating) more entities https://tickets.metabrainz.org/browse/CB-270
6:23 AM
BrainzGit

[listenbrainz-server] vansika merged pull request #1105 (master…patch-1): Fix a typo in recommendation docs https://github.com/metabrainz/listenbrainz-serv...
7:19 AM
d4rkie has quit
7:19 AM
Nyanko-sensei joined the channel
7:48 AM
_lucifer

yvanzo: do you know who had written the moxy adapters for the search server ? I had some questions regarding it.
7:51 AM
I was thinking of changing the current adapter for status and primary type to the way I implemented it for packaging. That would fix SEARCH-608 and SEARCH-611 and some other inconsistent JSON issues. But I wanted to know if this could have some other side effects I am unaware of.
7:51 AM
BrainzBot

SEARCH-608: Inconsistent JSON serialization (lookup vs search): track count on CD stubs https://tickets.metabrainz.org/browse/SEARCH-608
7:51 AM
SEARCH-611: Incorrect content in JSON version of release group search result https://tickets.metabrainz.org/browse/SEARCH-611
8:06 AM
yvanzo

_lucifer: no, git blame?
8:09 AM
_lucifer: about side effects: yes, it can possibly break MusicBrainz Server and API clients.
8:10 AM
BrainzGit

[listenbrainz-server] vansika opened pull request #1106 (master…community=post-link): add community post link to recommendation info page https://github.com/metabrainz/listenbrainz-serv...
8:13 AM
yvanzo

_lucifer: for example of compatibility issue with MBS, see gh:MBS#1231
8:13 AM
BrainzBot

MBS-10421: Update schema_fixup of ReleasePackaging: https://github.com/metabrainz/musicbrainz-serve...
8:15 AM
yvanzo

_lucifer: By the way, your packaging patch is scheduled to be deployed on October 19th, see https://blog.metabrainz.org/2020/09/21/musicbra...
8:26 AM
_lucifer

yvanzo: yeah, I forgot about git blame. it's Mineo.
8:27 AM
Mineo: ping
8:27 AM
yvanzo: regarding the update, nice :)
8:27 AM
are similar changes in place for the tickets I referred to above?
8:28 AM
because i can probably fix the search server side issues but i am not familiar with the musicbrainz server
8:32 AM
d4rkie joined the channel
8:34 AM
Nyanko-sensei has quit
8:37 AM
ruaok

pristine___: I spent quite a lot of time thinking about our next steps. do you have a minute?
8:42 AM
pristine___

ruaok: hey
8:43 AM
ruaok

first point is kinda easy: candidate sets are nothing more than lists of recordings right? do they need anything else before we run them through the model?
8:44 AM
pristine___

Not really.
8:45 AM
> first point is kinda easy: candidate sets are nothing more than lists of recordings right?
8:45 AM
Yes
8:45 AM
>do they need anything else before we run them through the model?
8:45 AM
No, not really :)
8:45 AM
ruaok

ok, because in the community post we're a bit too focused on the two candidate sets (top, similar). we should open it up and let others make candidate sets.
8:45 AM
pristine___

Right.
8:45 AM
ruaok

how much work would it be for you to read a candidate set from an API call?
8:46 AM
because we could... take troi and slap an API on it.
8:46 AM
pristine___

(But the recordings in the candidate set should be in training data, otherwise model will discard them)
8:46 AM
ruaok

and then a troi patch becomes a candidate set.
8:46 AM
> (But the recordings in the candidate set should be in training data, otherwise model will discard them)
8:47 AM
pristine___

> how much work would it be for you to read a candidate set from an API call?
8:47 AM
ruaok

ah! that is the critical step I needed to know.
8:47 AM
pristine___

Like a list of [user id/name, recording_id]
8:47 AM
Is that what you mean?
8:47 AM
ruaok

no, simpler.
8:48 AM
Zastai joined the channel
8:48 AM
v6lur_ joined the channel
8:48 AM
I want to make "all recordings in MB tagged with 'punk' " into a candidate set.
8:49 AM
and then feed it into CF, so we can make a "recommend me the tracks I would like out of all tracks tagged with 'punk'" alg
8:49 AM
Zastai

heads up: someone has attached a .mp3 file to https://tickets.metabrainz.org/browse/LB-383 for no discernable reason, and I don't have the privileges required to remove it
8:49 AM
BrainzBot

LB-383: Allow updating usernames when they're changed in MusicBrainz
8:49 AM
pristine___

Oh shit. I read something else. I have to read the Candidate sets, cool. I will need the recordings and the user they are associated with.
8:49 AM
> and then feed it into CF, so we can make a "recommend me the tracks I would like out of all tracks tagged with 'punk'" alg
8:49 AM
Sounds wow!
8:50 AM
ruaok

exactly. how can we make that work?
8:51 AM
Zastai

the user that did it seems to have done other bogus stuff too (like creating a "Boogie" instrument request and then adding it to existing Epics)
8:52 AM
_lucifer

that sounds more like content based filtering instead of collaborative filtering
8:52 AM
pristine___

Umm... But there should be some criteria for these candidate sets, then we can write an API using troi. For example, user x listened to a lot of the sufi genre, so make a candidate set of all sufi tracks in MB
8:52 AM
Something like this?
8:52 AM
ruaok

Zastai: deleted.
8:52 AM
pristine___

Maybe
8:53 AM
ruaok

pristine___: let me read the docs you wrote on the CF spark stuff and refresh my mind
8:53 AM
pristine___

Okay. Ping me.
8:55 AM
yvanzo

_lucifer: I don’t think similar changes to MBS have been made for other tickets, but I can handle them.
8:56 AM
_lucifer

yvanzo: great! I'll make the changes and submit a PR soon and try to have a discussion with Mineo on the same if possible
8:57 AM
yvanzo: on a side note, are there any long term plans to move the server from perl to some other language?
8:58 AM
Zastai

ruaok: there seems to be no option in Jira to report a user for bad behaviour; might be useful for cases like this
9:00 AM
Gore joined the channel
9:01 AM
yvanzo

_lucifer: long long term, too long term to make it a solid plan.
9:01 AM
ruaok

pristine___: I think I will try and make a data flow graph of our system soon. a high level view is really needed for someone to come up to speed.
9:01 AM
_lucifer

oh ok, 👍
9:03 AM
ruaok

pristine___: at the most fundamental level a candidate set is: (recording, listen_count, user) yes?
9:05 AM
Gazooo794 has quit
9:06 AM
Gazooo794 joined the channel
9:07 AM
pristine___

ruaok: we need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user)
9:09 AM
Mineo

_lucifer: it's easier to answer your questions if you not only ping me, but also ask the questions :-) I might not be able to answer them immediately, but I'll try to answer them when I find the time
9:10 AM
ruaok

> we need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user)
9:10 AM
_lucifer

Mineo: sure, will keep it in mind :). I wanted to know if we could do something like https://github.com/metabrainz/mb-solr/pull/37/f... instead of the Status adapter
9:10 AM
ruaok

this comment right here needs to be at the very top level of the docs someplace. this one encapsulates everything we need.
9:11 AM
but there is something missing, no?
9:11 AM
pristine___

?
9:12 AM
ruaok

all tracks in the candidate set must be in the data set that trains the model, no?
9:12 AM
pristine___

Yeah
9:13 AM
ruaok

ok, can you re-write the comment to include this restriction?
9:13 AM
pristine___

Which comment?
9:13 AM
ruaok

> we need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user)
9:14 AM
Mineo

_lucifer: by status adapter, do you mean https://github.com/metabrainz/mb-solr/blob/mast... ? if so, the answer's a solid maybe! I don't know anything about that class
9:15 AM
pristine___

We need (recording, listen_count, user) to train a model, a candidate set is just (recording, user), the model then assigns rating/score to this tuple i.e (recording, user). Note that all tracks and users in the candidate set must be in the training set otherwise they will be discarded by the model.
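The restriction described above can be sketched in plain Python. This is only an illustration under assumptions: the function name `usable_candidates`, the tuple layouts, and the sample IDs are hypothetical and are not taken from the ListenBrainz code base, which runs this step in Spark.

```python
from typing import Iterable, List, Tuple

TrainingRow = Tuple[str, int, str]  # (recording, listen_count, user)
CandidatePair = Tuple[str, str]     # (recording, user)


def usable_candidates(
    training: Iterable[TrainingRow],
    candidates: Iterable[CandidatePair],
) -> List[CandidatePair]:
    """Keep only candidate pairs whose recording AND user both occur in the training data.

    Hypothetical helper: it mimics the effect of the model discarding
    recordings and users it has never seen during training.
    """
    training = list(training)
    known_recordings = {rec for rec, _count, _user in training}
    known_users = {user for _rec, _count, user in training}
    return [
        (rec, user)
        for rec, user in candidates
        if rec in known_recordings and user in known_users
    ]


# Made-up sample data, for illustration only.
training = [("rec-a", 5, "alice"), ("rec-b", 2, "bob")]
candidates = [("rec-a", "bob"), ("rec-c", "alice"), ("rec-b", "carol")]
kept = usable_candidates(training, candidates)
```

With this sample data only the pair for "rec-a" and "bob" is kept: "rec-c" is not a trained recording and "carol" is not a trained user.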
9:15 AM
_lucifer

yeah Mineo i mean that
9:16 AM
pristine___

ruaok: ^
9:16 AM
_lucifer

Mineo: sorry, my bad did git blame on the wrong file :(, I did it on https://github.com/metabrainz/mb-solr/blame/mas... .
9:17 AM
ruaok

pristine___: ok, thanks. that needs to go into the docs someplace.
9:18 AM
Mineo

_lucifer: no problem at all :)
9:19 AM
_lucifer

Mineo: but otherwise do you have any suggestions on how to approach this? My main issue with doing it using an adapter was how to add two fields
9:19 AM
ruaok

pristine___: so we could build this: Given a giant list of recordings that are tagged with "punk", we can iterate over all the listens in LB and discard recordings not tagged with punk. we then build a training set and a candidate set from this set of punk recordings. and then we have a punk music collaborative filter.
9:19 AM
is that right?
9:20 AM
pristine___

Yeah, sgtm.
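The extra filtering step discussed above might look roughly like the following plain-Python sketch. Everything here is an assumption made for illustration: the function name `build_tagged_sets`, the `(user, recording)` listen layout, and the choice to pair every surviving recording with every surviving user are hypothetical, and the real pipeline would do this in Spark rather than in memory.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Listen = Tuple[str, str]            # (user, recording), assumed layout
TrainingRow = Tuple[str, int, str]  # (recording, listen_count, user)
CandidatePair = Tuple[str, str]     # (recording, user)


def build_tagged_sets(
    listens: Iterable[Listen],
    tagged_recordings: Set[str],
) -> Tuple[List[TrainingRow], List[CandidatePair]]:
    """Discard listens of untagged recordings, then derive both sets from what is left."""
    # Count listens per (recording, user), skipping recordings without the tag.
    counts = Counter(
        (rec, user) for user, rec in listens if rec in tagged_recordings
    )
    training = sorted((rec, n, user) for (rec, user), n in counts.items())

    # Assumption: the candidate set pairs every surviving recording with every
    # surviving user, so each candidate is guaranteed to be in the training data.
    recordings = {rec for rec, _user in counts}
    users = {user for _rec, user in counts}
    candidates = sorted((rec, user) for rec in recordings for user in users)
    return training, candidates


# Made-up sample data, for illustration only.
listens = [
    ("alice", "punk-1"),
    ("alice", "punk-1"),
    ("bob", "punk-2"),
    ("bob", "jazz-1"),
]
training, candidates = build_tagged_sets(listens, {"punk-1", "punk-2"})
```

Because both sets come from the same filtered listens, no candidate can fall outside the training data, which is the restriction stated earlier in the conversation.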
9:21 AM
ruaok

because we can crowd source these kinds of data sets pretty easily.
9:21 AM
we just need to build this extra filtering step.
9:22 AM
pristine___

Hmm
9:23 AM
> because we can crowd source these kinds of data sets pretty easily.
9:23 AM
Mineo

_lucifer: unfortunately not. you'll need to read the eclipselink moxy documentation for that
9:24 AM
pristine___

ruaok: this is kinda a different project from *tracks you might like*?
9:24 AM
_lucifer

Mineo: ok np. one last question, was there any particular reason for using jaxb over something jackson?
9:24 AM
*like jackson
9:25 AM
ruaok

pristine___: it's more of the same, no?
9:25 AM
right now I feel that the candidate sets that we are creating are not allowing for enough diversity.
9:25 AM
Mineo

_lucifer: other than "I could copy the adapters from https://github.com/metabrainz/search-server/tre...", no
9:26 AM
(and the oxml.xml from https://github.com/metabrainz/search-server/blo...)
9:26 AM
_lucifer

ah, ok :D thanks for the help
9:26 AM
pristine___

ruaok: that's because we don't have enough data/recordings in LB for top/similar artists
9:26 AM
For example
9:27 AM
ruaok

exactly. the weakness is the similar/top artists data, not the CF or listens
9:27 AM
we should work to improve those, for sure, but at the same time being able to easily create more models from crowd sourced data sets would be very very useful.
9:28 AM
could this filtering of listens be done easily in spark?
9:28 AM
pristine___

But we have to have some artists or some metric to choose the recordings, that was the idea behind candidate set back in 2019.
9:28 AM
> we should work to improve those, for sure, but at the same time being able to easily create more models from crowd sourced data sets would be very very useful.
9:28 AM
Right, that's why I said they are kinda two different projects/things
9:29 AM
Improved mapping and a big data dump of listens will really improve the playlists
9:29 AM
> could this filtering of listens be done easily in spark?
9:29 AM
I think so
9:29 AM
ruaok

lets talk about the mapping next.
9:29 AM
pristine___

And then I want to talk about some similar comments on the post
9:29 AM
Cool
9:31 AM
ruaok

I think I had a import realization yesterday.
9:31 AM
important.
9:31 AM
pristine___

what you are talking about is like giving users playlist of specific genre or something like that, no?
9:31 AM
ruaok

pristine___: yes, exactly.
9:32 AM
pristine___

I understand that!
9:32 AM
I think
9:32 AM
ruaok

it may be a different project, but it's using the same back-end.
9:32 AM
ok, to the mapping now.
9:33 AM
pristine___

It should be a separate feature altogether, like a separate tab next to top artist, similar artist. I mean I don't want it to be an alternative to top/similar artist
9:33 AM
ruaok

yes, sure.
9:33 AM
pristine___

The playlist looks not so good because of two reasons, one is mapping, the other is, ummm.... You remember how we train data?