#metabrainz

18:04 PM
ruaok

that's a really weird way of thinking about that.

2019-05-27 14758, 2019

18:04 PM
ruaok

for me, I'd prefer just some parametric score of sorts.

2019-05-27 14709, 2019

18:05 PM
ruaok

now, what part of that do we need to work on, iliekcomputers ?

2019-05-27 14724, 2019

18:05 PM
yvanzo

iliekcomputers: great, no need to know about SIR, at least you are used to Python :)

2019-05-27 14739, 2019

18:05 PM
ruaok

but that score is literally the point of the collaborative filtering algorithm, no?

2019-05-27 14739, 2019

18:05 PM
iliekcomputers

i am not sure yet, but a better metric is probably used in reality. i'll have to take a look.

2019-05-27 14756, 2019

18:05 PM
ferbncode

Iliekcomputers: sure \o/

2019-05-27 14719, 2019

18:06 PM
ruaok

ok, for now I'm just going to work with the idea that we have some score where higher is better.

2019-05-27 14740, 2019

18:06 PM
iliekcomputers

ruaok: I was thinking that making it predict numbers from 1 to 1000s is harder than making it predict some other metric of likeability.

2019-05-27 14742, 2019

18:06 PM
ruaok

which is not to say that we should make that into a playlist directly. I doubt that will turn out well.

2019-05-27 14758, 2019

18:06 PM
ruaok

ah, ok, now I understand.

2019-05-27 14720, 2019

18:07 PM
ruaok

ok, we clearly need to research what metrics are doable.

2019-05-27 14729, 2019

18:07 PM
ruaok

but this is where aidanlw17's work comes in.

2019-05-27 14711, 2019

18:08 PM
Slurpee joined the channel

2019-05-27 14711, 2019

18:08 PM
Slurpee has quit

2019-05-27 14711, 2019

18:08 PM
Slurpee joined the channel

2019-05-27 14713, 2019

18:08 PM
ruaok

if we say, pick the most played track of the last week, and then find CF recommended tracks that are similar, we can start constructing a playlist.

2019-05-27 14734, 2019

18:08 PM
ruaok

chaining along from track to track that is similar.

2019-05-27 14747, 2019
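
[Editor's sketch] The chaining idea above could look roughly like the following. This is only a minimal illustration, assuming a hypothetical `similar` mapping from each track to its CF-recommended neighbours with a "higher is better" score; the function name, the data, and the greedy strategy are all assumptions and not part of the actual ListenBrainz code.

```python
def chain_playlist(seed, similar, length):
    """Greedily walk from the seed to the most similar track not yet used."""
    playlist = [seed]
    seen = {seed}
    current = seed
    while len(playlist) < length:
        # Neighbours of the current track that are not already in the playlist.
        candidates = [
            (score, track)
            for track, score in similar.get(current, {}).items()
            if track not in seen
        ]
        if not candidates:
            break  # dead end: nothing similar is left to chain to
        _, current = max(candidates)
        playlist.append(current)
        seen.add(current)
    return playlist


# Hypothetical similarity data: track -> {neighbour: score}.
similar = {
    "a": {"b": 0.9, "c": 0.4},
    "b": {"a": 0.9, "c": 0.8},
    "c": {"d": 0.7},
}
playlist = chain_playlist("a", similar, 4)
```

A greedy walk like this can drift away from the seed quickly, which fits the caution above that raw scores should not be turned into a playlist directly.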

18:08 PM
pristine__

What other metric can we have apart from listen counts? The more I play a song, the more I like it.

2019-05-27 14701, 2019

18:09 PM
aidanlw17

ruaok what is CF?

2019-05-27 14708, 2019

18:09 PM
ruaok

pristine__: I am not sure. this is precisely what we need to learn.

2019-05-27 14710, 2019

18:09 PM
iliekcomputers

aidanlw17: collaborative filtering.

2019-05-27 14718, 2019

18:09 PM
aidanlw17

oh thank you!

2019-05-27 14722, 2019

18:09 PM
iliekcomputers

pristine__: everything will need to be based on listen counts.

2019-05-27 14738, 2019

18:09 PM
ruaok

and also, I want to reiterate the ONE GOAL I had that caused me to start MusicBrainz.

2019-05-27 14754, 2019

18:09 PM
ruaok

I wanted to pick a starting track and an ending track and give a duration.

2019-05-27 14710, 2019

18:10 PM
iliekcomputers

the thing is that predicting listen counts (which have a large range 1 to tens of thousands) is a harder problem than we need to solve probably.

2019-05-27 14716, 2019

18:10 PM
ruaok

start with enter sandman from metallica and end up with orinoco flow from enya in 2 hours. GO.

2019-05-27 14736, 2019

18:10 PM
Mr_Monkey

The longest i've listened to a track for probably also indicates my tastes, those that are cemented

2019-05-27 14742, 2019

18:10 PM
pristine__

Exciting

2019-05-27 14747, 2019

18:10 PM
ruaok

so, finding a line of similar tracks that go from one track/artist to another track/artist.

2019-05-27 14757, 2019
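
[Editor's sketch] One way to picture "a line of similar tracks" from a starting track to an ending track is a breadth-first search over a similarity graph. The sketch below is an assumption-heavy illustration: the `neighbours` graph, the intermediate track ids, and the durations are invented, and it only finds the shortest chain and reports its length in minutes. Fitting the chain to a requested duration (for example 2 hours) is left out.

```python
from collections import deque


def find_track_path(start, end, neighbours):
    """Breadth-first search for the shortest chain of similar tracks."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # the two tracks are not connected


def total_minutes(path, durations):
    """Sum the durations (in minutes) of every track on the path."""
    return sum(durations[track] for track in path)


# Hypothetical graph: track -> list of similar tracks.
neighbours = {
    "enter_sandman": ["t1", "t2"],
    "t1": ["t3"],
    "t2": ["orinoco_flow"],
    "t3": ["orinoco_flow"],
}
# Made-up durations in whole minutes, for illustration only.
durations = {"enter_sandman": 5, "t1": 4, "t2": 3, "t3": 6, "orinoco_flow": 4}

path = find_track_path("enter_sandman", "orinoco_flow", neighbours)
minutes = total_minutes(path, durations)
```

Here the search returns the two-hop chain through `t2`. A real version would probably prefer longer chains with smaller stylistic jumps in order to fill the requested time.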

18:10 PM
pristine__

Mr_Monkey: yup

2019-05-27 14704, 2019

18:11 PM
ruaok

THIS, believe it or not, is why I started MusicBrainz. without MB, this is impossible.

2019-05-27 14737, 2019

18:11 PM
ruaok

iliekcomputers: what does the CF algorithm spit out currently as its ranking?

2019-05-27 14749, 2019

18:11 PM
ruaok

or is that a black box, based on the fact that we're shoving in listen counts?

2019-05-27 14755, 2019

18:11 PM
iliekcomputers

ruaok: we give it listen counts and it tries to predict listen counts as a result.

2019-05-27 14709, 2019

18:12 PM
ruaok

where is the problem in that?

2019-05-27 14715, 2019

18:12 PM
ruaok

is it not doing a good job?

2019-05-27 14732, 2019

18:12 PM
iliekcomputers

maybe i'm not able to explain my thoughts on this correctly.

2019-05-27 14744, 2019

18:12 PM
iliekcomputers

let me do some research and come back with a good paragraph or two.

2019-05-27 14754, 2019

18:12 PM
ruaok

ok, likely I am being dense too.

2019-05-27 14703, 2019

18:13 PM
pristine__

It is doing a good job, maybe iliekcomputers wants a diff metric

2019-05-27 14708, 2019

18:13 PM
pristine__

Diff from listen count.

2019-05-27 14718, 2019

18:13 PM
ruaok

but what you are saying is exactly what I've been understanding, so I am not understanding the crux of the problem you're raising.

2019-05-27 14700, 2019

18:14 PM
ruaok

well, if it ain't broke, don't fix it.

2019-05-27 14706, 2019

18:14 PM
ruaok

perhaps it is suitable for the first round.

2019-05-27 14710, 2019

18:14 PM
iliekcomputers

hmm, yep.

2019-05-27 14736, 2019

18:14 PM
ruaok

In reality I think we're going to do this challenge in the autumn and then realize "oh crap, we need this data set, that data set, this, that".

2019-05-27 14742, 2019

18:14 PM
ruaok

learning is the key goal of the challenge.

2019-05-27 14747, 2019

18:14 PM
ruaok

and then we start the cycle again.

2019-05-27 14706, 2019

18:15 PM
ruaok

and perhaps at the end of the second cycle we'll have something to be proud of.

2019-05-27 14713, 2019

18:15 PM
ruaok is managing expectations

2019-05-27 14724, 2019

18:15 PM
ruaok

aidanlw17: any thoughts from you?

2019-05-27 14742, 2019

18:15 PM
ruaok

have you thought about how to extend your resultant data to artists?

2019-05-27 14745, 2019

18:15 PM
pristine__

"managing expectations"....awww

2019-05-27 14710, 2019

18:16 PM
ruaok

pristine__: yes, we're all working hard to get things done, but the reality is that the first pass is not going to be glorious.

2019-05-27 14727, 2019

18:16 PM
ruaok

if it teaches us how to do better, then I am 100% satisfied.

2019-05-27 14709, 2019

18:17 PM
pristine__

Well said. Learning is the key :)

2019-05-27 14714, 2019

18:17 PM
ruaok

ding.

2019-05-27 14751, 2019

18:17 PM
pristine__

Dong

2019-05-27 14754, 2019

18:17 PM
ruaok

iliekcomputers: the work we've done for shuffling user stats data back to hetzner.... can we use that to shove the recommendations from CF back to hetzner too?

2019-05-27 14758, 2019

18:17 PM
aidanlw17

I'd like to really review the files from pristine__ and the CF project as a whole to get a better understanding of this recommendation work. One thought of mine is that alastairp and I currently will be using 12 separate metrics for track-track similarity, then near the end of the summer a goal is to bring these together into one track-track metric for overall similarity. I think in the end, a combination of this metric and pristine__'s

2019-05-27 14758, 2019

18:17 PM
aidanlw17

results would give a good dataset for recommendation.

2019-05-27 14706, 2019

18:18 PM
iliekcomputers

ruaok: yes.

2019-05-27 14710, 2019

18:18 PM
iliekcomputers

shouldn't be much work.

2019-05-27 14714, 2019

18:18 PM
ruaok

that then begs the question: how do we handle new runs of the CF data?

2019-05-27 14727, 2019

18:18 PM
ruaok

do we keep X data sets and run a new one once a week?

2019-05-27 14744, 2019

18:18 PM
iliekcomputers

that is what i was expecting.

2019-05-27 14749, 2019

18:18 PM
ruaok

iliekcomputers: great. that will clearly be the next step for pristine__

2019-05-27 14751, 2019

18:18 PM
ruaok

iliekcomputers: <3

2019-05-27 14717, 2019

18:19 PM
pristine__

hetzner?

2019-05-27 14728, 2019

18:19 PM
ruaok

and then we can make playable lists on lb.org -- once we have that, then we're at a point when we can realistically see how the CF alg is performing.

2019-05-27 14741, 2019

18:19 PM
pristine__

Lb-server?

2019-05-27 14745, 2019

18:19 PM
ruaok

pristine__: yes.

2019-05-27 14754, 2019

18:19 PM
pristine__

Oh. Okay.

2019-05-27 14756, 2019

18:19 PM
iliekcomputers

hetzner == leader.listenbrainz

2019-05-27 14710, 2019

18:20 PM
pristine__

I like the next step 😆

2019-05-27 14715, 2019

18:20 PM
ruaok

and I guess there we ought to post-process it into recommendations of things that people have played and recommendations for things that are new to users.

2019-05-27 14729, 2019
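
[Editor's sketch] The post-processing step described above might amount to a simple partition of the CF output by the user's listening history. The sketch assumes the recommendations arrive as `(track, score)` pairs and that the set of tracks a user has already played is available; both the function and the data shapes are hypothetical.

```python
def split_recommendations(recommended, listened):
    """Partition scored recommendations by whether the user has played them."""
    played, new = [], []
    for track, score in recommended:
        # Route each recommendation into the matching bucket, keeping order.
        (played if track in listened else new).append((track, score))
    return played, new


# Hypothetical CF output (higher score is better) and listening history.
recommended = [("t1", 0.93), ("t2", 0.81), ("t3", 0.64)]
listened = {"t2"}

played, new = split_recommendations(recommended, listened)
```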

18:20 PM
ruaok

iliekcomputers: actually in this case I mean hetzner = lemmy

2019-05-27 14743, 2019

18:20 PM
iliekcomputers

ooh

2019-05-27 14748, 2019

18:20 PM
iliekcomputers

ambiguous. :P

2019-05-27 14750, 2019

18:20 PM
aidanlw17

ruaok: In terms of artist-artist similarity, I think we need these two projects in combination - given that artists may also diverge greatly in the types of music they create, I don't anticipate that only track-track similarity would provide a strong recommendation artist-artist. When bringing in the listen counts from pristine__, I would be interested in seeing how artist-artist recommendation could change.

2019-05-27 14720, 2019

18:21 PM
ruaok

aidanlw17: yes, and I think part of our challenge might be to pick different, better metrics that feed your algorithm.

2019-05-27 14751, 2019

18:21 PM
ruaok

perhaps we should make samples of track-track similarities available for public inspection asap too.

2019-05-27 14720, 2019

18:22 PM
ruaok

aidanlw17: I think that is spot on.

2019-05-27 14729, 2019

18:22 PM
aidanlw17

Yeah. alastairp and I also were planning to make a public evaluation available for track-track similarity as soon as we have a working pipeline

2019-05-27 14742, 2019

18:22 PM
ruaok

I'd like all of us to start thinking about how to accomplish the artist-artist data set from the LB and AB datasets.

2019-05-27 14750, 2019

18:22 PM
ruaok

aidanlw17: superb

2019-05-27 14712, 2019

18:23 PM
ruaok

ok, I think we all have a better understanding of next steps and more of the roadmap now, yes?

2019-05-27 14722, 2019

18:23 PM
ruaok

if something is unclear, ask now.

2019-05-27 14725, 2019

18:23 PM
pristine__

Yes yes.

2019-05-27 14740, 2019

18:23 PM
alastairp

iliekcomputers: thanks for starting the script. how's it going?

2019-05-27 14745, 2019

18:23 PM
ruaok

iliekcomputers: I'd love to hear more about your reservations about the metric/ranking for CF when you come by them.

2019-05-27 14755, 2019

18:23 PM
ruaok waves at alastairp

2019-05-27 14709, 2019

18:24 PM
alastairp

hi. I'm just reading backlog, and cooking too

2019-05-27 14710, 2019

18:24 PM
aidanlw17

I'll keep that in mind. Additionally, if you guys produce a metric from the collaborative filtering it might be possible to index that with annoy as we will do with the other metrics for track-track. Is that something you want to consider?

2019-05-27 14713, 2019

18:24 PM
iliekcomputers

ruaok: let me try to rephrase what i was saying.

2019-05-27 14743, 2019

18:24 PM
ruaok

aidanlw17: that does sound interesting yes.

2019-05-27 14751, 2019

18:24 PM
iliekcomputers

right now, we're trying to predict exactly how many times you would / should have listened to a particular song (say the strokes' last nite)

2019-05-27 14704, 2019

18:25 PM
iliekcomputers

this value can range from one to tens of thousands.

2019-05-27 14708, 2019

18:25 PM
ruaok

I am super eager to learn what comes from your project. pristine__ has done an excellent job doing that for me on the CF front.

2019-05-27 14712, 2019

18:25 PM
iliekcomputers

so it is hard to predict.

2019-05-27 14726, 2019

18:25 PM
ruaok

too granular?

2019-05-27 14727, 2019

18:25 PM
iliekcomputers

when in reality, we probably do not need that number to that degree of accuracy.

2019-05-27 14735, 2019

18:25 PM
pristine__

ruaok: thanks. Means a lot :)

2019-05-27 14739, 2019

18:25 PM
ruaok

:)

2019-05-27 14752, 2019

18:25 PM
iliekcomputers

a lesser range would probably work out as well (intuition, not sure)

2019-05-27 14759, 2019

18:25 PM
ruaok

iliekcomputers: and the scale of the CF ranking? is that linear or non-linear?

2019-05-27 14716, 2019

18:26 PM
aidanlw17

ruaok: I appreciate the excitement - I feel it too.

2019-05-27 14728, 2019

18:26 PM
ruaok

well, mapping the giant range into something smaller is easy.

2019-05-27 14739, 2019

18:26 PM
ruaok

premature quantization might become a problem.

2019-05-27 14742, 2019

18:26 PM
pristine__

Yes. We can probably normalize.

2019-05-27 14759, 2019

18:26 PM
ruaok

normalizing makes sense to me. quantizing gives me hesitation.

2019-05-27 14721, 2019
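
[Editor's sketch] The distinction drawn above between normalizing and quantizing can be illustrated as follows. Log scaling is only one possible normalization and is an assumption here, as are the bucket edges; the sketch also assumes at least one positive listen count. Normalizing keeps every distinct count distinct, while quantizing can collapse different counts into the same level.

```python
import math


def normalize_counts(counts):
    """Log-scale listen counts into (0, 1], preserving their ordering."""
    top = math.log1p(max(counts))
    return [math.log1p(c) / top for c in counts]


def quantize_counts(counts, edges=(10, 100, 1000)):
    """Bucket counts into coarse levels; distinct counts may collapse."""
    return [sum(c >= edge for edge in edges) for c in counts]


# Hypothetical listen counts spanning the "one to tens of thousands" range.
counts = [1, 12, 90, 15000]

normalized = normalize_counts(counts)
quantized = quantize_counts(counts)
```

With these made-up numbers, 12 and 90 plays fall into the same bucket after quantizing but stay distinguishable after normalizing, which is the information loss the hesitation above is about.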

18:27 PM
iliekcomputers

alastairp: https://www.irccloud.com/pastebin/AGzOHdmi/

2019-05-27 14740, 2019

18:27 PM
alastairp

cool! that's really fast

2019-05-27 14754, 2019

18:27 PM
ruaok

I see how quantizing the data might be useful for other algs down the line, but for starters we may not want to do that.

2019-05-27 14709, 2019

18:28 PM
ruaok

alastairp, iliekcomputers : what script is that?

2019-05-27 14713, 2019

18:28 PM
alastairp

I'm not surprised... the original method took about 10 minutes for me to do it on a slow machine with only 4m tracks, but that blocked the whole table. this one is better

2019-05-27 14724, 2019

18:28 PM
alastairp

ruaok: writing submission offsets to the ll table

2019-05-27 14733, 2019

18:28 PM
ruaok

ah, yes.

2019-05-27 14745, 2019

18:28 PM
alastairp

tomorrow we can deploy write offset on submit

2019-05-27 14748, 2019

18:28 PM
ruaok

are submission offsets monotonically increasing numbers?

2019-05-27 14752, 2019

18:28 PM
alastairp

yes

2019-05-27 14707, 2019

18:29 PM
pristine__

I guess we should continue with the road map and pick on normalization sometime later.

2019-05-27 14708, 2019

18:29 PM
ruaok

makes sense.

2019-05-27 14715, 2019

18:29 PM
ruaok

pristine__: yes.

2019-05-27 14722, 2019

18:29 PM
alastairp

it's the same as we're currently using in the GET endpoint

2019-05-27 14731, 2019

18:29 PM
ruaok

once we see the scores in the report (soon, I hope!) we can get our heads around this more.

2019-05-27 14737, 2019

18:29 PM
alastairp

uuid/low-level?n=[offset]

2019-05-27 14742, 2019

18:29 PM
iliekcomputers

hmm.

2019-05-27 14747, 2019

18:29 PM
pristine__

By tomorrow ruaok :)

2019-05-27 14750, 2019

18:29 PM
iliekcomputers

we should start merging PRs soon too.

2019-05-27 14751, 2019

18:29 PM
ruaok

wooo

2019-05-27 14758, 2019

18:29 PM
alastairp

iliekcomputers: when are you next available?

2019-05-27 14705, 2019

18:30 PM
pristine__

iliekcomputers: could you look at 21

2019-05-27 14707, 2019

18:30 PM
ruaok

the stats PRs should be merged asap, IMHO.

2019-05-27 14709, 2019

18:30 PM
iliekcomputers

alastairp: tomorrow works for me.

2019-05-27 14711, 2019

18:30 PM
pristine__

Can*

2019-05-27 14720, 2019

18:30 PM
pristine__

PR#21

2019-05-27 14722, 2019

18:30 PM
ruaok

my goal for today is to look at pristine's latest PR

2019-05-27 14737, 2019

18:30 PM
ruaok

(aside from boring nonprofit work)

2019-05-27 14748, 2019

18:30 PM
pristine__

I will send you link, ruaok

2019-05-27 14702, 2019

18:31 PM
alastairp

ok, good. perhaps then we can do the next PR on this offset stuff (if we do it early in the morning perhaps we can do the last part in the evening)

2019-05-27 14712, 2019

18:31 PM
ruaok

#26 is on my list.

2019-05-27 14716, 2019

18:31 PM
alastairp

and also we could take a look at the docker stuff that you were finishing up

2019-05-27 14708, 2019

18:32 PM
pristine__ telling her laptop to wake up.

2019-05-27 14721, 2019

18:33 PM
iliekcomputers

alastairp: ok.

2019-05-27 14726, 2019

18:33 PM
AfroThundr|main has quit

2019-05-27 14740, 2019

18:33 PM
iliekcomputers

ruaok: do we wanna talk some about azure?

2019-05-27 14748, 2019

18:33 PM
ruaok

sure.