in #metabrainz

9:43 AM
alastairp

what I've not seen in this discussion is an evaluation about what a good result is
9:43 AM
ruaok

but I think exposing anything is premature. we will have a lot of iterations.
9:43 AM
but that can be in the cards for the future.
9:43 AM
alastairp

regardless of what the process or end output is
9:43 AM
ruaok

alastairp: a very good question. I've not thought about that yet.
9:44 AM
my focus so far has been to build the data sets that allow people to make recommendations.
9:44 AM
though, I think making a new DB and then allowing people to download dumps of it for our challenge in the fall makes a lot of sense.
9:44 AM
alastairp: does the industry have a metric for measuring the performance of rec systems?
9:45 AM
alastairp

from my point of view, that's really important. since pristine__'s work has "something" working, but I've not seen any structured analysis as to whether the results are actually good
9:45 AM
other than ruaok saying "well, that _does_ look like something that I'd want to listen to"
9:45 AM
right, there are 2 broad options
9:45 AM
ruaok

correct, I agree.
9:46 AM
alastairp

playlist recommendation (e.g. https://recsys-challenge.spotify.com/) evaluates you by withholding part of the playlist, and seeing how many of the items that you recommend are on the withheld part
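A minimal sketch of the withhold-and-count idea described here, not the challenge's actual scoring: part of a playlist is held back, and the recommendations are scored by how many of them fall in the held-back part. The function name `holdout_hits`, the track ids and the cutoff `k` are all made up for illustration.

```python
def holdout_hits(recommended, withheld, k=10):
    """Score a recommendation list against the withheld part of a playlist."""
    top_k = recommended[:k]          # only the first k recommendations count
    withheld_set = set(withheld)
    hits = [track for track in top_k if track in withheld_set]
    precision = len(hits) / len(top_k) if top_k else 0.0
    recall = len(hits) / len(withheld_set) if withheld_set else 0.0
    return {"hits": hits, "precision": precision, "recall": recall}


# Hypothetical data: the first half of the playlist is shown to the
# recommender, the second half is withheld for scoring.
playlist = ["a", "b", "c", "d", "e", "f"]
seed, withheld = playlist[:3], playlist[3:]
recommended = ["d", "x", "f", "y"]   # what some recommender returned for `seed`

result = holdout_hits(recommended, withheld, k=4)
```

Note that tracks "x" and "y" count as misses even if the listener would have loved them, which is the "only recommending stuff they already know" problem raised a few messages later.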
9:46 AM
ruaok

and really the CF stuff doesn't generate things I want to listen to. CF needs more backup/mashup.
9:46 AM
alastairp

otherwise you have subjective analysis. give it to someone and ask them how good it is
9:47 AM
the first one is much easier to evaluate, but you end up basically only recommending people stuff that they already know
9:47 AM
because there's no other way of knowing that a recommendation out of their known songs is good for them
9:47 AM
so, alternatively, do similar to what gentlecat and philip did for their masters projects, generate a playlist, give it to someone, and ask them to thumbs up/down recommendations
9:48 AM
ruaok

I really only see the latter as being possible. since we don't have 1M playlists to begin with.
9:48 AM
alastairp

(then you have to work out how to fold that feedback into the algorithm too)
9:48 AM
you don't have playlists, but you have playback history
9:49 AM
ruaok

the CF alg will need to have a candidate dataset to recommend into.
9:50 AM
which we haven't quite sorted out how to create yet, but have some ideas.
9:50 AM
but that obviously impacts what gets generated. and may limit the effectiveness of using listens as a way of measuring effectiveness.
9:50 AM
alastairp

so you want to build a test playlist? that's not a terrible idea
9:51 AM
but man, it's going to be so subjective
9:51 AM
ruaok

it will be for sure.
9:51 AM
but I think that anything else is beyond the scope for the summer.
9:51 AM
alastairp

ruaok: btw, bulk queries _do_ get slower, but it seems to be the transfer time for larger and larger responses rather than the actual db lookup
9:51 AM
so your nginx suggestion is good
9:52 AM
sure, not much time left in the summer for that
9:52 AM
ruaok

if we get a page on LB where a user can click "gimme a playlist" and one appears in a reasonable amount of time, I would be happy for the summit.
9:52 AM
summer.
9:52 AM
alastairp: great.
9:52 AM
given how we're evolving all of this, this needs to be part of the roadmap for a challenge in the autumn.
9:53 AM
but for summer, it is too much.
9:55 AM
thanks for putting that on the radar, alastairp.
9:55 AM
iliekcomputers: pristine__: thoughts on this discussion?
9:58 AM
iliekcomputers

not so much, evaluation is definitely something we need to work on soon.
9:59 AM
i'd been thinking of how we could get user feedback (thumbs up/down) into the cf algorithm. i guess it'd involve adding/subtracting values into the listen counts passed into the cf algorithm.
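One way the add/subtract idea could look, as a rough sketch only. The `(user, track)` keys, the `weight` of 5 and the clamp at zero are assumptions made for the example, not anything decided in this discussion.

```python
def apply_feedback(listen_counts, feedback, weight=5):
    """Return a copy of listen_counts nudged by thumbs up (+1) / down (-1) votes."""
    adjusted = dict(listen_counts)
    for (user, track), vote in feedback.items():
        new_value = adjusted.get((user, track), 0) + vote * weight
        # Clamp at zero so a thumbs down never produces a negative count.
        adjusted[(user, track)] = max(new_value, 0)
    return adjusted


# Hypothetical data: (user, track) -> number of listens.
listen_counts = {("alice", "track1"): 12, ("alice", "track2"): 3}
# Hypothetical feedback: +1 is thumbs up, -1 is thumbs down.
feedback = {("alice", "track2"): -1, ("alice", "track3"): +1}

adjusted = apply_feedback(listen_counts, feedback)
```

The adjusted counts would then be passed to the CF step in place of the raw ones; how large `weight` should be is exactly the kind of thing that needs an evaluation to answer.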
9:59 AM
ruaok

not sure if feeding back into CF is all that good to start with.
9:59 AM
feeding back into the rec alg itself might be better or easier to start with.
10:00 AM
or adjusting the candidate set.
10:01 AM
iliekcomputers

hmm, yeah.
10:01 AM
but no way of knowing that with no real evaluation so far. getting some recommendations into production with thumbs up / down should be priority for now, i guess.
10:02 AM
ruaok

I also feel that if we get to the point where "I can't tell how much this decent recommendation is improving over time" then I'll be quite happy.
10:03 AM
which of course means that we need to have a more qualitative approach to evaluating recommendations.
10:03 AM
reosarevok

You mean giving them to someone with better quality taste than ruaok? Ok, me and zas are available :p
10:03 AM
iliekcomputers

to be honest, we can't tell that right now either, really.
10:04 AM
ruaok

both of you are right.
10:04 AM
but I haven't seen anything that made me smile yet.
10:04 AM
only things that I am convinced that I don't want to listen to.
10:04 AM
reosarevok pats ruaok on the head
10:05 AM
of course, we're also still early in the game.
10:05 AM
reosarevok

Very much so
10:06 AM
Qualitative evaluation is going to be very hard anyway, because it depends on having a lot of people with different tastes say "this, this is good shit"
10:06 AM
ruaok

I guess if we can't please ourselves on a very basic level, then a more quantitative solution will only confirm what we already know.
10:06 AM
reosarevok

We barely have a lot of people *submitting* yet :)
10:06 AM
ruaok

(read: we suck)
10:06 AM
yeah, that is another issue that I am grappling with.
10:06 AM
reosarevok

Wait
10:06 AM
"which of course means that we need to have a more qualitative approach to evaluating recommendations."
10:06 AM
Did you mean quantitative?
10:06 AM
ruaok

we keep releasing stuff and focusing on the next thing, but we need to work to get more users.
10:07 AM
qualitative, I guess.
10:07 AM
reosarevok

Oh, ok
10:07 AM
ruaok

My brain is barely cohesive this morning. feh. jetlag gets worse as one ages.
10:08 AM
reosarevok

If you can come up with some half-decent quantitative / programmatic way of knowing if stuff is kinda-sorta improving, that would be great, if only because for a human it is hard to tell, I feel
10:08 AM
"Ok, I still hate this shit, but do I hate it LESS?"
10:08 AM
ruaok

no arguments from me.
10:08 AM
still, I'm happy we're facing these issues/questions.
10:08 AM
clearly a sign of progress.
10:09 AM
reosarevok

"Just how shit are we really?" "PROGRESS!"
10:09 AM
:D
10:09 AM
iliekcomputers

did we come to a conclusion about storing the data?
10:09 AM
reosarevok

But yeah, I guess :)
10:10 AM
ruaok

iliekcomputers: no
10:10 AM
iliekcomputers

😂
10:10 AM
reosarevok

iliekcomputers: you're a playground bully :p
10:10 AM
You guys have more money than everyone else combined!
10:10 AM
ferbncode

iliekcomputers: 😂
10:11 AM
reosarevok

You'll still manage to lose to Pakistan somehow anyway, though, so it's ok
10:12 AM
alastairp

iliekcomputers: I want to add a constant from somewhere in the code into a sphinx documentation so that it shows up in the api documentation. ever done that?
10:13 AM
I guess LB does https://listenbrainz.readthedocs.io/en/producti...
10:16 AM
pristine__

ruaok: hey. Sorry, I am a lil late, didn't know the time of the meeting. Phew.
10:16 AM
iliekcomputers

alastairp: i didn't write it but yeah
10:17 AM
reosarevok: https://github.com/metabrainz/listenbrainz-serv...
10:17 AM
oh sorry, alastairp ^
10:17 AM
reosarevok: never lost to pakistan in a world cup :D
10:18 AM
pac23

but 29 hour response time is just appalling
10:18 AM
iliekcomputers when is the match ?
10:18 AM
alastairp

iliekcomputers: cool, it's possible that one of these auto* methods can include the number directly into the docstring, I'll have a look
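A possible alternative to the auto* directives, sketched under assumptions: since Sphinx autodoc imports the module and reads `__doc__` at build time, a small decorator can format a constant into the docstring before Sphinx sees it. The names `doc_constants`, `bulk_lookup` and `MAX_ITEMS_PER_BULK_QUERY`, and the value 25, are hypothetical and not taken from the AB or LB code.

```python
MAX_ITEMS_PER_BULK_QUERY = 25  # hypothetical value, for illustration only


def doc_constants(**constants):
    """Decorator that substitutes named constants into a function's docstring."""
    def decorator(func):
        if func.__doc__:
            # str.format fills in the {name} placeholders in the docstring.
            func.__doc__ = func.__doc__.format(**constants)
        return func
    return decorator


@doc_constants(max_items=MAX_ITEMS_PER_BULK_QUERY)
def bulk_lookup(ids):
    """Look up several items in one request.

    At most {max_items} ids are accepted per query.
    """
    return list(ids)[:MAX_ITEMS_PER_BULK_QUERY]
```

One caveat with this approach: a docstring that contains literal braces (for example a JSON sample) would need them doubled, otherwise `str.format` raises an error.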
10:20 AM
BrainzGit

[musicbrainz-server] reosarevok merged pull request #1026 (master…MBS-10133): MBS-10133: Clarify "empty query" bad request error https://github.com/metabrainz/musicbrainz-serve...
10:20 AM
BrainzBot

MBS-10133: Error message when sending an empty query to the WS is unclear https://tickets.metabrainz.org/browse/MBS-10133
10:20 AM
iliekcomputers

ruaok: if we put everything in a different database, it'll be harder to access from LB, no joins etc.
10:20 AM
pac23: it is going on, SA 34/2 in 10 overs :D
10:22 AM
alastairp

not many good umpire emojis
10:22 AM
\o/
10:22 AM
\o
10:22 AM
_o
10:22 AM
iliekcomputers

alastairp: nz demolished sri lanka a few days ago
10:22 AM
🎉
10:23 AM
alastairp: are you putting the ratelimit values inside the docs from sphinx?
10:24 AM
alastairp

I'm looking at the number of items per bulk query
10:24 AM
ratelimit values would be nice, but those will come from consul now?
10:24 AM
iliekcomputers

ah.
10:25 AM
alastairp

and so won't be available when docs are built
10:27 AM
iliekcomputers

yeah, consul was the problem when i thought of putting it in there yesterday
10:27 AM
alastairp

can we set a default, and override it with config if set?
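The default-plus-override idea, as a minimal sketch. The key names and numbers below are hypothetical, and the plain dict stands in for whatever configuration object the application really uses.

```python
# Hypothetical defaults that live in the code, so they are importable
# (and documentable) even when no external configuration is reachable.
DEFAULT_RATELIMIT_PER_IP = 30
DEFAULT_RATELIMIT_WINDOW = 10


def get_ratelimit(config):
    """Use values from config when present, otherwise fall back to the defaults."""
    return {
        "per_ip": config.get("RATELIMIT_PER_IP", DEFAULT_RATELIMIT_PER_IP),
        "window": config.get("RATELIMIT_WINDOW", DEFAULT_RATELIMIT_WINDOW),
    }


no_override = get_ratelimit({})
with_override = get_ratelimit({"RATELIMIT_PER_IP": 100})
```

With this layout the docs build can still show the defaults, with the caveat that a deployed instance may be running different values.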
10:28 AM
iliekcomputers

that is what i did for now
10:29 AM
https://github.com/metabrainz/acousticbrainz-se...
10:31 AM
alastairp

right, but that would set the limits to the BU defaults, I'm not sure if we want specific AB defaults too
10:43 AM
BrainzGit

[musicbrainz-server] reosarevok merged pull request #1034 (master…MBS-8915): MBS-8915: Allow editors to choose delimiter in track parser https://github.com/metabrainz/musicbrainz-serve...
10:43 AM
BrainzBot

MBS-8915: Allow editors to choose delimiter in track parser https://tickets.metabrainz.org/browse/MBS-8915
11:07 AM
ruaok

> if we put everything in a different database, it'll be harder to access from LB, no joins etc.
11:07 AM
iliekcomputers: yes, exactly, but then again, what data exists in LB that needs to be joined?
11:08 AM
the key data really lives in Influx.
11:09 AM
iliekcomputers

Select track from user join cf_recommendation on user.id
11:09 AM
To get user recommendations for a bunch of users
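The join sketched above could look roughly like this once spelled out. This is an illustration using an in-memory SQLite database; the column names `user_id` and `track`, the sample rows and the helper `recommendations_for` are assumptions, not the real LB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cf_recommendation (user_id INTEGER, track TEXT);
    INSERT INTO "user" VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO cf_recommendation VALUES
        (1, 'track-a'), (1, 'track-b'), (2, 'track-c');
""")


def recommendations_for(conn, names):
    """Fetch (user name, track) pairs for a bunch of users in one query."""
    placeholders = ", ".join("?" for _ in names)
    query = f"""
        SELECT u.name, r.track
          FROM "user" u
          JOIN cf_recommendation r ON r.user_id = u.id
         WHERE u.name IN ({placeholders})
         ORDER BY u.name, r.track
    """
    return conn.execute(query, list(names)).fetchall()


rows = recommendations_for(conn, ["alice", "bob"])
```

A join like this only works when both tables sit in the same database, which is the point being made about keeping the recommendation data next to the LB data.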
11:10 AM
ruaok

at the same time that limits recommendations to people who have LB accounts.
11:10 AM
not sure if that is a relevant point.
11:10 AM
I *think* adding a schema into the LB data is the right course of action for now.
11:12 AM
iliekcomputers

That sounds like a reasonable compromise to me for now.
11:12 AM
ruaok

what do we call it?
11:12 AM
recommendation? recsys? (which is what the industry calls all this. not a fan, really).
11:14 AM
iliekcomputers

Recommendation
11:14 AM
Mr_Monkey

The WhyNot? Machine
11:14 AM
ruaok

recommendation.{track_track_relations|artist_artist_relations|cf_user_recommendation} ?
11:15 AM
actually singhular on the first two.
11:15 AM
ruaok can't spel
11:16 AM
CatQuest

that's okaye
11:17 AM
ruaok

not a very catty comment from you, CatQuest...
11:17 AM
hmmm. $17k invoices to send. delicious.
11:17 AM
CatQuest

anyway. ruaok, I wanted to ask you a slightly off-topic question. how hard is it really to register, own and maintain a *.cat domain? (seeing as you live in Barcelona now I thought you would know. don't you also have a *.cat website now?)
11:18 AM
ruaok

easy in the grand scheme of things.
11:18 AM
there is one caveat -- there needs to be some catalan content on the page.
11:18 AM
CatQuest

exactly
11:18 AM
but like, how strict are they?
11:18 AM
ruaok

my mayhem.cat page has no text, except for "Benvinguts". so, welcome in Catalan. No one has ever come complaining.
11:19 AM
CatQuest

if I translated reosarevok's "nokkloom" page into estonian, will it suffice?
11:19 AM
ruaok

not sure, really.
11:19 AM
CatQuest

hmmmmmm
11:19 AM
ruaok

like I said, I have almost no text on my site.