MBS-5193: Regression : impossible to purposely set bad encoded alias (search hints)
antlarr2 has quit
antlarr joined the channel
D4RK-PH0ENiX has quit
aidanlw17
Freso: I'll be out right before the meeting but should be back on when it starts - I mailed in my review in case I'm late :)
alastairp
aidanlw17: good morning
I'm just heading off to lunch, but should be back in an hour or so
D4RK-PH0ENiX joined the channel
aidanlw17
alastairp: hi, sounds good, we can talk when you’re back?
I made some new comments on the metrics PR
alastairp
great. how's it going on the query optimisation?
aidanlw17
I think that it’s good. We now only need one query to select the data, and one to insert it
One select query per batch!
alastairp
awesome!
that's going to be so fast
I'll test it when I get back then
aidanlw17
~28 seconds to compute and insert one 10k recording batch on my machine
alastairp
compared to how long before?
aidanlw17
I’ll need to look back in my notes to report
I’ll tell you when you’re back from lunch! Haha.
alastairp
cool, talk soon
D4RK-PH0ENiX has quit
D4RK-PH0ENiX joined the channel
yvanzo: thanks for all of the feedback on my tickets!
ruaok has a slow start to the day
ruaok
but I really needed that ride, even if it was super hot.
alastairp
where did you go?
ruaok
just up besos, nothing fancy. I wanted to go all weekend, but I ended up getting distracted by everything.
alastairp
nice
yeah, we've started riding after work at 8ish, to get a bit of coolness in the day
almost any other time is impossible
ruaok
Mr_Monkey: back yet?
alastairp: yeah, 8pm would work, but there are too many other things going on then.
alastairp
sure, you fit stuff in whenever you can
Darkloke has quit
TOPIC: MetaBrainz Community and Development channel | MusicBrainz non-development: #musicbrainz | New GSoC students start here: https://goo.gl/7jsjG2 | Channel is logged; see https://musicbrainz.org/doc/IRC for details | Meeting agenda: Reviews, MB Summit (ruaok)
ruaok
pristine__: how are you doing?
pristine__
Hey
I am good. Sorry for being afk. Was travelling.
And shifting the room.
How are you?
ruaok
good, just checking in to see if you need anything.
I have a pile of metabrainz things to do today -- I might get around to doing some MSB stuff later.
pristine__
ruaok: what does DEFAULT now() mean? If we don't provide a timestamp then the current timestamp will be added, no?
ruaok
correct.
pristine__
Then why not a NOT NULL clause too?
So that no one can push a null value into the column?
ruaok
yes
pristine__
Okay. Thanks
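The `DEFAULT now()` plus `NOT NULL` combination discussed above can be sketched as follows (the table and column names here are invented for illustration): the column is auto-filled when omitted, but an explicit NULL is rejected.

```python
# Hypothetical DDL sketch of the pattern discussed above.
# Table and column names are made up; only the DEFAULT now() / NOT NULL
# combination comes from the conversation.
CREATE_TABLE = """
CREATE TABLE example_events (
    id      SERIAL PRIMARY KEY,
    created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now()
)
"""

# With this definition:
#   INSERT INTO example_events DEFAULT VALUES;           -- created = now()
#   INSERT INTO example_events (created) VALUES (NULL);  -- rejected by NOT NULL
```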
aidanlw17
alastairp: you could do 40 batches with the new query in the time it used to take to do only 1!
alastairp
great, sounds good
I'm just finishing up some reviews on another project and I'll take a look at this PR again
so you also fixed the query parameters?
aidanlw17
It took my machine ~19 minutes to do the old method for one batch.
Yes I did fix them!
alastairp
perfect, sounds good
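A quick sanity check of the figures quoted above: the old method took about 19 minutes per 10k-recording batch, the new one about 28 seconds, which is consistent with the "40 batches in the time of 1" estimate.

```python
# Sanity-check the speedup quoted above: ~19 min per batch before,
# ~28 s per batch after.
old_seconds = 19 * 60      # 1140 s for one 10k-recording batch
new_seconds = 28
speedup = old_seconds / new_seconds
print(round(speedup, 1))   # → 40.7, matching the "40 batches" estimate
```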
aidanlw17
Sort of related: Philip used arrays of NaN cast to double precision to represent rows with missing data. For annoy, we decided to use vectors of the form [0, ..., 0] to represent recordings that didn't have a submission instead. That makes more sense for us, so I started inserting rows of 0 rather than NaN when there is missing data for a metric as well.
alastairp
ok, cool. it makes sense that what we have in the database is exactly what we insert into annoy
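The zero-vector convention just described can be sketched in a few lines (the function name is hypothetical, not from the actual codebase):

```python
# Hypothetical sketch: represent missing metric data as a zero vector
# instead of an array of NaN, matching what gets inserted into Annoy.
def metric_vector(data, n_dims):
    """Return the feature vector, or a zero vector if data is missing."""
    if data is None:
        return [0.0] * n_dims
    return data

print(metric_vector(None, 4))         # [0.0, 0.0, 0.0, 0.0]
print(metric_vector([0.2, -0.5], 2))  # [0.2, -0.5]
```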
aidanlw17
I think so too. I found this interesting: if you add a vector containing the value `None` to an Annoy index, it converts that value to -1 when adding it to the index.
alastairp
ah, that's very interesting too
aidanlw17
We also have negative elements in our vectors though, so I still think it makes the most sense to use the value 0?
alastairp
that was about to be my next question -
what is the scale of our features? are they all normalised from 0-1?
aidanlw17
Almost all values range from -1 to 1, but looking closely they are not all < 1. Some have larger magnitudes.
I took the transformation functions directly from Philip, I should look closer to see about that.
alastairp
we have the NormalizedLowLevelMetric classes
what does that normalise to?
aidanlw17
Again my background on the transformation is weak; some of it I don't fully understand. For normalized lowlevel metrics, the values are: (value_from_lowlevel - mean_value) / std_dev
Then if it is a weighted normalized lowlevel metric, that value is multiplied afterwards by a weight factor `self.weight_vector = np.array([self.weight ** i for i in indices])`
Where self.weight is currently set to 0.95.
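The transformation described above can be sketched in plain Python (the helper names are invented; only the formulas come from the conversation): standardise each value as (value - mean) / std_dev, then for weighted metrics multiply element i by weight ** i with weight = 0.95.

```python
# Sketch of the normalization described above. Helper names are
# hypothetical; the formulas are the ones quoted in the discussion.
WEIGHT = 0.95

def normalize(values, mean, std_dev):
    """(value_from_lowlevel - mean_value) / std_dev for each element."""
    return [(v - mean) / std_dev for v in values]

def apply_weight(values, weight=WEIGHT):
    """Multiply element i by weight ** i, as in the weight_vector above."""
    return [v * weight ** i for i, v in enumerate(values)]

normed = normalize([2.0, 4.0, 6.0], mean=4.0, std_dev=2.0)
print(normed)                # [-1.0, 0.0, 1.0]
print(apply_weight(normed))  # last element scaled by 0.95**2 ≈ 0.9025
```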
alastairp
ok, cool
let's leave it as-is for now, perhaps we can modify it in the future
aidanlw17
Previously, we used a function get_data to extract the lowlevel data with a specific path, or the highlevel models
alastairp
a specific postgres query path, right?
lowlevel.data->'blah'->'foo'
aidanlw17
Yeah exactly. I wrote a new function get_feature_data, which takes that path and extracts it from the dictionary.
Then passes the value to transform.
alastairp
ahh, I see
aidanlw17
I left the paths as is, because I thought soon we may be able to select just the feature paths in the postgres query
rather than getting the whole document
yvanzo
alastairp: You’re welcome, musicbrainz-docker is currently sluggish until PR #106 can be updated/merged with a working SIR.
alastairp
mm, right. I agree that leaving the path is a good idea, I'm not sure I would have done it this way. especially `features = self.path[7:-1]` makes me a bit worried
yvanzo
There are two annoying bugs atm: sir reindex not always returning (which can be worked around by downloading prebuilt indexes) and sir reindex failing on some invalid characters (which is required to build indexes).
alastairp
I would have written specific methods (or perhaps some lambdas?) that explicitly select the items from the dictionary
ruaok
the point here is to store user specific output from the collaborative filtering system.
aidanlw17
Yes I agree that felt a little sketchy... I'll see about rewriting that in another way.
ruaok
and then to allow multiple recommender scripts to access these tables and keep a record of which script has used which tracks.
alastairp
yvanzo: no problem. I was looking at upgrading our mirror to new schema, but perhaps I'll just wait for all of this to be finished. we only use the server/api and no search, so for us it's a matter of updating the image and running upgrade
but I had some custom modifications to point to the external database server, so the fewer changes I have to make the better
aidanlw17: cool. it's true that it might become a bit more complex - perhaps we'll have to write a custom transformer per method?
otherwise - what about a list of dictionary keys? ['lowlevel', 'mfcc', 'mean']
in fact, we could then construct the path from this anyway
that way we can keep your method, but it won't involve messy string splitting
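The list-of-keys idea just suggested can be sketched like this (function names are hypothetical): walk the nested document with the key list, and build the Postgres JSON path string from the same list, with no string slicing like `self.path[7:-1]`.

```python
# Sketch of the "list of dictionary keys" idea above (names invented).
def get_feature(document, keys):
    """Walk a nested dict with e.g. ['lowlevel', 'mfcc', 'mean']."""
    for key in keys:
        document = document[key]
    return document

def postgres_path(column, keys):
    """Build e.g. data->'lowlevel'->'mfcc'->'mean' from the same key list."""
    return column + "".join("->'{}'".format(k) for k in keys)

doc = {"lowlevel": {"mfcc": {"mean": [1.0, 2.0]}}}
keys = ["lowlevel", "mfcc", "mean"]
print(get_feature(doc, keys))       # [1.0, 2.0]
print(postgres_path("data", keys))  # data->'lowlevel'->'mfcc'->'mean'
```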
ruaok: I'll have a look. while you're here, a good time to ask a question about pg schemas. it looks like you're splitting different parts of lb into separate schemas, which sounds like a great idea to me
we're making some more tables for the similarity stuff. it feels like we could put this in a schema too
ruaok
in AB?
alastairp
yes
ruaok
yea, please do.
in the end the AB similarity data ought to be copied to the LB recommendation schema.
the idea is to provide complete dumps of this schema for anyone willing to try writing a recommendation engine.
and it should have collaborative filtered tracks, similar tracks, artist-artist similarity.
alastairp
aidanlw17: sorry, so this is one more thing on this pr :)
let's put similarity tables in a schema. this is as easy as `create schema similarity` and prefix tables with the schema name when using them (`select x from similarity.similarity`)
it will help us to logically separate all of the tables
aidanlw17
alastairp: it makes sense to me to store the keys in a list like that, and I think we’ve already done something similar in AB-404.
alastairp
one nice thing you can do: `drop schema s cascade;` will drop all of the tables in the schema s, so you don't have to individually drop them in drop_tables
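The schema workflow discussed above, as a sketch (the statement strings are plain Postgres DDL; how they are executed through the app's database layer is assumed, not shown):

```python
# Sketch of the schema workflow discussed above: create a "similarity"
# schema, address tables with schema-qualified names, and tear everything
# down with one cascading drop.
SCHEMA = "similarity"

CREATE_SCHEMA = "CREATE SCHEMA IF NOT EXISTS {}".format(SCHEMA)
# CASCADE drops every table in the schema, so no per-table drops are needed.
DROP_SCHEMA = "DROP SCHEMA IF EXISTS {} CASCADE".format(SCHEMA)

def qualified(table):
    """Schema-qualify a table name, e.g. similarity.similarity."""
    return "{}.{}".format(SCHEMA, table)

print(qualified("similarity"))  # similarity.similarity
```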
aidanlw17
Okay thanks alastairp
Cool!! Sounds handy
alastairp: are the other data-related tables in AB already part of a different schema?
alastairp
no, we have no other schemas except the default
we should move some of them
aidanlw17
Okay. I can do that after I do these then
If you want!
alastairp
that's a larger process, since we have to move existing data