MBS-5193: Regression : impossible to purposely set bad encoded alias (search hints)
2019-07-22 20313, 2019
antlarr2 has quit
2019-07-22 20336, 2019
antlarr joined the channel
2019-07-22 20332, 2019
D4RK-PH0ENiX has quit
2019-07-22 20348, 2019
aidanlw17
Freso: I'll be out right before the meeting but should be back on when it starts - I mailed in my review in case I'm late :)
2019-07-22 20328, 2019
alastairp
aidanlw17: good morning
2019-07-22 20339, 2019
alastairp
I'm just heading off to lunch, but should be back in an hour or so
2019-07-22 20324, 2019
D4RK-PH0ENiX joined the channel
2019-07-22 20339, 2019
aidanlw17
alastairp: hi, sounds good, we can talk when you’re back?
2019-07-22 20352, 2019
aidanlw17
I made some new comments on the metrics PR
2019-07-22 20356, 2019
alastairp
great. how's it going on the query optimisation?
2019-07-22 20330, 2019
aidanlw17
I think that it’s good. We now only need one query to select the data, and one to insert it
2019-07-22 20339, 2019
aidanlw17
One select query per batch!
2019-07-22 20343, 2019
alastairp
awesome!
2019-07-22 20346, 2019
alastairp
that's going to be so fast
2019-07-22 20350, 2019
alastairp
I'll test it when I get back then
2019-07-22 20304, 2019
aidanlw17
~28 seconds to compute and insert one 10k recording batch on my machine
2019-07-22 20320, 2019
alastairp
compared to how long before?
2019-07-22 20352, 2019
aidanlw17
I’ll need to look back in my notes to report
2019-07-22 20305, 2019
aidanlw17
I’ll tell you when you’re back from lunch! Haha.
2019-07-22 20310, 2019
alastairp
cool, talk soon
2019-07-22 20334, 2019
D4RK-PH0ENiX has quit
2019-07-22 20300, 2019
D4RK-PH0ENiX joined the channel
2019-07-22 20343, 2019
alastairp
yvanzo: thanks for all of the feedback on my tickets!
2019-07-22 20301, 2019
ruaok has a slow start to the day
2019-07-22 20309, 2019
ruaok
but I really needed that ride, even if it was super hot.
2019-07-22 20329, 2019
alastairp
where did you go?
2019-07-22 20355, 2019
ruaok
just up besos, nothing fancy. I wanted to go all weekend, but I ended up getting distracted by everything.
2019-07-22 20338, 2019
alastairp
nice
2019-07-22 20359, 2019
alastairp
yeah, we've started riding after work at 8ish, to get a bit of coolness in the day
2019-07-22 20305, 2019
alastairp
almost any other time is impossible
2019-07-22 20344, 2019
ruaok
Mr_Monkey: back yet?
2019-07-22 20311, 2019
ruaok
alastairp: yeah, 8pm would work, but there are too many other things going on then.
2019-07-22 20335, 2019
alastairp
sure, you fit stuff in whenever you can
2019-07-22 20323, 2019
Darkloke has quit
2019-07-22 20347, 2019
TOPIC: MetaBrainz Community and Development channel | MusicBrainz non-development: #musicbrainz | New GSoC students start here: https://goo.gl/7jsjG2 | Channel is logged; see https://musicbrainz.org/doc/IRC for details | Meeting agenda: Reviews, MB Summit (ruaok)
2019-07-22 20343, 2019
ruaok
pristine__: how are you doing?
2019-07-22 20317, 2019
pristine__
Hey
2019-07-22 20342, 2019
pristine__
I am good. Sorry for being afk. Was travelling.
2019-07-22 20352, 2019
pristine__
And shifting the room.
2019-07-22 20356, 2019
pristine__
How are you?
2019-07-22 20310, 2019
ruaok
good, just checking in to see if you need anything.
2019-07-22 20331, 2019
ruaok
I have a pile of metabrainz things to do today -- I might get around to doing some MSB stuff later.
ruaok: what does DEFAULT now() mean? If we don't provide a timestamp then the current timestamp will be added, no?
2019-07-22 20314, 2019
ruaok
correct.
2019-07-22 20356, 2019
pristine__
Then why the NOT NULL clause?
2019-07-22 20313, 2019
pristine__
So that no one can push null value in the col?
2019-07-22 20330, 2019
ruaok
yes
2019-07-22 20340, 2019
pristine__
Okay. Thanks
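[Editor's sketch of the DEFAULT / NOT NULL exchange above. It uses Python's built-in sqlite3 as a stand-in so it runs anywhere; SQLite spells the default as CURRENT_TIMESTAMP where Postgres uses now(). The table name `example` and its columns are hypothetical and not taken from the actual schema. The point it illustrates: a default only fills in a column that is omitted, so an explicit NULL still needs NOT NULL to be rejected.]

```python
import sqlite3

# In-memory database; "example" is a hypothetical table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example ("
    "id INTEGER PRIMARY KEY, "
    "created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)

# Column omitted: the default supplies the current timestamp.
conn.execute("INSERT INTO example (id) VALUES (1)")
created = conn.execute("SELECT created FROM example WHERE id = 1").fetchone()[0]

# Explicit NULL: the default does not apply, so NOT NULL is what rejects it.
try:
    conn.execute("INSERT INTO example (id, created) VALUES (2, NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
```

[Without the NOT NULL clause the second insert would succeed and store a NULL timestamp.]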
2019-07-22 20313, 2019
aidanlw17
alastairp: you could do 40 batches with the new query in the time it used to take to do only 1!
2019-07-22 20331, 2019
alastairp
great, sounds good
2019-07-22 20349, 2019
alastairp
I'm just finishing up some reviews on another project and I'll take a look at this PR again
2019-07-22 20305, 2019
alastairp
so you also fixed the query parameters?
2019-07-22 20309, 2019
aidanlw17
It took my machine ~19 minutes to do the old method for one batch.
2019-07-22 20315, 2019
aidanlw17
Yes I did fix them!
2019-07-22 20340, 2019
alastairp
perfect, sounds good
2019-07-22 20342, 2019
aidanlw17
Sort of related, Philip used arrays of NaN cast to double precision to represent rows with missing data. We decided for annoy to use vectors of the form [0, ..., 0] to represent those that didn't have a submission instead. For us that makes more sense, so I started inserting rows of 0 rather than NaN when there is missing data for a metric as well.
2019-07-22 20326, 2019
alastairp
ok, cool. it makes sense that what we have in the database is exactly what we insert into annoy
2019-07-22 20358, 2019
aidanlw17
I think so too. I found this interesting, if you add a vector to an Annoy index containing the value `None`, it converts that value to -1 when adding it to the index.
2019-07-22 20335, 2019
alastairp
ah, that's very interesting too
2019-07-22 20337, 2019
aidanlw17
We also have negative elements of our vectors though, so I still think it makes the most sense to use the value 0?
2019-07-22 20349, 2019
alastairp
that was about to be my next question -
2019-07-22 20303, 2019
alastairp
what is the scale of our features? are they all normalised from 0-1?
2019-07-22 20330, 2019
aidanlw17
Almost all values range from -1 to 1, but looking closely they are not all < 1. Some have magnitudes larger
2019-07-22 20353, 2019
aidanlw17
I took the transformation functions directly from Philip, I should look closer to see about that.
2019-07-22 20333, 2019
alastairp
we have the NormalizedLowLevelMetric classes
2019-07-22 20341, 2019
alastairp
what does that normalise to?
2019-07-22 20336, 2019
aidanlw17
Again my background on the transformation is weak, some of it I don't fully understand. For normalized lowlevel metrics, the values are: (value_from_lowlevel - mean_value)/std_dev
2019-07-22 20357, 2019
aidanlw17
Then if it is a weighted normalized lowlevel metric, that value is multiplied afterwards by a weight factor `self.weight_vector = np.array([self.weight ** i for i in indices])`
2019-07-22 20308, 2019
aidanlw17
Where self.weight is currently set to 0.95.
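[Editor's sketch of the transformation described above, in plain Python lists rather than numpy so it has no dependencies. The function names are hypothetical, and it assumes `indices` is simply 0..n-1 over the vector elements, which the log does not confirm.]

```python
def normalise(values, means, std_devs):
    """Element-wise (value - mean) / std_dev, as described in the chat."""
    return [(v - m) / s for v, m, s in zip(values, means, std_devs)]


def weight_vector(weight, length):
    """weight ** i for each index i; assumes indices are 0..length-1."""
    return [weight ** i for i in range(length)]


def weighted_normalise(values, means, std_devs, weight=0.95):
    """Normalise first, then scale each element by its weight factor."""
    normalised = normalise(values, means, std_devs)
    weights = weight_vector(weight, len(normalised))
    return [n * w for n, w in zip(normalised, weights)]


result = weighted_normalise([3.0, 5.0], [1.0, 1.0], [2.0, 2.0])
```

[Because the normalised value is a z-score, it is not bounded to [-1, 1], which would be consistent with some magnitudes being larger than 1.]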
2019-07-22 20343, 2019
alastairp
ok, cool
2019-07-22 20348, 2019
alastairp
let
2019-07-22 20301, 2019
alastairp
let's leave it as-is for now, perhaps we can modify it in the future
Previously, we used a function get_data to extract the lowlevel data with a specific path or the highlevel models
2019-07-22 20354, 2019
alastairp
a specific postgres query path, right?
2019-07-22 20306, 2019
alastairp
lowlevel.data->'blah'->'foo'
2019-07-22 20326, 2019
aidanlw17
Yeah exactly. I wrote a new function get_feature_data, which takes that path and extracts it from the dictionary.
2019-07-22 20334, 2019
aidanlw17
Then passes the value to transform.
2019-07-22 20343, 2019
alastairp
ahh, I see
2019-07-22 20315, 2019
aidanlw17
I left the paths as is, because I thought soon we may be able to just use the select feature paths in the postgres query
2019-07-22 20331, 2019
aidanlw17
rather than getting the whole document
2019-07-22 20345, 2019
yvanzo
alastairp: You’re welcome, musicbrainz-docker is currently sluggish until PR #106 can be updated/merged with a working SIR.
2019-07-22 20334, 2019
alastairp
mm, right. I agree that leaving the path is a good idea, I'm not sure I would have done it this way. especially `features = self.path[7:-1]` makes me a bit worried
2019-07-22 20338, 2019
yvanzo
There are two annoying bugs atm: sir reindex not always returning (which can be worked around by downloading prebuilt indexes) and sir reindex failing over some invalid characters (which is required to build indexes).
I would have written specific methods (or perhaps some lambdas?) that explicitly select the items from the dictionary
2019-07-22 20340, 2019
ruaok
the point here is to store user specific output from the collaborative filtering system.
2019-07-22 20306, 2019
aidanlw17
Yes I agree that felt a little sketchy... I'll see about rewriting that in another way.
2019-07-22 20307, 2019
ruaok
and then to allow multiple recommender scripts to access these tables and keep a record of which script has used which tracks.
2019-07-22 20316, 2019
alastairp
yvanzo: no problem. I was looking at upgrading our mirror to new schema, but perhaps I'll just wait for all of this to be finished. we only use the server/api and no search, so for us it's a matter of updating the image and running upgrade
2019-07-22 20336, 2019
alastairp
but I had some custom modifications to point to the external database server, so the fewer changes I have to make the better
2019-07-22 20305, 2019
alastairp
aidanlw17: cool. it's true that it might become a bit more complex - perhaps we'll have to write a custom transformer per method?
2019-07-22 20329, 2019
alastairp
otherwise - what about a list of dictionary keys? ['lowlevel', 'mfcc', 'mean']
2019-07-22 20343, 2019
alastairp
in fact, we could then construct the path from this anyway
2019-07-22 20302, 2019
alastairp
that way we can keep your method, but it won't involve messy string splitting
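[Editor's sketch of the key-list idea above: one list of dictionary keys drives both the lookup in the fetched document and the construction of the Postgres JSON path, with no string slicing. `get_feature_data` is named in the chat but this body is an assumption; `build_query_path` and the sample document are hypothetical.]

```python
from functools import reduce


def get_feature_data(document, keys):
    """Walk a nested dictionary by following each key in turn."""
    return reduce(lambda node, key: node[key], keys, document)


def build_query_path(column, keys):
    """Build a path like lowlevel.data->'a'->'b' from the same key list.

    The keys are fixed constants defined in code, not user input.
    """
    return column + "".join("->'{}'".format(key) for key in keys)


keys = ["lowlevel", "mfcc", "mean"]
document = {"lowlevel": {"mfcc": {"mean": [1.0, 2.0]}}}  # made-up sample data

value = get_feature_data(document, keys)
path = build_query_path("lowlevel.data", keys)
```

[If the query is later changed to select only the feature paths, the same `keys` list can still produce the path, so each metric would need to define it only once.]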
2019-07-22 20327, 2019
alastairp
ruaok: I'll have a look. while you're here, a good time to ask a question about pg schemas. it looks like you're splitting different parts of lb into separate schemas, which sounds like a great idea to me
2019-07-22 20339, 2019
alastairp
we're making some more tables for the similarity stuff. it feels like we could put this in a schema too
2019-07-22 20351, 2019
ruaok
in AB?
2019-07-22 20354, 2019
alastairp
yes
2019-07-22 20359, 2019
ruaok
yea, please do.
2019-07-22 20331, 2019
ruaok
in the end the AB similarity data ought to be copied to the LB recommendation schema.
2019-07-22 20359, 2019
ruaok
the idea is to provide complete dumps of this schema for anyone willing to try writing a recommendation engine.
2019-07-22 20340, 2019
ruaok
and it should have collaborative filtered tracks, similarity tracks, artist-artist similarity.
2019-07-22 20312, 2019
alastairp
aidanlw17: sorry, so this is one more thing on this pr :)
2019-07-22 20315, 2019
alastairp
let's put similarity tables in a schema. this is as easy as `create schema similarity` and prefix tables with the schema name when using them (`select x from similarity.similarity`)
2019-07-22 20328, 2019
alastairp
it will help us to logically separate all of the tables
2019-07-22 20324, 2019
aidanlw17
alastairp: it makes sense to me to store the keys in a list like that, and I think we’ve already done something similar in AB-404.
one nice thing you can do is `drop schema s cascade;` will drop all of the tables in the schema s, you don't have to individually drop them in drop_tables
2019-07-22 20316, 2019
aidanlw17
Okay thanks alastairp
2019-07-22 20331, 2019
aidanlw17
Cool!! Sounds handy
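[Editor's sketch of the schema suggestion above. The functions take any DB-API style cursor (for example one from psycopg2); a small recording cursor stands in here so the example runs without a database. The function names and the table's single `id` column are hypothetical; only the schema name, the `similarity.similarity` table name and the `drop schema ... cascade` statement come from the chat.]

```python
def create_similarity_schema(cursor):
    """Create the schema and a schema-qualified table inside it."""
    cursor.execute("CREATE SCHEMA IF NOT EXISTS similarity")
    cursor.execute(
        "CREATE TABLE IF NOT EXISTS similarity.similarity "
        "(id INTEGER PRIMARY KEY)"  # hypothetical column, for illustration
    )


def drop_similarity_schema(cursor):
    """One statement removes the schema and every table inside it."""
    cursor.execute("DROP SCHEMA IF EXISTS similarity CASCADE")


class RecordingCursor:
    """Stand-in cursor that records statements instead of running them."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


cursor = RecordingCursor()
create_similarity_schema(cursor)
drop_similarity_schema(cursor)
```

[With a real cursor, the drop function replaces a list of per-table DROP TABLE statements in drop_tables.]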
2019-07-22 20344, 2019
aidanlw17
alastairp: are the other tables in AB related to data part of a different schema already?
2019-07-22 20359, 2019
alastairp
no, we have no other schemas except the default
2019-07-22 20308, 2019
alastairp
we should move some of them
2019-07-22 20328, 2019
aidanlw17
Okay. I can do that after I do these then
2019-07-22 20332, 2019
aidanlw17
If you want!
2019-07-22 20347, 2019
alastairp
that's a larger process, since we have to move existing data