#metabrainz

      • Nyanko-sensei joined the channel
      • D4RK-PH0ENiX has quit
      • Nyanko-sensei has quit
      • D4RK-PH0ENiX joined the channel
      • nawcom has quit
      • nawcom joined the channel
      • Darkloke joined the channel
      • Darkloke has quit
      • travis-ci joined the channel
      • travis-ci
        metabrainz/picard#4735 (master - ec776c8 : Laurent Monin): The build passed.
      • travis-ci has left the channel
      • Darkloke joined the channel
      • reosarevok
        yvanzo, bitmap: https://tickets.metabrainz.org/browse/MBS-5193 - should we wontfix as suggested by Ollie and Ian (and remove the doc section as suggested by jesus2099)?
      • BrainzBot
        MBS-5193: Regression : impossible to purposely set bad encoded alias (search hints)
      • antlarr2 has quit
      • antlarr joined the channel
      • D4RK-PH0ENiX has quit
      • aidanlw17
        Freso: I'll be out right before the meeting but should be back on when it starts - I mailed in my review in case I'm late :)
      • alastairp
        aidanlw17: good morning
      • I'm just heading off to lunch, but should be back in an hour or so
      • D4RK-PH0ENiX joined the channel
      • aidanlw17
        alastairp: hi, sounds good, we can talk when you’re back?
      • I made some new comments on the metrics PR
      • alastairp
        great. how's it going on the query optimisation?
      • aidanlw17
        I think that it’s good. We now only need one query to select the data, and one to insert it
      • One select query per batch!
      • alastairp
        awesome!
      • that's going to be so fast
      • I'll test it when I get back then
      • aidanlw17
        ~28 seconds to compute and insert one 10k recording batch on my machine
      • alastairp
        compared to how long before?
      • aidanlw17
        I’ll need to look back in my notes to report
      • I’ll tell you when you’re back from lunch! Haha.
      • alastairp
        cool, talk soon
      • D4RK-PH0ENiX has quit
      • D4RK-PH0ENiX joined the channel
      • yvanzo: thanks for all of the feedback on my tickets!
      • ruaok has a slow start to the day
      • ruaok
        but I really needed that ride, even if it was super hot.
      • alastairp
        where did you go?
      • ruaok
        just up besos, nothing fancy. I wanted to go all weekend, but I ended up getting distracted by everything.
      • alastairp
        nice
      • yeah, we've started riding after work at 8ish, to get a bit of coolness in the day
      • almost any other time is impossible
      • ruaok
        Mr_Monkey: back yet?
      • alastairp: yeah, 8pm would work, but there are too many other things going on then.
      • alastairp
        sure, you fit stuff in whenever you can
      • Darkloke has quit
      • TOPIC: MetaBrainz Community and Development channel | MusicBrainz non-development: #musicbrainz | New GSoC students start here: https://goo.gl/7jsjG2 | Channel is logged; see https://musicbrainz.org/doc/IRC for details | Meeting agenda: Reviews, MB Summit (ruaok)
      • ruaok
        pristine__: how are you doing?
      • pristine__
        Hey
      • I am good. Sorry for being afk. Was travelling.
      • And shifting the room.
      • How are you?
      • ruaok
        good, just checking in to see if you need anything.
      • I have a pile of metabrainz things to do today -- I might get around to doing some MSB stuff later.
      • pristine__
        I left some comments on #597
      • LB-server. I will give you the link. A sec
      • Changes pushed ^
      • Rotab has quit
      • ruaok: what does `DEFAULT now()` mean? If we don't provide a timestamp then the current timestamp will be added, no?
      • ruaok
        correct.
      • pristine__
        Then why the NOT NULL clause?
      • So that no one can push a null value into the column?
      • ruaok
        yes
      • pristine__
        Okay. Thanks
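
The exchange above can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 with `DEFAULT CURRENT_TIMESTAMP` as a stand-in for PostgreSQL's `DEFAULT now()`, and the table and column names (`demo`, `created`) are hypothetical rather than taken from the PR under discussion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo ("
    " id INTEGER PRIMARY KEY,"
    " created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)

# Omitting the column lets the default fill in the current timestamp.
conn.execute("INSERT INTO demo (id) VALUES (1)")
created = conn.execute("SELECT created FROM demo WHERE id = 1").fetchone()[0]

# An explicit NULL bypasses the default, so NOT NULL is what rejects it.
try:
    conn.execute("INSERT INTO demo (id, created) VALUES (2, NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
```

The default only applies when the column is left out of the insert, so the two clauses cover different cases.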
      • aidanlw17
        alastairp: you could do 40 batches with the new query in the time it used to take to do only 1!
      • alastairp
        great, sounds good
      • I'm just finishing up some reviews on another project and I'll take a look at this PR again
      • so you also fixed the query parameters?
      • aidanlw17
        It took my machine ~19 minutes to do the old method for one batch.
      • Yes I did fix them!
      • alastairp
        perfect, sounds good
      • aidanlw17
        Sort of related, Philip used arrays of NaN cast to double precision to represent rows with missing data. We decided for annoy to use vectors of the form [0, ..., 0] to represent those that didn't have a submission instead. For us that makes more sense, so I started inserting rows of 0 rather than NaN when there is missing data for a metric as well.
      • alastairp
        ok, cool. it makes sense that what we have in the database is exactly what we insert into annoy
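
A minimal sketch of the zero-vector convention described above, in plain Python. The helper name `fill_missing` and its signature are hypothetical, not taken from the AcousticBrainz code; the idea is only that one vector is produced and used for both the database row and the Annoy index.

```python
import math


def fill_missing(vector, dimension):
    """Return the vector unchanged, or a zero vector if any data is missing."""
    if vector is None:
        return [0.0] * dimension
    # Treat None or NaN elements as missing data for the whole metric.
    if any(v is None or math.isnan(v) for v in vector):
        return [0.0] * dimension
    return list(vector)


complete = fill_missing([0.5, -0.2], 2)
no_submission = fill_missing(None, 3)
partly_missing = fill_missing([0.5, float("nan")], 2)
```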
      • aidanlw17
        I think so too. I found this interesting, if you add a vector to an Annoy index containing the value `None`, it converts that value to -1 when adding it to the index.
      • alastairp
        ah, that's very interesting too
      • aidanlw17
        We also have negative elements of our vectors though, so I still think it makes the most sense to use the value 0?
      • alastairp
        that was about to be my next question -
      • what is the scale of our features? are they all normalised from 0-1?
      • aidanlw17
        Almost all values range from -1 to 1, but looking closely they are not all < 1. Some have larger magnitudes
      • I took the transformation functions directly from Philip, I should look closer to see about that.
      • alastairp
        we have the NormalizedLowLevelMetric classes
      • what does that normalise to?
      • aidanlw17
        Again my background on the transformation is weak, some of it I don't fully understand. For normalized lowlevel metrics, the values are: (value_from_lowlevel - mean_value)/std_dev
      • Then if it is a weighted normalized lowlevel metric, that value is multiplied afterwards by a weight factor `self.weight_vector = np.array([self.weight ** i for i in indices])`
      • Where self.weight is currently set to 0.95.
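
The transformation described above can be sketched as follows. This is a plain-Python illustration, not the project's code: it assumes `indices` runs 0, 1, 2, ... over the vector elements, and the function names and sample numbers are made up.

```python
def normalize(values, means, std_devs):
    """Standardize each element: (value - mean) / std_dev."""
    return [(v - m) / s for v, m, s in zip(values, means, std_devs)]


def apply_weights(normalized, weight=0.95):
    """Scale element i by weight ** i (assumes indices start at 0)."""
    return [x * weight ** i for i, x in enumerate(normalized)]


values = [3.0, 5.0, 1.0]
means = [1.0, 5.0, 2.0]
std_devs = [2.0, 1.0, 0.5]

normalized = normalize(values, means, std_devs)
weighted = apply_weights(normalized)
```

Standardizing this way centres each feature on 0 but does not bound it, which is consistent with some values falling outside the -1 to 1 range.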
      • alastairp
        ok, cool
      • let's leave it as-is for now, perhaps we can modify it in the future
      • aidanlw17
        Maybe easier if I reference it like this https://github.com/metabrainz/acousticbrainz-se... , sorry
      • Okay sounds good.
      • We'll be able to pay attention to those special cases in the evaluation as well
      • alastairp
        for the transforms, the methods now query the dictionary in python to get the values?
      • BrainzGit
        [listenbrainz-labs] mayhem merged pull request #36 (master…producer): Use a single writer script for recommendations and stats writer https://github.com/metabrainz/listenbrainz-labs...
      • aidanlw17
        Previously, we used a function get_data to extract the lowlevel data with a specific path or the highlevel models
      • alastairp
        a specific postgres query path, right?
      • lowlevel.data->'blah'->'foo'
      • aidanlw17
        Yeah exactly. I wrote a new function get_feature_data, which takes that path and extracts it from the dictionary.
      • Then passes the value to transform.
      • alastairp
        ahh, I see
      • aidanlw17
        I left the paths as is, because I thought soon we may be able to just use the select feature paths in the postgres query
      • rather than getting the whole document
      • yvanzo
        alastairp: You’re welcome, musicbrainz-docker is currently sluggish until PR #106 can be updated/merged with a working SIR.
      • alastairp
        mm, right. I agree that leaving the path is a good idea, I'm not sure I would have done it this way. especially `features = self.path[7:-1]` makes me a bit worried
      • yvanzo
        There are two annoying bugs atm: sir reindex not always returning (which can be worked around by downloading prebuilt indexes) and sir reindex failing over some invalid characters (which is required to build indexes).
      • ruaok
      • I'd love your feedback on that one.
      • alastairp
        I would have written specific methods (or perhaps some lambdas?) that explicitly select the items from the dictionary
      • ruaok
        the point here is to store user specific output from the collaborative filtering system.
      • aidanlw17
        Yes I agree that felt a little sketchy... I'll see about rewriting that in another way.
      • ruaok
        and then to allow multiple recommender scripts to access these tables and keep a record of which script has used which tracks.
      • alastairp
        yvanzo: no problem. I was looking at upgrading our mirror to new schema, but perhaps I'll just wait for all of this to be finished. we only use the server/api and no search, so for us it's a matter of updating the image and running upgrade
      • but I had some custom modifications to point to the external database server, so the fewer changes I have to make the better
      • aidanlw17: cool. it's true that it might become a bit more complex - perhaps we'll have to write a custom transformer per method?
      • otherwise - what about a list of dictionary keys? ['lowlevel', 'mfcc', 'mean']
      • in fact, we could then construct the path from this anyway
      • that way we can keep your method, but it won't involve messy string splitting
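
The list-of-keys idea above could look roughly like this. It is a sketch under assumptions: `get_feature_data` is the function name mentioned in the conversation, but its signature here is guessed, `build_postgres_path` is a hypothetical helper, and the column name `data` and the sample document are invented for illustration.

```python
from functools import reduce


def get_feature_data(document, keys):
    """Walk a nested dictionary by following the keys in order."""
    return reduce(lambda node, key: node[key], keys, document)


def build_postgres_path(column, keys):
    """Build a JSON path expression such as data->'lowlevel'->'mfcc'."""
    # Only safe for trusted, hard-coded keys; never for user input.
    return column + "".join("->'{}'".format(key) for key in keys)


document = {"lowlevel": {"mfcc": {"mean": [1.5, -0.3, 0.8]}}}
keys = ["lowlevel", "mfcc", "mean"]

mean_mfcc = get_feature_data(document, keys)
path = build_postgres_path("data", keys)
```

The same key list then serves both the dictionary lookup and the query path, with no slicing of the path string.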
      • ruaok: I'll have a look. while you're here, a good time to ask a question about pg schemas. it looks like you're splitting different parts of lb into separate schemas, which sounds like a great idea to me
      • we're making some more tables for the similarity stuff. it feels like we could put this in a schema too
      • ruaok
        in AB?
      • alastairp
        yes
      • ruaok
        yea, please do.
      • in the end the AB similarity data ought to be copied to the LB recommendation schema.
      • the idea is to provide complete dumps of this schema for anyone willing to try writing a recommendation engine.
      • and it should have collaborative filtered tracks, similarity tracks, artist-artist similarity.
      • alastairp
        aidanlw17: sorry, so this is one more thing on this pr :)
      • let's put similarity tables in a schema. this is as easy as `create schema similarity` and prefix tables with the schema name when using them (`select x from similarity.similarity`)
      • it will help us to logically separate all of the tables
      • aidanlw17
        alastairp: it makes sense to me to store the keys in a list like that, and I think we’ve already done something similar in AB-404.
      • BrainzBot
        AB-404: Provide an API endpoint where users can select only the features that they want returned https://tickets.metabrainz.org/browse/AB-404
      • alastairp
        great
      • aidanlw17
        And ok to putting the tables in a schema :) I’ll get on that. In the same PR?
      • alastairp
        yes please
      • aidanlw17
        Ok!
      • yvanzo
        reosarevok: no idea but it seems to be related to some top priority bugs: https://tickets.metabrainz.org/issues/?jql=proj...
      • alastairp
        aidanlw17: check out create_schema and drop_schema here: https://github.com/metabrainz/listenbrainz-serv...
      • one nice thing you can do is `drop schema s cascade;`, which will drop all of the tables in the schema s, so you don't have to individually drop them in drop_tables
      • aidanlw17
        Okay thanks alastairp
      • Cool!! Sounds handy
      • alastairp: are the other tables in AB that are related to data part of a different schema already?
      • alastairp
        no, we have no other schemas except the default
      • we should move some of them
      • aidanlw17
        Okay. I can do that after I do these then
      • If you want!
      • alastairp
        that's a larger process, since we have to move existing data