#metabrainz

      • gcilou
        Freso: An hour later I suppose
      • 2016-10-31 30527, 2016

      • Freso
        alastairp: 👍 That's what I left it there for. :)
      • 2016-10-31 30529, 2016

      • alastairp
        ruaok: I saw the tweet
      • 2016-10-31 30535, 2016

      • alastairp
        that's... a lot of chocolate
      • 2016-10-31 30551, 2016

      • ruaok
        yeah, it was really really good this year.
      • 2016-10-31 30557, 2016

      • ruaok
        great participation, great location.
      • 2016-10-31 30501, 2016

      • gcilou
        I think that's more chocolate than I've ever seen in one place..
      • 2016-10-31 30509, 2016

      • Clint
        who started attaching the wrappers to the wall?
      • 2016-10-31 30527, 2016

      • ruaok
      • 2016-10-31 30530, 2016

      • ruaok
        ha. :)
      • 2016-10-31 30500, 2016

      • ruaok
        someone suggested it while I had tape in my hand. so I put one up and left the tape dispenser. the rest was a communal effort.
      • 2016-10-31 30548, 2016

      • ruaok
        (there is still a terrifying amount of chocolate in my suitcase. it might be more than I came with.)
      • 2016-10-31 30501, 2016

      • Freso
        ruaok: Is that at the SF office?
      • 2016-10-31 30525, 2016

      • ruaok
        Tech Corner 4 in Sunnyvale.
      • 2016-10-31 30533, 2016

      • dboys has quit
      • 2016-10-31 30536, 2016

      • Freso
        Alright.
      • 2016-10-31 30536, 2016

      • gcilou
        Freso: I thought it was the SF office too :) I guess it has the same feel..
      • 2016-10-31 30550, 2016

      • Freso
        :)
      • 2016-10-31 30512, 2016

      • alastairp
        ruaok: I don't think you told me when you were meeting with Felipe
      • 2016-10-31 30514, 2016

      • dseomn has quit
      • 2016-10-31 30516, 2016

      • alastairp
        has that already happened?
      • 2016-10-31 30535, 2016

      • ruaok
        nope. noon today
      • 2016-10-31 30559, 2016

      • alastairp
        ah. cool
      • 2016-10-31 30554, 2016

      • alastairp
        I'm interested in hearing how it goes
      • 2016-10-31 30504, 2016

      • alastairp
        I'm just writing some stuff now to bulk load all of AB
      • 2016-10-31 30521, 2016

      • alastairp
        we're going to have to do some significant changes to the json format to accommodate it though :(
      • 2016-10-31 30506, 2016

      • ruaok
        why so?
      • 2016-10-31 30541, 2016

      • ruaok
        effectively a schema-translation to make things fit?
      • 2016-10-31 30503, 2016

      • dseomn joined the channel
      • 2016-10-31 30512, 2016

      • alastairp
        2 main things
      • 2016-10-31 30546, 2016

      • alastairp
        - can't have list-of-lists in BQ, we'll have to do [{key: x, values: [1,2,3]}, ...]
      • 2016-10-31 30503, 2016

      • alastairp
        which doesn't make sense for some of the matrix data that we have in lowlevel, but not really much we can do about it
      • 2016-10-31 30533, 2016
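
[Editor's sketch, not from the channel: a minimal illustration of the list-of-lists workaround alastairp describes, assuming each row of a matrix becomes one {key, values} record. The function name `matrix_to_records` and the use of the row index as `key` are assumptions made for illustration only.]

```python
def matrix_to_records(matrix):
    """Rewrite a list-of-lists as a list of {key, values} records.

    Assumption: the row index is used as the key. The chat only gives
    the target shape [{key: x, values: [1,2,3]}, ...].
    """
    return [{"key": index, "values": list(row)} for index, row in enumerate(matrix)]


# A small 2x2 stand-in for a matrix such as the 13x13 "cov" entry.
cov = [[1.0, 0.5], [0.5, 2.0]]
records = matrix_to_records(cov)
```

The nested rows survive as repeated records, at the cost of losing the direct matrix reading that alastairp mentions.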

      • ruaok
        got it. thankfully that makes little sense to query. batch process, yeah, query less so
      • 2016-10-31 30545, 2016

      • ruaok
        and yeah, the same problem exists in LB
      • 2016-10-31 30551, 2016

      • alastairp
        - in hl, the probabilities dict for each model uses the class name as the dict key: {values : {x: 0.5, y:0.2, z:0.3}}
      • 2016-10-31 30509, 2016

      • alastairp
        but that means the schema would have to know what all the classes for all the models are. not good, especially if we add more stuff
      • 2016-10-31 30534, 2016

      • alastairp
        I changed it to {values: [{class: x, prob: 0.5}, {class: y, ...}, ...]}
      • 2016-10-31 30513, 2016
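
[Editor's sketch, not from the channel: one way the highlevel change alastairp describes could look, assuming the input has the {values: {class: prob}} shape quoted above. The function name `probabilities_to_list` is hypothetical.]

```python
def probabilities_to_list(model_result):
    """Turn {values: {x: 0.5, ...}} into {values: [{class: x, prob: 0.5}, ...]}.

    With this shape the schema only needs the fixed field names "class" and
    "prob", not the class names of every model.
    """
    probabilities = model_result["values"]
    return {
        "values": [
            {"class": name, "prob": prob} for name, prob in probabilities.items()
        ]
    }


original = {"values": {"x": 0.5, "y": 0.2, "z": 0.3}}
converted = probabilities_to_list(original)
```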

      • ruaok
        puffs up the data, but hey, we're not paying for that. ;-)
      • 2016-10-31 30533, 2016

      • alastairp
        but if it's a schema, who knows how they're optimising it in the backend
      • 2016-10-31 30541, 2016

      • JesseW joined the channel
      • 2016-10-31 30541, 2016

      • alastairp
        perhaps it won't affect storage at all
      • 2016-10-31 30545, 2016

      • ruaok
        I have a feeling that BQ is what it is because the operations it provides are scalable and super fast.
      • 2016-10-31 30508, 2016

      • ruaok
        so, like all googlers, we need to adapt our data models...
      • 2016-10-31 30515, 2016

      • alastairp
        right
      • 2016-10-31 30512, 2016

      • alastairp downloads all of jamendo
      • 2016-10-31 30504, 2016

      • ruaok needs to get ready. pono, google today. wikimedia, github and mozilla sf tomorrow.
      • 2016-10-31 30536, 2016

      • alastairp
        cool! can't wait to hear about it all
      • 2016-10-31 30540, 2016

      • alastairp
        have a fun week
      • 2016-10-31 30546, 2016

      • alastairp
        or at least, next few days
      • 2016-10-31 30530, 2016

      • ruaok
        thanks. I'll report back... hmmm. via freso.
      • 2016-10-31 30545, 2016

      • ruaok
        I won't make the meeting today. maybe the tail end.
      • 2016-10-31 30504, 2016

      • ruaok
        and next week's meeting is going to be dicey, to say the least.
      • 2016-10-31 30525, 2016

      • mihaitish has quit
      • 2016-10-31 30526, 2016

      • jesus2099 has left the channel
      • 2016-10-31 30507, 2016

      • Freso
        ruaok: We can cancel next week's meeting perhaps? I won't have any GCI news until after the 9th, and from the 9th on out, I'll be deep-ish in GCI stuff.
      • 2016-10-31 30538, 2016

      • Freso
        (Which means I'll be around a lot. :))
      • 2016-10-31 30527, 2016

      • TOPIC: MetaBrainz Community and Development channel | MusicBrainz non-development: #musicbrainz | Countdown to old servers being taken offline: https://goo.gl/eJxLRC | MeB meeting agenda: reviews, cancel W45 meeting? (Freso)
      • 2016-10-31 30519, 2016

      • ruaok
        seems like a plan to me.
      • 2016-10-31 30536, 2016

      • ruaok
        I think one way or another I am going to spend most of the day at DWNI.
      • 2016-10-31 30510, 2016

      • Freso
        Right. Let's see if anyone has objections later, and if not, that's what we'll do.
      • 2016-10-31 30550, 2016

      • jesus2099 joined the channel
      • 2016-10-31 30512, 2016

      • jesus2099
        bitmap Freso zas reosarevok (or other Discourse admins, I’m not sure who you are): Could we merge the following similar generic “please vote for my edits” topics into one? I’m listing them, oldest first:
      • 2016-10-31 30513, 2016

      • jesus2099
      • 2016-10-31 30515, 2016

      • jesus2099
      • 2016-10-31 30517, 2016

      • jesus2099
      • 2016-10-31 30519, 2016

      • jesus2099
      • 2016-10-31 30540, 2016

      • jesus2099
        It would be nice… And maybe we would give it the best topic name out of these four…
      • 2016-10-31 30521, 2016

      • JesseW has quit
      • 2016-10-31 30548, 2016

      • chirlu has quit
      • 2016-10-31 30507, 2016

      • chirlu joined the channel
      • 2016-10-31 30553, 2016

      • Freso
        jesus2099: I don't know. I actually think it's fine to have them on their own.
      • 2016-10-31 30510, 2016

      • jesus2099
        ok, maybe. :) I like tidy things…
      • 2016-10-31 30514, 2016

      • jesus2099
        I don’t much like getting too many results when I look for topics…
      • 2016-10-31 30533, 2016

      • jesus2099
        But you’re probably right, otherwise. :)
      • 2016-10-31 30559, 2016

      • Freso packs up and heads back to flat
      • 2016-10-31 30510, 2016

      • jesus2099 has left the channel
      • 2016-10-31 30512, 2016

      • CatCat
        am i on time for the meeting today?
      • 2016-10-31 30522, 2016

      • CatQuest
        Freso: ping me when it's due
      • 2016-10-31 30543, 2016

      • CatQuest
        (on catcat)
      • 2016-10-31 30514, 2016

      • ruaok
        yay. we've been gifted a pono player. gives us in BCN a chance to kick the tires.
      • 2016-10-31 30526, 2016

      • ruaok
        we should see them soon as supporters too. :)
      • 2016-10-31 30532, 2016

      • alastairp
        sweet
      • 2016-10-31 30550, 2016

      • alastairp
        A/B blind test, anyone? :)
      • 2016-10-31 30556, 2016

      • ruaok
        let's do it.
      • 2016-10-31 30510, 2016

      • ruaok is loafing wifi off GoogleGuest
      • 2016-10-31 30517, 2016

      • ruaok
        best guest wifi in the world.
      • 2016-10-31 30521, 2016

      • alastairp
        is the internet really fast?
      • 2016-10-31 30524, 2016

      • tom[] has quit
      • 2016-10-31 30522, 2016

      • leonardo has quit
      • 2016-10-31 30545, 2016

      • leonardo joined the channel
      • 2016-10-31 30509, 2016

      • leonardo is now known as Guest63314
      • 2016-10-31 30552, 2016

      • tom[] joined the channel
      • 2016-10-31 30516, 2016

      • CatQuest
        for a moment I read "GoogleQuest" and was amused/confused
      • 2016-10-31 30556, 2016

      • ruaok
        best internet outside our office. :)
      • 2016-10-31 30529, 2016

      • ruaok
        alastairp: around? felipe and I are here in the cafe.
      • 2016-10-31 30513, 2016

      • Guest63314 is now known as leonardo
      • 2016-10-31 30534, 2016

      • kyan has quit
      • 2016-10-31 30501, 2016

      • alastairp
        ruaok: hi
      • 2016-10-31 30511, 2016

      • ruaok
        ok, so.
      • 2016-10-31 30523, 2016

      • ruaok
        general idea: we should massage the data as much as possible before going to BQ.
      • 2016-10-31 30546, 2016

      • alastairp
        OK, that's in line with what we were talking about before. good
      • 2016-10-31 30523, 2016

      • ruaok
        meaning that if we need a rowid, we should have PG assign that and then use that as an internal id to keep the data in sync.
      • 2016-10-31 30537, 2016

      • ruaok
        which is what I did with LB, so we're on the right track.
      • 2016-10-31 30516, 2016

      • ruaok
        do you have an example of the list-of-lists problem?
      • 2016-10-31 30538, 2016

      • alastairp
        OK. also what I asked in my email. So as long as we're OK with making that postgres rowid a "public identifier", it seems like that's the easiest way for us to do it in AB too
      • 2016-10-31 30540, 2016

      • alastairp
        sure, one sec
      • 2016-10-31 30500, 2016

      • alastairp
      • 2016-10-31 30510, 2016

      • alastairp
        search for "mfcc"
      • 2016-10-31 30525, 2016

      • alastairp
        the "cov" entry is a matrix, 13x13
      • 2016-10-31 30520, 2016

      • alastairp
        one thing I'm not sure about in our usage is if we're going to need to filter with this data
      • 2016-10-31 30547, 2016

      • alastairp
        which I think will indicate how we store it too
      • 2016-10-31 30529, 2016

      • ruaok
        so a pattern that Felipe is suggesting: https://github.com/fhoffa/code_snippets/blob/mast…
      • 2016-10-31 30542, 2016

      • ruaok
        load data into BQ by hook or crook. just get it in.
      • 2016-10-31 30554, 2016

      • ruaok
        then upload JS snippets to do the transformation all inside BQ.
      • 2016-10-31 30522, 2016

      • alastairp
        right. to confirm: this means that if we want to do something with this data, we will read the entire field, and that will count towards our limit?
      • 2016-10-31 30524, 2016

      • ruaok
        here is the final result of the query:
      • 2016-10-31 30525, 2016

      • ruaok
      • 2016-10-31 30541, 2016

      • ruaok
        yes, but you do it only once.
      • 2016-10-31 30549, 2016

      • ruaok
        and we have enough credits to do this, so no problem.
      • 2016-10-31 30554, 2016

      • alastairp
        per query?
      • 2016-10-31 30502, 2016

      • alastairp
        I'm thinking about other people who want to do it
      • 2016-10-31 30502, 2016

      • ruaok
        no, for loading.
      • 2016-10-31 30507, 2016

      • alastairp
        ah, right.
      • 2016-10-31 30530, 2016

      • ruaok
        yeah, load it rough, then write a transformational query with JS snippets to make more columns.
      • 2016-10-31 30553, 2016

      • alastairp
        similar to this, we will have more data like this list-of-list if we want to load the frame-level data
      • 2016-10-31 30515, 2016

      • alastairp
        e.g. mfccs will be 12 values *per frame*
      • 2016-10-31 30524, 2016

      • alastairp
        so there will be hundreds of thousands of frames
      • 2016-10-31 30505, 2016

      • ruaok
        yes, so we need to carefully examine what we want to query.
      • 2016-10-31 30525, 2016

      • ruaok
        expand all things that need querying into columns, store the non query stuff in full JSON fields that are BQ string types.
      • 2016-10-31 30548, 2016
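
[Editor's sketch, not from the channel: a rough illustration of the split ruaok describes, where values that need querying become columns and the full document is kept as a JSON string. The function `to_row`, the column names, the `data` field, and the sample document are all hypothetical. The `id` is assumed to be the Postgres-assigned row id discussed earlier.]

```python
import json


def to_row(rowid, document, queryable_paths):
    """Build one flat row: selected values as columns, the rest as a JSON string.

    queryable_paths maps a column name to the key path of the value inside
    the document. Which values deserve a column is a design decision the
    chat leaves open.
    """
    row = {"id": rowid}
    for column, path in queryable_paths.items():
        value = document
        for key in path:
            value = value[key]
        row[column] = value
    # The complete document is stored as a plain string column.
    row["data"] = json.dumps(document, sort_keys=True)
    return row


# Hypothetical document and column choice, for illustration only.
document = {
    "metadata": {"tags": {"artist": "Example Artist"}},
    "lowlevel": {"mfcc": {"cov": [[1.0, 0.5], [0.5, 2.0]]}, "loudness": 0.8},
}
paths = {
    "artist": ("metadata", "tags", "artist"),
    "loudness": ("lowlevel", "loudness"),
}
row = to_row(42, document, paths)
```

The matrix stays inside the `data` string, which matches the point that such data is better suited to batch processing than to querying.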

      • alastairp
        OK, great. that sounds really close to what we were already thinking
      • 2016-10-31 30504, 2016

      • ruaok
        yeah.
      • 2016-10-31 30504, 2016

      • alastairp
        expanding during ingest sounds neat
      • 2016-10-31 30510, 2016

      • alastairp
        what about duplication of data?
      • 2016-10-31 30518, 2016

      • alastairp
        e.g., our "metadata" blocks
      • 2016-10-31 30521, 2016

      • ruaok
        that is best handled before the import.
      • 2016-10-31 30539, 2016

      • alastairp
        we have it in a lowlevel row
      • 2016-10-31 30539, 2016

      • ruaok
        so, as in LB I am de-duping the data in influx DB, then ship to BQ.
      • 2016-10-31 30547, 2016

      • alastairp
        and also in a highlevel row
      • 2016-10-31 30554, 2016

      • ruaok
        oh, it is?
      • 2016-10-31 30558, 2016

      • alastairp
        but I am also splitting up the highlevel data per model
      • 2016-10-31 30503, 2016

      • alastairp
        well, we dedup it in postgres
      • 2016-10-31 30535, 2016

      • alastairp
        so if we have 12 models, we can generate 12 json documents with the results of model x, and also the metadata block in each json document
      • 2016-10-31 30542, 2016

      • alastairp
        that seems like way too much duplication
      • 2016-10-31 30500, 2016

      • ruaok
        metadata exists in highlevel and lowlevel, right?
      • 2016-10-31 30515, 2016

      • alastairp
        so should we just store the metadata in the ll table, and join against that when we need it, or have a separate table in BQ?
      • 2016-10-31 30540, 2016

      • alastairp
      • 2016-10-31 30547, 2016

      • alastairp
        they both have a section "metadata"
      • 2016-10-31 30527, 2016

      • ruaok
        storage is not a problem, says felipe.
      • 2016-10-31 30530, 2016

      • alastairp
        it's worth noting that "tags" is exactly the same
      • 2016-10-31 30536, 2016

      • ruaok
        so duplicate in order to make querying easier.
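
[Editor's sketch, not from the channel: an illustration of the per-model split alastairp describes, with the metadata block copied into every per-model document as Felipe's "storage is not a problem" advice would allow. The function `split_highlevel`, the field names `model` and `result`, and the sample document are hypothetical.]

```python
import copy


def split_highlevel(document):
    """Produce one document per model, each carrying its own metadata copy."""
    metadata = document["metadata"]
    return [
        {
            "model": model_name,
            "result": result,
            # Deliberate duplication: every row can be queried without a join.
            "metadata": copy.deepcopy(metadata),
        }
        for model_name, result in document["highlevel"].items()
    ]


# Hypothetical highlevel document with two models.
highlevel_document = {
    "metadata": {"tags": {"artist": "Example Artist"}},
    "highlevel": {
        "model_a": {"values": [{"class": "x", "prob": 0.7}]},
        "model_b": {"values": [{"class": "y", "prob": 0.4}]},
    },
}
per_model = split_highlevel(highlevel_document)
```

The alternative raised in the chat, keeping metadata in one table and joining against it, trades this duplication for more complex queries.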