#metabrainz

      • D4RK-PH0ENiX joined the channel
      • 2016-05-06 12720, 2016

      • muesli has quit
      • 2016-05-06 12728, 2016

      • Hobbyboy has quit
      • 2016-05-06 12705, 2016

      • leonardo joined the channel
      • 2016-05-06 12742, 2016

      • Hobbyboy|BNC joined the channel
      • 2016-05-06 12720, 2016

      • muesli joined the channel
      • 2016-05-06 12747, 2016

      • dseomn has quit
      • 2016-05-06 12743, 2016

      • Slurpee has quit
      • 2016-05-06 12747, 2016

      • QuoraUK has quit
      • 2016-05-06 12739, 2016

      • dseomn joined the channel
      • 2016-05-06 12739, 2016

      • Hobbyboy|BNC is now known as Hobbyboy
      • 2016-05-06 12733, 2016

      • kanha has quit
      • 2016-05-06 12759, 2016

      • kanha joined the channel
      • 2016-05-06 12730, 2016

      • kanha has quit
      • 2016-05-06 12749, 2016

      • kanha joined the channel
      • 2016-05-06 12754, 2016

      • LordSputnik has quit
      • 2016-05-06 12701, 2016

      • The_Catman has quit
      • 2016-05-06 12733, 2016

      • The_Catman joined the channel
      • 2016-05-06 12754, 2016

      • Nyanko-sensei joined the channel
      • 2016-05-06 12703, 2016

      • D4RK-PH0ENiX has quit
      • 2016-05-06 12758, 2016

      • CallerNo6 has quit
      • 2016-05-06 12745, 2016

      • MajorLurker joined the channel
      • 2016-05-06 12702, 2016

      • Nyanko-sensei has quit
      • 2016-05-06 12729, 2016

      • D4RK-PH0ENiX joined the channel
      • 2016-05-06 12747, 2016

      • MajorLurker has quit
      • 2016-05-06 12700, 2016

      • xps2_ has quit
      • 2016-05-06 12722, 2016

      • MajorLurker joined the channel
      • 2016-05-06 12724, 2016

      • xps2_ joined the channel
      • 2016-05-06 12741, 2016

      • mihaitish joined the channel
      • 2016-05-06 12756, 2016

      • diana_olhovyk joined the channel
      • 2016-05-06 12754, 2016

      • mihaitish has quit
      • 2016-05-06 12704, 2016

      • regagain joined the channel
      • 2016-05-06 12758, 2016

      • JesseW has quit
      • 2016-05-06 12742, 2016

      • rahulr has quit
      • 2016-05-06 12730, 2016

      • rahulr joined the channel
      • 2016-05-06 12713, 2016

      • ariscop has quit
      • 2016-05-06 12732, 2016

      • mike_aiir joined the channel
      • 2016-05-06 12729, 2016

      • laurie__ joined the channel
      • 2016-05-06 12742, 2016

      • MightyJay has quit
      • 2016-05-06 12709, 2016

      • MightyJay joined the channel
      • 2016-05-06 12750, 2016

      • laurie_ has quit
      • 2016-05-06 12751, 2016

      • laurie__ is now known as laurie_
      • 2016-05-06 12744, 2016

      • regagain has quit
      • 2016-05-06 12758, 2016

      • ariscop joined the channel
      • 2016-05-06 12720, 2016

      • regagain joined the channel
      • 2016-05-06 12707, 2016

      • kanha has quit
      • 2016-05-06 12737, 2016

      • cetko
        alastairp: hi! I'm finally done with my midterms. Regarding my project, ruaok said you had a BigQuery guy over and it seems it might be super easy to switch to bigquery
      • 2016-05-06 12703, 2016

      • cetko
        you said you've already migrated some of the data but it's taking up a lot of memory to run a query
      • 2016-05-06 12710, 2016

      • cetko
        can you elaborate on that?
      • 2016-05-06 12716, 2016

      • cetko
        where should I start?
      • 2016-05-06 12709, 2016

      • Major_Lurker joined the channel
      • 2016-05-06 12710, 2016

      • MajorLurker has quit
      • 2016-05-06 12753, 2016

      • Freso
        cetko: Congrats on being done :)
      • 2016-05-06 12728, 2016

      • alastairp
        cetko: hi
      • 2016-05-06 12733, 2016

      • alastairp
        cool, let's talk
      • 2016-05-06 12750, 2016

      • alastairp
        can you give me an email address? I'll send you what we have
      • 2016-05-06 12701, 2016

      • alastairp
        or pm me
      • 2016-05-06 12731, 2016

      • LordSputnik joined the channel
      • 2016-05-06 12754, 2016

      • LordSputnik is now known as Guest82506
      • 2016-05-06 12732, 2016

      • cetko
        alastairp: pm!
      • 2016-05-06 12756, 2016

      • cetko
        thanks!
      • 2016-05-06 12700, 2016

      • alastairp
        got it. I forwarded you the email thread that we used when we were playing with bigquery
      • 2016-05-06 12720, 2016

      • alastairp
        it's quite easy to blindly add data to bigquery
      • 2016-05-06 12736, 2016

      • alastairp
        I made a file which had json documents 1 per line
      • 2016-05-06 12743, 2016

      • alastairp
        you upload them to google cloud storage
      • 2016-05-06 12703, 2016

      • alastairp
        and from there you can import them into bigquery
      • 2016-05-06 12711, 2016
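
The "one JSON document per line" file described above can be sketched roughly as follows. This is a minimal illustration, not the script that was actually used: the `write_ndjson` helper and the sample documents are hypothetical, and the upload to Google Cloud Storage and the BigQuery import are not shown.

```python
import io
import json


def write_ndjson(documents, stream):
    """Write each document as one compact JSON object per line."""
    count = 0
    for doc in documents:
        # json.dumps without indent never emits a raw newline,
        # so every document stays on exactly one line.
        stream.write(json.dumps(doc, separators=(",", ":")))
        stream.write("\n")
        count += 1
    return count


# Hypothetical sample documents; real low-level documents are much larger.
sample_docs = [
    {"mbid": "00000000-0000-0000-0000-000000000001",
     "lowlevel": {"average_loudness": 0.81}},
    {"mbid": "00000000-0000-0000-0000-000000000002",
     "lowlevel": {"average_loudness": 0.42}},
]

buffer = io.StringIO()
written = write_ndjson(sample_docs, buffer)
lines = buffer.getvalue().splitlines()
```

In practice the stream would be a file opened for writing rather than an in-memory buffer, and that file is what gets uploaded to cloud storage before the import.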

      • alastairp
        we did this with Felipe, and it works
      • 2016-05-06 12712, 2016

      • alastairp
        you can see some sample queries in the email I sent
      • 2016-05-06 12702, 2016

      • alastairp
        there are 2 ways of performing queries on the dataset, one is to sign up for a bigquery account
      • 2016-05-06 12743, 2016

      • alastairp
        but you can also log in to the redash demo: http://demo.redash.io/
      • 2016-05-06 12730, 2016

      • alastairp
        (if you use the redash demo you actually count against their monthly quota, if you go against the bigquery console you go against yours)
      • 2016-05-06 12708, 2016

      • alastairp
        for us, the easiest way to play with the data was to import it as strings representing the json data. this is quick, but it means you have to parse the *whole* document to get an item out of it
      • 2016-05-06 12721, 2016

      • alastairp
        and you use up your quota much more quickly. (see the email)
      • 2016-05-06 12722, 2016

      • alastairp
        this means the first thing that we should do is develop a schema which represents our lowlevel documents, this means that we can query just a specific field, and the data usage should be pretty small
      • 2016-05-06 12738, 2016
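
One way to get a first draft of such a schema is to derive it from a sample document. The sketch below is only an illustration under stated assumptions: `infer_schema` and the sample document are hypothetical, lists are assumed to be non-empty and homogeneous, and nested lists are not handled. The output follows the name/type/mode/fields layout of BigQuery's JSON schema files, but a real schema would still need to be reviewed by hand.

```python
def infer_schema(document):
    """Derive a BigQuery-style field list from one nested dict."""
    return [_field_for(name, value) for name, value in sorted(document.items())]


def _field_for(name, value):
    mode = "NULLABLE"
    if isinstance(value, list):
        mode = "REPEATED"
        # Assumption: lists are homogeneous, so the first element is representative.
        value = value[0] if value else ""
    if isinstance(value, dict):
        return {"name": name, "type": "RECORD", "mode": mode,
                "fields": infer_schema(value)}
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        type_name = "BOOLEAN"
    elif isinstance(value, int):
        type_name = "INTEGER"
    elif isinstance(value, float):
        type_name = "FLOAT"
    else:
        type_name = "STRING"
    return {"name": name, "type": type_name, "mode": mode}


# Hypothetical, heavily trimmed stand-in for a low-level document.
sample = {
    "metadata": {"tags": {"artist": ["Example Artist"]}},
    "lowlevel": {"average_loudness": 0.8, "mfcc": {"mean": [1.0, 2.0]}},
}

schema = infer_schema(sample)
```

With typed columns like these, a query that only needs `lowlevel.average_loudness` reads just that column instead of parsing every document as a string, which is where the quota saving comes from.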

      • alastairp
        so I see our plan as being something like this
      • 2016-05-06 12745, 2016

      • alastairp
        1. get familiar with all the tools
      • 2016-05-06 12758, 2016

      • alastairp
        2. work with Felipe (from google) on developing a schema
      • 2016-05-06 12734, 2016

      • alastairp
        3. load the current data into a metabrainz/acousticbrainz database (as opposed to Felipe's private one as it is currently)
      • 2016-05-06 12758, 2016

      • alastairp
        4. install redash so that we can do cool queries and graphs
      • 2016-05-06 12723, 2016

      • alastairp
        5. work out how to upload new items to BQ as they come in to AB
      • 2016-05-06 12753, 2016

      • alastairp
        6. optionally use redash to make some neat graphs that we host on the AB website
      • 2016-05-06 12754, 2016

      • alastairp
        7. ???
      • 2016-05-06 12732, 2016

      • Gentlecat
        8. profit
      • 2016-05-06 12759, 2016

      • alastairp
        We also talked in your proposal about looking at the more detailed frame-level data. If we have time, it'd be great to also integrate that into bigquery. the idea is that bq would be the main interface that other people consume this data from, as it's quite large
      • 2016-05-06 12736, 2016

      • alastairp
        the thing here is that this also requires development on the AB server, so I'm not sure how we would split up that work. It's also on my list for the summer, but perhaps we could do it together if we have time
      • 2016-05-06 12709, 2016

      • Gentlecat
        alastairp: do we have a license for datasets on AB?
      • 2016-05-06 12721, 2016

      • alastairp
        good question
      • 2016-05-06 12733, 2016

      • kanha joined the channel
      • 2016-05-06 12736, 2016

      • alastairp
        what are reviews on cb?
      • 2016-05-06 12750, 2016

      • Gentlecat
        we have 2 options on CB
      • 2016-05-06 12713, 2016

      • darwin
        is AB data "Factual" ?
      • 2016-05-06 12725, 2016

      • alastairp
        darwin: the data itself is cc0
      • 2016-05-06 12729, 2016

      • alastairp
        but datasets are made by people
      • 2016-05-06 12754, 2016

      • Gentlecat
        datasets are created by users and I think it would be nice to credit the author
      • 2016-05-06 12705, 2016

      • alastairp
        yeah, I like the idea of a CC BY variation
      • 2016-05-06 12714, 2016

      • alastairp
        forum thread?
      • 2016-05-06 12701, 2016

      • Gentlecat
        though I would also prefer to give an option to use CC0
      • 2016-05-06 12728, 2016

      • Gentlecat
        I remember freesound has something similar
      • 2016-05-06 12753, 2016

      • Gentlecat
        sure, let's make a forum thread
      • 2016-05-06 12736, 2016

      • alastairp
        I think I'd prefer to have only 1 option, "open"
      • 2016-05-06 12701, 2016

      • Gentlecat
        what do you mean?
      • 2016-05-06 12721, 2016

      • alastairp
        I mean I'd prefer that we don't give them a choice
      • 2016-05-06 12732, 2016

      • alastairp
        and the only available license is something quite open
      • 2016-05-06 12749, 2016

      • alastairp
        the idea is that we want to force people's hand a bit
      • 2016-05-06 12758, 2016

      • alastairp
        say "if you want to participate, here are the rules"
      • 2016-05-06 12714, 2016

      • mihaitish joined the channel
      • 2016-05-06 12713, 2016

      • Gentlecat
        oh, massa critica is today
      • 2016-05-06 12718, 2016

      • Gentlecat
        are you going?
      • 2016-05-06 12756, 2016

      • alastairp
        not sure
      • 2016-05-06 12708, 2016

      • alastairp
        maybe 60% yes
      • 2016-05-06 12722, 2016

      • Gentlecat
        I can't make it to the big one, could go to this one at least
      • 2016-05-06 12753, 2016

      • alastairp
        ah, right :(
      • 2016-05-06 12756, 2016

      • alastairp
        yeah, go for it
      • 2016-05-06 12705, 2016

      • alastairp
        I need to take my new light out in the dark :)
      • 2016-05-06 12743, 2016

      • Gentlecat
        well, here's a good reason :)
      • 2016-05-06 12730, 2016

      • CJ_ joined the channel
      • 2016-05-06 12731, 2016

      • QuoraUK joined the channel
      • 2016-05-06 12733, 2016

      • diana_olhovyk has quit
      • 2016-05-06 12704, 2016

      • diana_olhovyk joined the channel
      • 2016-05-06 12737, 2016

      • CallerNo6 joined the channel
      • 2016-05-06 12750, 2016

      • Major_Lurker has quit
      • 2016-05-06 12757, 2016

      • Zialus has quit
      • 2016-05-06 12702, 2016

      • Zialus_PT joined the channel
      • 2016-05-06 12735, 2016

      • regagain has quit
      • 2016-05-06 12703, 2016

      • regagain joined the channel
      • 2016-05-06 12717, 2016

      • Slurpee joined the channel
      • 2016-05-06 12751, 2016

      • MBJenkins
        Project acousticbrainz-server build #55: SUCCESS in 4 min 46 sec: https://ci.metabrainz.org/job/acousticbrainz-serv…
      • 2016-05-06 12703, 2016

      • kanha has quit
      • 2016-05-06 12722, 2016

      • Freso
        alastairp: +1 to "forcing people's hand a bit". CC0 would still require attribution in most jurisdictions I believe, but CC-by is pretty close to being CC0 anyway, so might work too.
      • 2016-05-06 12750, 2016

      • alastairp
        we're already doing this with cc0 on the data itself
      • 2016-05-06 12701, 2016

      • alastairp
        and the explicit aim of AB is to open this stuff up
      • 2016-05-06 12713, 2016

      • alastairp
        I do like Gentlecat's suggestion of citation though
      • 2016-05-06 12725, 2016

      • alastairp
        so I'm tending towards ccby
      • 2016-05-06 12708, 2016

      • Freso
        (Or CC by-sa for the datasets, with a note that MetaBrainz can relicense as CC by for commercial entities that don't want their derived algorithms or whatever be CC'd.)
      • 2016-05-06 12743, 2016

      • alastairp
        remember that a dataset is just a list of mbids
      • 2016-05-06 12749, 2016

      • Freso
        Yeah, I know the data itself is CC0, only talking datasets here. :)
      • 2016-05-06 12752, 2016

      • alastairp
        attached to a label (e.g. genre class)
      • 2016-05-06 12702, 2016

      • alastairp
        so I don't think derived algorithms come into it
      • 2016-05-06 12703, 2016

      • alastairp
        right?
      • 2016-05-06 12712, 2016
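
For reference, a dataset in the sense used above (MBIDs attached to a class label) could be modelled along these lines. This is a hedged sketch: the `Dataset` class, its field names, and the `license` attribute are all hypothetical and do not reflect the actual AcousticBrainz data model or any decision on licensing.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A named mapping from class label to the MBIDs assigned to that class."""
    name: str
    author: str
    license: str  # hypothetical field; the license choice is still under discussion
    classes: dict = field(default_factory=dict)

    def add(self, label, mbid):
        # Create the class on first use, then attach the recording to it.
        self.classes.setdefault(label, []).append(mbid)

    def size(self):
        # Total number of MBIDs across all classes.
        return sum(len(mbids) for mbids in self.classes.values())


# Hypothetical example: a two-class genre dataset with placeholder MBIDs.
genres = Dataset(name="genre-example", author="example_user", license="CC BY")
genres.add("rock", "00000000-0000-0000-0000-000000000001")
genres.add("rock", "00000000-0000-0000-0000-000000000002")
genres.add("jazz", "00000000-0000-0000-0000-000000000003")
```

Note that the structure holds only identifiers and labels; the feature data the MBIDs point to stays separate, which is why the data license and the dataset license can be discussed independently.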

      • alastairp
        hmm
      • 2016-05-06 12714, 2016

      • Gentlecat
        I've heard someone who I shall not name saying, "I license most of my work under ccby, but I don't mind if people don't credit me"
      • 2016-05-06 12716, 2016

      • Freso shrug.jpg
      • 2016-05-06 12726, 2016

      • Gentlecat
        this seems kind of wrong to me
      • 2016-05-06 12733, 2016

      • alastairp
        someone builds a model with a cc-by dataset
      • 2016-05-06 12734, 2016

      • Freso
        Gentlecat: They should just do CC0 then...