#metabrainz

/

0:16 AM
D4RK-PH0ENiX joined the channel

2016-05-06 12720, 2016

0:48 AM
muesli has quit

2016-05-06 12728, 2016

0:48 AM
Hobbyboy has quit

2016-05-06 12705, 2016

0:50 AM
leonardo joined the channel

2016-05-06 12742, 2016

0:51 AM
Hobbyboy|BNC joined the channel

2016-05-06 12720, 2016

0:52 AM
muesli joined the channel

2016-05-06 12747, 2016

1:01 AM
dseomn has quit

2016-05-06 12743, 2016

1:21 AM
Slurpee has quit

2016-05-06 12747, 2016

1:21 AM
QuoraUK has quit

2016-05-06 12739, 2016

1:26 AM
dseomn joined the channel

2016-05-06 12739, 2016

1:27 AM
Hobbyboy|BNC is now known as Hobbyboy

2016-05-06 12733, 2016

1:47 AM
kanha has quit

2016-05-06 12759, 2016

1:59 AM
kanha joined the channel

2016-05-06 12730, 2016

2:05 AM
kanha has quit

2016-05-06 12749, 2016

2:09 AM
kanha joined the channel

2016-05-06 12754, 2016

2:41 AM
LordSputnik has quit

2016-05-06 12701, 2016

3:49 AM
The_Catman has quit

2016-05-06 12733, 2016

3:49 AM
The_Catman joined the channel

2016-05-06 12754, 2016

4:00 AM
Nyanko-sensei joined the channel

2016-05-06 12703, 2016

4:03 AM
D4RK-PH0ENiX has quit

2016-05-06 12758, 2016

4:40 AM
CallerNo6 has quit

2016-05-06 12745, 2016

4:54 AM
MajorLurker joined the channel

2016-05-06 12702, 2016

5:04 AM
Nyanko-sensei has quit

2016-05-06 12729, 2016

5:04 AM
D4RK-PH0ENiX joined the channel

2016-05-06 12747, 2016

5:33 AM
MajorLurker has quit

2016-05-06 12700, 2016

5:34 AM
xps2_ has quit

2016-05-06 12722, 2016

5:34 AM
MajorLurker joined the channel

2016-05-06 12724, 2016

5:34 AM
xps2_ joined the channel

2016-05-06 12741, 2016

5:37 AM
mihaitish joined the channel

2016-05-06 12756, 2016

6:16 AM
diana_olhovyk joined the channel

2016-05-06 12754, 2016

6:39 AM
mihaitish has quit

2016-05-06 12704, 2016

6:52 AM
regagain joined the channel

2016-05-06 12758, 2016

7:06 AM
JesseW has quit

2016-05-06 12742, 2016

7:29 AM
rahulr has quit

2016-05-06 12730, 2016

7:30 AM
rahulr joined the channel

2016-05-06 12713, 2016

7:33 AM
ariscop has quit

2016-05-06 12732, 2016

7:45 AM
mike_aiir joined the channel

2016-05-06 12729, 2016

7:56 AM
laurie__ joined the channel

2016-05-06 12742, 2016

7:57 AM
MightyJay has quit

2016-05-06 12709, 2016

7:58 AM
MightyJay joined the channel

2016-05-06 12750, 2016

7:58 AM
laurie_ has quit

2016-05-06 12751, 2016

7:58 AM
laurie__ is now known as laurie_

2016-05-06 12744, 2016

9:06 AM
regagain has quit

2016-05-06 12758, 2016

9:06 AM
ariscop joined the channel

2016-05-06 12720, 2016

9:08 AM
regagain joined the channel

2016-05-06 12707, 2016

9:21 AM
kanha has quit

2016-05-06 12737, 2016

9:21 AM
cetko

alastairp: hi! I'm finally done with my midterms. Regarding my project, ruaok said you had a BigQuery guy over and it seems it might be super easy to switch to bigquery

2016-05-06 12703, 2016

9:22 AM
cetko

you said you've already migrated some of the data but it's taking up a lot of memory to run a query

2016-05-06 12710, 2016

9:22 AM
cetko

can you elaborate on that?

2016-05-06 12716, 2016

9:22 AM
cetko

where should I start?

2016-05-06 12709, 2016

9:23 AM
Major_Lurker joined the channel

2016-05-06 12710, 2016

9:26 AM
MajorLurker has quit

2016-05-06 12753, 2016

9:37 AM
Freso

cetko: Congrats on being done :)

2016-05-06 12728, 2016

9:48 AM
alastairp

cetko: hi

2016-05-06 12733, 2016

9:48 AM
alastairp

cool, let's talk

2016-05-06 12750, 2016

9:48 AM
alastairp

can you give me an email address? I'll send you what we have

2016-05-06 12701, 2016

9:49 AM
alastairp

or pm me

2016-05-06 12731, 2016

9:49 AM
LordSputnik joined the channel

2016-05-06 12754, 2016

9:49 AM
LordSputnik is now known as Guest82506

2016-05-06 12732, 2016

10:11 AM
cetko

alastairp: pm!

2016-05-06 12756, 2016

10:11 AM
cetko

thanks!

2016-05-06 12700, 2016

10:12 AM
alastairp

got it. I forwarded you the email thread that we used when we were playing with bigquery

2016-05-06 12720, 2016

10:12 AM
alastairp

it's quite easy to blindly add data to bigquery

2016-05-06 12736, 2016

10:12 AM
alastairp

I made a file which had json documents 1 per line

2016-05-06 12743, 2016

10:12 AM
alastairp

you upload them to google cloud storage

2016-05-06 12703, 2016

10:13 AM
alastairp

and from there you can import them into bigquery

2016-05-06 12711, 2016

10:13 AM
alastairp

we did this with Felipe, and it work

2016-05-06 12712, 2016

10:13 AM
alastairp

s

2016-05-06 12739, 2016

10:14 AM
alastairp

you can see some sample queries in the email I sent

2016-05-06 12702, 2016

10:15 AM
alastairp

and also here https://github.com/fhoffa/code_snippets/blob/mast…

2016-05-06 12741, 2016

10:15 AM
alastairp

there are 2 ways of performing queries on the dataset, one is to sign up for a bigquery account

2016-05-06 12743, 2016

10:15 AM
alastairp

here: https://bigquery.cloud.google.com/

2016-05-06 12704, 2016

10:16 AM
alastairp

but you can also log in to the redash demo: http://demo.redash.io/

2016-05-06 12730, 2016

10:16 AM
alastairp

(if you use the redash demo you actually count against their monthly quota, if you go against the bigquery console you go against yours)

2016-05-06 12708, 2016

10:17 AM
alastairp

for us, the easiest way to play with the data was to import it as strings representing the json data. this is quick, but it means you have to parse the *whole* document to get an item out of it

2016-05-06 12721, 2016

10:17 AM
alastairp

and you use up your quota much more quickly. (see the email)

2016-05-06 12722, 2016

10:18 AM
alastairp

this means the first thing that we should do is develop a schema which represents our lowlevel documents, this means that we can query just a specific field, and the data usage should be pretty small

2016-05-06 12738, 2016

10:18 AM
alastairp

so I see our plan as being something like this

2016-05-06 12745, 2016

10:18 AM
alastairp

1. get familiar with all the tools

2016-05-06 12758, 2016

10:18 AM
alastairp

2. work with Felipe (from google) on developing a schema

2016-05-06 12734, 2016

10:19 AM
alastairp

3. load the current data into a metabrainz/acousticbrainz database (as opposed to Felipe's private one as it is currently)

2016-05-06 12758, 2016

10:19 AM
alastairp

4. install redash so that we can do cool queries and graphs

2016-05-06 12723, 2016

10:20 AM
alastairp

5. work out how to upload new items to BQ as they come in to AB

2016-05-06 12753, 2016

10:20 AM
alastairp

6. optionally use redash to make some neat graphs that we host on the AB website

2016-05-06 12754, 2016

10:20 AM
alastairp

7. ???

2016-05-06 12732, 2016

10:21 AM
Gentlecat

8. profit

2016-05-06 12759, 2016

10:21 AM
alastairp

We also talked in your proposal about looking at the more detailed frame-level data. If we have time, it'd be great to also integrate that into bigquery. the idea is that bq would be the main interface that other people consume this data from, as it's quite large

2016-05-06 12736, 2016

10:22 AM
alastairp

the thing here is that this also requires development on the AB server, so I'm not sure how we would split up that work. It's also on my list for the summer, but perhaps we could do it together if we have tiem

2016-05-06 12709, 2016

10:27 AM
Gentlecat

alastairp: do we have a license for datasets on AB?

2016-05-06 12721, 2016

10:27 AM
alastairp

good question

2016-05-06 12733, 2016

10:27 AM
kanha joined the channel

2016-05-06 12736, 2016

10:27 AM
alastairp

what are reviews on cb?

2016-05-06 12750, 2016

10:27 AM
Gentlecat

we have 2 options on CB

2016-05-06 12713, 2016

10:28 AM
darwin

is AB data "Factual" ?

2016-05-06 12725, 2016

10:28 AM
alastairp

darwin: the data itsel fis cc0

2016-05-06 12729, 2016

10:28 AM
Gentlecat

https://github.com/metabrainz/critiquebrainz/blob…

2016-05-06 12737, 2016

10:28 AM
alastairp

but datasets are made by people

2016-05-06 12754, 2016

10:28 AM
Gentlecat

datasets are created by users and I think it would be nice to credit the author

2016-05-06 12705, 2016

10:33 AM
alastairp

yeah, I like the idea of a CC BY variation

2016-05-06 12714, 2016

10:33 AM
alastairp

forum thread?

2016-05-06 12701, 2016

10:34 AM
Gentlecat

though I would also prefer to give an option to use CC0

2016-05-06 12728, 2016

10:34 AM
Gentlecat

I remember freesound has something similar

2016-05-06 12753, 2016

10:34 AM
Gentlecat

sure, let's make a forum thread

2016-05-06 12736, 2016

10:35 AM
alastairp

I think I'd prefer to have only 1 option, "open"

2016-05-06 12701, 2016

10:36 AM
Gentlecat

what do you mean?

2016-05-06 12721, 2016

10:36 AM
alastairp

I mean I'd prefer that we don't give them a choice

2016-05-06 12732, 2016

10:36 AM
alastairp

and the only available license is something quite open

2016-05-06 12749, 2016

10:36 AM
alastairp

the idea is that we want to force people's hand a bit

2016-05-06 12758, 2016

10:36 AM
alastairp

say "if you want to participate, here are the rules"

2016-05-06 12714, 2016

10:48 AM
mihaitish joined the channel

2016-05-06 12713, 2016

11:07 AM
Gentlecat

oh, massa critica is today

2016-05-06 12718, 2016

11:08 AM
Gentlecat

are you going?

2016-05-06 12756, 2016

11:12 AM
alastairp

not sure

2016-05-06 12708, 2016

11:13 AM
alastairp

maybe 60% yes

2016-05-06 12722, 2016

11:14 AM
Gentlecat

I can't make it to the big one, could go to this one at least

2016-05-06 12753, 2016

11:14 AM
alastairp

ah, right :(

2016-05-06 12756, 2016

11:14 AM
alastairp

yeah, go for it

2016-05-06 12705, 2016

11:15 AM
alastairp

I need to take my new light out in the dark :)

2016-05-06 12743, 2016

11:15 AM
Gentlecat

well, here's a good reason :)

2016-05-06 12730, 2016

11:29 AM
Gentlecat

https://community.metabrainz.org/t/licensing-of-a…

2016-05-06 12744, 2016

11:29 AM
CJ_ joined the channel

2016-05-06 12731, 2016

12:23 PM
QuoraUK joined the channel

2016-05-06 12733, 2016

12:49 PM
diana_olhovyk has quit

2016-05-06 12704, 2016

12:50 PM
diana_olhovyk joined the channel

2016-05-06 12737, 2016

12:55 PM
CallerNo6 joined the channel

2016-05-06 12750, 2016

12:57 PM
Major_Lurker has quit

2016-05-06 12757, 2016

13:03 PM
Zialus has quit

2016-05-06 12702, 2016

13:04 PM
Zialus_PT joined the channel

2016-05-06 12735, 2016

13:25 PM
regagain has quit

2016-05-06 12703, 2016

13:28 PM
regagain joined the channel

2016-05-06 12717, 2016

13:32 PM
Slurpee joined the channel

2016-05-06 12751, 2016

13:32 PM
MBJenkins

Project acousticbrainz-server build #55: SUCCESS in 4 min 46 sec: https://ci.metabrainz.org/job/acousticbrainz-serv…

2016-05-06 12703, 2016

13:37 PM
kanha has quit

2016-05-06 12722, 2016

14:25 PM
Freso

alastairp: +1 to "forcing people's hand a bit". CC0 would still require attribution in most jurisdictions I believe, but CC-by is pretty close to being CC0 anyway, so might work too.

2016-05-06 12750, 2016

14:25 PM
alastairp

we're already doing with cc0 on the data itself

2016-05-06 12701, 2016

14:26 PM
alastairp

and the explicit aim of AB is to open this stuff up

2016-05-06 12713, 2016

14:26 PM
alastairp

I do like Gentlecat's suggestion of citation though

2016-05-06 12725, 2016

14:26 PM
alastairp

so I'm tending towards ccby

2016-05-06 12708, 2016

14:27 PM
Freso

(Or CC by-sa for the datasets, with a note that MetaBrainz can relicense as CC by for commercial entities that don't want their derived algorithms or whatever be CC'd.)

2016-05-06 12743, 2016

14:27 PM
alastairp

remember that a dataset is just a list of mbids

2016-05-06 12749, 2016

14:27 PM
Freso

Yeah, I know the data itself is CC0, only talking datasets here. :)

2016-05-06 12752, 2016

14:27 PM
alastairp

attached to a label (e.g. genre class)

2016-05-06 12702, 2016

14:28 PM
alastairp

so I don't think derived algorithms comes into it

2016-05-06 12703, 2016

14:28 PM
alastairp

right?

2016-05-06 12712, 2016

14:28 PM
alastairp

hmm

2016-05-06 12714, 2016

14:28 PM
Gentlecat

I've heard someone who I shall not name saying, "I license most of my work under ccby, but I don't mind if people don't credit me"

2016-05-06 12716, 2016

14:28 PM
Freso shrug.jpg

2016-05-06 12726, 2016

14:28 PM
Gentlecat

this seems kind of wrong to me

2016-05-06 12733, 2016

14:28 PM
alastairp

someone builds a model with a cc-by dataset

2016-05-06 12734, 2016

14:28 PM
Freso

Gentlecat: They should just do CC0 then...