#metabrainz


      • Mr_Queue joined the channel
      • 2016-08-31 24424, 2016

      • JesseW joined the channel
      • 2016-08-31 24406, 2016

      • kyan has quit
      • 2016-08-31 24439, 2016

      • Nyanko-sensei joined the channel
      • 2016-08-31 24404, 2016

      • D4RK-PH0ENiX has quit
      • 2016-08-31 24429, 2016

      • Rotab has quit
      • 2016-08-31 24434, 2016

      • Gore|home has quit
      • 2016-08-31 24449, 2016

      • Gore|home joined the channel
      • 2016-08-31 24453, 2016

      • Nyanko-sensei has quit
      • 2016-08-31 24421, 2016

      • D4RK-PH0ENiX joined the channel
      • 2016-08-31 24445, 2016

      • pingupingu joined the channel
      • 2016-08-31 24453, 2016

      • Yurim has quit
      • 2016-08-31 24402, 2016

      • Yurim joined the channel
      • 2016-08-31 24429, 2016

      • Leftmost joined the channel
      • 2016-08-31 24403, 2016

      • dan6a___ has quit
      • 2016-08-31 24448, 2016

      • dan6a___ joined the channel
      • 2016-08-31 24405, 2016

      • MBJenkins has quit
      • 2016-08-31 24439, 2016

      • rahulr has quit
      • 2016-08-31 24421, 2016

      • MBJenkins joined the channel
      • 2016-08-31 24400, 2016

      • MBJenkins
        Project musicbrainz-server_master build #566: ABORTED in 5.2 sec: https://ci.metabrainz.org/job/musicbrainz-server_…
      • 2016-08-31 24409, 2016

      • MBJenkins
        Project musicbrainz-server_master build #567: SUCCESS in 19 min: https://ci.metabrainz.org/job/musicbrainz-server_…
      • 2016-08-31 24429, 2016

      • MBJenkins
        Project musicbrainz-server_master build #568: UNSTABLE in 19 min: https://ci.metabrainz.org/job/musicbrainz-server_…
      • 2016-08-31 24448, 2016

      • JonnyJD joined the channel
      • 2016-08-31 24459, 2016

      • diana_olhovyk joined the channel
      • 2016-08-31 24434, 2016

      • JesseW has quit
      • 2016-08-31 24410, 2016

      • Lotheric has quit
      • 2016-08-31 24405, 2016

      • pingupingu has quit
      • 2016-08-31 24459, 2016

      • JonnyJD has quit
      • 2016-08-31 24418, 2016

      • dehy joined the channel
      • 2016-08-31 24449, 2016

      • Rotab joined the channel
      • 2016-08-31 24412, 2016

      • drsaunde has quit
      • 2016-08-31 24409, 2016

      • reosarevok sighs at an unexpected ISE http://tickets.musicbrainz.org/browse/MBS-9063 - I thought the dates would get dropped automatically here :/ I guess I can remove them with one edit then enter the other change, but I guess I'll leave it be for testing for now
      • 2016-08-31 24445, 2016

      • Nyanko-sensei joined the channel
      • 2016-08-31 24404, 2016

      • D4RK-PH0ENiX has quit
      • 2016-08-31 24458, 2016

      • Mineo has quit
      • 2016-08-31 24408, 2016

      • reosarevok
        Also I wonder why suddenly almost 20 people decided to like our FB page, we normally get like one per day
      • 2016-08-31 24453, 2016

      • Slurpee has quit
      • 2016-08-31 24433, 2016

      • rahulr joined the channel
      • 2016-08-31 24433, 2016

      • rahulr has quit
      • 2016-08-31 24433, 2016

      • rahulr joined the channel
      • 2016-08-31 24422, 2016

      • kartikgupta0909 joined the channel
      • 2016-08-31 24449, 2016

      • drsaunders joined the channel
      • 2016-08-31 24426, 2016

      • drsaunders has quit
      • 2016-08-31 24447, 2016

      • Yurim has quit
      • 2016-08-31 24443, 2016

      • Lotheric joined the channel
      • 2016-08-31 24447, 2016

      • alastairp
        kartikgupta0909: hi
      • 2016-08-31 24402, 2016

      • kartikgupta0909
        Hi
      • 2016-08-31 24427, 2016

      • kartikgupta0909
        If you have time could you merge the two branches
      • 2016-08-31 24447, 2016

      • alastairp
        yep
      • 2016-08-31 24452, 2016

      • alastairp
      • 2016-08-31 24403, 2016

      • alastairp
        what happens if the ID is not for a valid job?
      • 2016-08-31 24415, 2016

      • alastairp
        We have public and private datasets. I wonder what our behaviour should be if the dataset is private but we have a job for it
      • 2016-08-31 24445, 2016

      • alastairp
        I know that we require datasets to be public before we submit them to create a job. However, I don't know if that'll change in the future
      • 2016-08-31 24437, 2016

      • kartikgupta0909
        Yeah, but I guess a job ID will always be known only to the author of the dataset
      • 2016-08-31 24442, 2016

      • kartikgupta0909
        so we could skip that case
      • 2016-08-31 24456, 2016

      • alastairp
        not always
      • 2016-08-31 24405, 2016

      • alastairp
        someone can go and browse the website and get job ids
      • 2016-08-31 24426, 2016

      • alastairp
        https://acousticbrainz.org/datasets/list there is a list of datasets there
      • 2016-08-31 24442, 2016

      • kartikgupta0909
        ah yes,
      • 2016-08-31 24444, 2016

      • alastairp
        (however, all of these are public, so that's back to the first point)
      • 2016-08-31 24405, 2016

      • kartikgupta0909
        but for private, only authors would know the job id right?
      • 2016-08-31 24417, 2016

      • kartikgupta0909
        in case we allow public datasets to have jobs?
      • 2016-08-31 24421, 2016

      • kartikgupta0909
        *private
      • 2016-08-31 24436, 2016

      • alastairp
        yeah. With UUIDs it's a bit less of a problem
      • 2016-08-31 24453, 2016

      • alastairp
        since the probability of generating the same uuid is almost 0
      • 2016-08-31 24413, 2016

      • kartikgupta0909
        yes
      • 2016-08-31 24430, 2016

      • kartikgupta0909
        so should I change something and add some kind of error handling, or is it fine this way?
      • 2016-08-31 24435, 2016

      • alastairp
        for example, if we had integer ids for our job ids, we would want protection to make sure that no one iterated through all the ids
      • 2016-08-31 24450, 2016
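The point about enumerable IDs can be sketched in a few lines of Python (a hypothetical illustration, not AcousticBrainz code): sequential integer job ids can be walked exhaustively by an attacker, while random UUID4 ids are practically unguessable and collision-free.

```python
import uuid

# With sequential integer ids, an attacker can simply walk the id space:
sequential_ids = [1, 2, 3, 4, 5]
guessed = [i for i in range(1, 6)]  # trivially enumerates every job
assert guessed == sequential_ids

# With UUID4 ids there are 2**122 possibilities, so guessing an existing
# id (or generating a colliding one) is practically impossible.
job_ids = {uuid.uuid4() for _ in range(1000)}
assert len(job_ids) == 1000  # no collisions in practice
```

This is why alastairp notes that UUIDs make the private-dataset question "a bit less of a problem": obscurity of the id itself is a reasonable (if not complete) barrier.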

      • alastairp
        the reason that I was thinking about private datasets is this:
      • 2016-08-31 24452, 2016

      • kartikgupta0909
        yes that would have been a problem
      • 2016-08-31 24434, 2016

      • alastairp
        for now it's not a problem, but if we ever change this decision, we have to remember all the parts of the code which access datasets, and add this check
      • 2016-08-31 24453, 2016

      • alastairp
        I can imagine that we perhaps forget all the places where this could happen, and so we end up with a bug
      • 2016-08-31 24423, 2016

      • alastairp
        however, if we do it now, it doesn't matter if we make the change in the future
      • 2016-08-31 24449, 2016

      • alastairp
        the downside is that we have additional complexity here, which will stay forever
      • 2016-08-31 24452, 2016

      • alastairp
        hmm.
      • 2016-08-31 24459, 2016

      • alastairp
        Gentlecat: got any thoughts on that?
      • 2016-08-31 24419, 2016

      • Gentlecat
        huh?
      • 2016-08-31 24422, 2016

      • Gentlecat reads
      • 2016-08-31 24433, 2016

      • ruaok
        alastairp: got a sec?
      • 2016-08-31 24435, 2016

      • kartikgupta0909
        I don't see a problem even if someone tries to access a job of a private dataset
      • 2016-08-31 24448, 2016

      • kartikgupta0909
        since this API endpoint doesn't demand the dataset to be private
      • 2016-08-31 24402, 2016

      • kartikgupta0909
        it's simply retrieving the job from the dataset_eval_jobs table
      • 2016-08-31 24406, 2016

      • Nyanko-sensei has quit
      • 2016-08-31 24407, 2016

      • alastairp
        right. I'm not talking about the case as it is at the moment
      • 2016-08-31 24414, 2016

      • alastairp
        ruaok: what's up?
      • 2016-08-31 24420, 2016

      • kartikgupta0909
        ah
      • 2016-08-31 24430, 2016

      • alastairp
        because in this case we know that all jobs are for public datasets
      • 2016-08-31 24433, 2016

      • D4RK-PH0ENiX joined the channel
      • 2016-08-31 24433, 2016

      • Gentlecat
        alastairp: I think I made a change to allow submission of private datasets because of challenges
      • 2016-08-31 24437, 2016

      • ruaok
        I'm not ready to put up a PR for the big-query stuff yet, but I wanted to catch you up.
      • 2016-08-31 24440, 2016

      • Gentlecat
        that might be in my own branch though
      • 2016-08-31 24444, 2016

      • ruaok
        it's been an interesting set of challenges.
      • 2016-08-31 24446, 2016

      • alastairp
        Gentlecat: ah, interesting!
      • 2016-08-31 24401, 2016

      • alastairp
        that makes my crazy ranting valid!
      • 2016-08-31 24413, 2016

      • Gentlecat
        so yeah, need to make sure that user owns the dataset before retrieving it
      • 2016-08-31 24444, 2016

      • ruaok rewinds the LB story a bit.
      • 2016-08-31 24413, 2016

      • alastairp
        in fact, we already have https://github.com/metabrainz/acousticbrainz-serv… as a helper
      • 2016-08-31 24415, 2016

      • ruaok
        alastairp: you know how I got rid of kafka/zookeeper, right? it was being a royal pain in the ass, never behaving.
      • 2016-08-31 24421, 2016

      • alastairp
        which does it all for us
      • 2016-08-31 24422, 2016

      • alastairp
        ruaok: right
      • 2016-08-31 24427, 2016

      • ruaok
        I'm glad I did that.
      • 2016-08-31 24434, 2016

      • alastairp
        but...
      • 2016-08-31 24451, 2016

      • ruaok
        I discovered the pubsub command set in redis, but that isn't really good enough for our needs.
      • 2016-08-31 24400, 2016

      • ruaok
        disconnected clients can't catch up on messages.
      • 2016-08-31 24402, 2016
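The limitation ruaok describes can be modelled in plain Python (an illustrative sketch, not the actual ListenBrainz code): Redis PUB/SUB is fire-and-forget, so anything published while a client is disconnected is simply lost, whereas a list used as a queue (LPUSH/BRPOP style) retains messages until a consumer pops them.

```python
# Fire-and-forget pub/sub: messages sent while a client is
# disconnected never reach it.
class PubSub:
    def __init__(self):
        self.subscribers = []

    def publish(self, msg):
        for queue in self.subscribers:
            queue.append(msg)

    def subscribe(self):
        queue = []
        self.subscribers.append(queue)
        return queue

pubsub = PubSub()
pubsub.publish("listen-1")      # no one is connected yet
inbox = pubsub.subscribe()      # client connects afterwards
pubsub.publish("listen-2")
assert inbox == ["listen-2"]    # "listen-1" is gone for good

# A list used as a queue keeps messages until they are consumed,
# so a reconnecting client can catch up on its backlog.
backlog = []
backlog.insert(0, "listen-1")   # LPUSH before any consumer exists
backlog.insert(0, "listen-2")
consumed = [backlog.pop(), backlog.pop()]   # BRPOP twice
assert consumed == ["listen-1", "listen-2"]
```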

      • Gentlecat
        yeah, that's why I added it
      • 2016-08-31 24409, 2016

      • ruaok
      • 2016-08-31 24416, 2016

      • alastairp
        kartikgupta0909: OK, so this is less complex than I thought it would be
      • 2016-08-31 24433, 2016

      • ruaok
        so, I googled a bit and then made my own, to meet our criteria. redis is awesome.
      • 2016-08-31 24433, 2016

      • alastairp
        use this helper method instead of db.dataset.get, and it takes care of all permission checking
      • 2016-08-31 24447, 2016

      • alastairp
        but also make sure you check if the jobid is valid or not
      • 2016-08-31 24400, 2016

      • kartikgupta0909
        ah okay
      • 2016-08-31 24444, 2016

      • ruaok
        then I wrote the big-query writer, which took some doing. google's docs are too verbose and confusing. :(
      • 2016-08-31 24459, 2016

      • ruaok
        but then I hit a wall: big query does not have unique column constraints.
      • 2016-08-31 24400, 2016

      • ruaok
        shit.
      • 2016-08-31 24419, 2016

      • ruaok
        just shoveling listens to BQ isn't going to work.
      • 2016-08-31 24431, 2016

      • alastairp
        this is where you want to unique on user/date/song ?
      • 2016-08-31 24438, 2016

      • kartikgupta0909
        I think checking the validity of the job id will have to be in the db files right?
      • 2016-08-31 24446, 2016

      • ruaok
        no, more like repeated imports not causing duplicates.
      • 2016-08-31 24454, 2016

      • alastairp
        right
      • 2016-08-31 24401, 2016

      • alastairp
        (by uniquing on user/date/song) :)
      • 2016-08-31 24416, 2016

      • ruaok
        zas has been playing with influx db, which is a time series data store.
      • 2016-08-31 24425, 2016

      • ruaok
        and listens are effectively that -- time series data.
      • 2016-08-31 24444, 2016

      • ruaok
        and influx does uniquing by default, so that is a good match as well.
      • 2016-08-31 24459, 2016

      • ruaok
        so I wrote a way to store data in influx db for deduping.
      • 2016-08-31 24415, 2016

      • alastairp
        kartikgupta0909: see that helper method
      • 2016-08-31 24422, 2016

      • alastairp
        it's almost exactly the same as what you need
      • 2016-08-31 24442, 2016

      • ruaok
        so now there is a pubsub for incoming listens. the influx writer listens to this and dedupes the stream and then puts the uniques onto another pubsub.
      • 2016-08-31 24444, 2016

      • alastairp
      • 2016-08-31 24402, 2016

      • ruaok
        the bigquery writer then listens to that and writes those to big query.
      • 2016-08-31 24406, 2016
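The two-stage pipeline ruaok describes (listens pubsub → influx writer dedupes → uniques pubsub → bigquery writer) could be sketched roughly like this, with in-memory stand-ins for the Redis pubsubs; all names here are hypothetical:

```python
# Stage 1: the "influx writer" dedupes the incoming listen stream by a
# (user, timestamp, recording) key, which is what Influx's default
# uniquing gives you for free.
def dedupe(listens, seen):
    unique = []
    for listen in listens:
        key = (listen["user"], listen["ts"], listen["recording"])
        if key not in seen:
            seen.add(key)
            unique.append(listen)
    return unique

# Stage 2: the "bigquery writer" only ever sees unique listens, so
# BigQuery's lack of unique-column constraints no longer matters.
incoming = [
    {"user": "rob", "ts": 100, "recording": "a"},
    {"user": "rob", "ts": 100, "recording": "a"},  # duplicate import
    {"user": "rob", "ts": 101, "recording": "b"},
]
seen = set()
bigquery_rows = dedupe(incoming, seen)
assert len(bigquery_rows) == 2

# A repeated import of the same batch writes nothing new:
assert dedupe(incoming, seen) == []
```

The design point is that dedup state lives in one place (Influx), so downstream writers stay dumb and idempotent imports fall out naturally.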

      • alastairp
        ok
      • 2016-08-31 24450, 2016

      • kartikgupta0909
        I did, but it's only for the second query. For checking the validity of the job id?
      • 2016-08-31 24453, 2016

      • ruaok
        so, with this lead-up, I would like a sanity check on "schemas" for both influx and bigquery
      • 2016-08-31 24455, 2016

      • ruaok
      • 2016-08-31 24400, 2016

      • kartikgupta0909
        I'll have to add a db function, right?
      • 2016-08-31 24404, 2016

      • Gentlecat
        let's stick with NoDataFoundException
      • 2016-08-31 24408, 2016

      • alastairp
        Gentlecat: 👍
      • 2016-08-31 24411, 2016

      • kartikgupta0909
        yepp
      • 2016-08-31 24419, 2016

      • ruaok
      • 2016-08-31 24433, 2016

      • ruaok
        which are effectively the same layout.
      • 2016-08-31 24443, 2016

      • alastairp
        kartikgupta0909: right, but the get_job method already returns None if there is no item with that ID
      • 2016-08-31 24449, 2016

      • ruaok
        in influx tags are indexed, fields are not -- which is the most important thing to know.
      • 2016-08-31 24406, 2016
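The tags-vs-fields distinction maps directly onto InfluxDB's line protocol: tags (indexed, always strings) are appended to the measurement name, fields (unindexed) come after the space, string field values double-quoted. A rough sketch of building such a line for a listen (measurement and key names are made up for illustration):

```python
def to_line_protocol(measurement, tags, fields, timestamp):
    # Tags are indexed and joined onto the measurement, comma-separated;
    # fields are not indexed; string field values must be double-quoted.
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f'{k}="{v}"' for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {timestamp}"

line = to_line_protocol(
    "listen",
    tags={"user_name": "rob"},          # queries filter on this: index it
    fields={"track_name": "Paranoid"},  # payload only: keep it a field
    timestamp=1472601600000000000,
)
assert line == 'listen,user_name=rob track_name="Paranoid" 1472601600000000000'
```

The practical upshot of "tags are indexed, fields are not": anything you filter or group by (user) belongs in a tag, while high-cardinality payload (track names) belongs in fields.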

      • ruaok
        so, when you get a chance, I'd love a second set of eyes on that.
      • 2016-08-31 24412, 2016

      • alastairp
        and the flask decorator checks that it's a valid uuid
      • 2016-08-31 24422, 2016

      • Gentlecat
        I would add checking of ID validity into db.dataset_eval.get_job
      • 2016-08-31 24428, 2016

      • alastairp
        so the error you need to check for is here: https://github.com/metabrainz/acousticbrainz-serv…
      • 2016-08-31 24443, 2016

      • Gentlecat
        and if it's not valid raise one of exceptions from db package
      • 2016-08-31 24403, 2016

      • kartikgupta0909
        Oh okay, so then I don't need to do anything for the job id part, only for getting the dataset
      • 2016-08-31 24405, 2016

      • alastairp
        If the id isn't a valid job id, `job` will be None and you will get an error trying to access job['dataset_id']
      • 2016-08-31 24425, 2016
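The failure mode alastairp points out, plus Gentlecat's suggested fix, can be sketched like this (hypothetical stand-ins for db.dataset_eval.get_job and the db package's NoDataFoundException):

```python
class NoDataFoundException(Exception):
    """Stand-in for the exception from the db package."""

FAKE_JOBS = {"a-valid-uuid": {"dataset_id": "some-dataset"}}

def get_job(job_id):
    # As written, an unknown id silently yields None...
    return FAKE_JOBS.get(job_id)

def get_job_checked(job_id):
    # ...so the fix is to raise inside the db layer, before any caller
    # hits a TypeError trying job["dataset_id"] on None.
    job = get_job(job_id)
    if job is None:
        raise NoDataFoundException(f"No evaluation job: {job_id}")
    return job

handled = False
try:
    get_job_checked("not-a-real-job")
except NoDataFoundException:
    handled = True
assert handled
assert get_job_checked("a-valid-uuid")["dataset_id"] == "some-dataset"
```

Raising in the db layer keeps every API endpoint that fetches a job on the same error path, which is exactly why Gentlecat suggests putting the check in get_job itself.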

      • kartikgupta0909
        which will be handled by the get_check_dataset function
      • 2016-08-31 24427, 2016

      • alastairp
        can you see that?