#metabrainz


      • Mr_Queue joined the channel
      • 2016-08-31 24424, 2016

      • JesseW joined the channel
      • 2016-08-31 24406, 2016

      • kyan has quit
      • 2016-08-31 24439, 2016

      • Nyanko-sensei joined the channel
      • 2016-08-31 24404, 2016

      • D4RK-PH0ENiX has quit
      • 2016-08-31 24429, 2016

      • Rotab has quit
      • 2016-08-31 24434, 2016

      • Gore|home has quit
      • 2016-08-31 24449, 2016

      • Gore|home joined the channel
      • 2016-08-31 24453, 2016

      • Nyanko-sensei has quit
      • 2016-08-31 24421, 2016

      • D4RK-PH0ENiX joined the channel
      • 2016-08-31 24445, 2016

      • pingupingu joined the channel
      • 2016-08-31 24453, 2016

      • Yurim has quit
      • 2016-08-31 24402, 2016

      • Yurim joined the channel
      • 2016-08-31 24429, 2016

      • Leftmost joined the channel
      • 2016-08-31 24403, 2016

      • dan6a___ has quit
      • 2016-08-31 24448, 2016

      • dan6a___ joined the channel
      • 2016-08-31 24405, 2016

      • MBJenkins has quit
      • 2016-08-31 24439, 2016

      • rahulr has quit
      • 2016-08-31 24421, 2016

      • MBJenkins joined the channel
      • 2016-08-31 24400, 2016

      • MBJenkins
        Project musicbrainz-server_master build #566: ABORTED in 5.2 sec: https://ci.metabrainz.org/job/musicbrainz-server_…
      • 2016-08-31 24409, 2016

      • MBJenkins
        Project musicbrainz-server_master build #567: SUCCESS in 19 min: https://ci.metabrainz.org/job/musicbrainz-server_…
      • 2016-08-31 24429, 2016

      • MBJenkins
        Project musicbrainz-server_master build #568: UNSTABLE in 19 min: https://ci.metabrainz.org/job/musicbrainz-server_…
      • 2016-08-31 24448, 2016

      • JonnyJD joined the channel
      • 2016-08-31 24459, 2016

      • diana_olhovyk joined the channel
      • 2016-08-31 24434, 2016

      • JesseW has quit
      • 2016-08-31 24410, 2016

      • Lotheric has quit
      • 2016-08-31 24405, 2016

      • pingupingu has quit
      • 2016-08-31 24459, 2016

      • JonnyJD has quit
      • 2016-08-31 24418, 2016

      • dehy joined the channel
      • 2016-08-31 24449, 2016

      • Rotab joined the channel
      • 2016-08-31 24412, 2016

      • drsaunde has quit
      • 2016-08-31 24409, 2016

      • reosarevok sighs at an unexpected ISE http://tickets.musicbrainz.org/browse/MBS-9063 - I thought the dates would get dropped automatically here :/ I guess I can remove them with one edit then enter the other change, but I guess I'll leave it be for testing for now
      • 2016-08-31 24445, 2016

      • Nyanko-sensei joined the channel
      • 2016-08-31 24404, 2016

      • D4RK-PH0ENiX has quit
      • 2016-08-31 24458, 2016

      • Mineo has quit
      • 2016-08-31 24408, 2016

      • reosarevok
        Also I wonder why suddenly almost 20 people decided to like our FB page, we normally get like one per day
      • 2016-08-31 24453, 2016

      • Slurpee has quit
      • 2016-08-31 24433, 2016

      • rahulr joined the channel
      • 2016-08-31 24433, 2016

      • rahulr has quit
      • 2016-08-31 24433, 2016

      • rahulr joined the channel
      • 2016-08-31 24422, 2016

      • kartikgupta0909 joined the channel
      • 2016-08-31 24449, 2016

      • drsaunders joined the channel
      • 2016-08-31 24426, 2016

      • drsaunders has quit
      • 2016-08-31 24447, 2016

      • Yurim has quit
      • 2016-08-31 24443, 2016

      • Lotheric joined the channel
      • 2016-08-31 24447, 2016

      • alastairp
        kartikgupta0909: hi
      • 2016-08-31 24402, 2016

      • kartikgupta0909
        Hi
      • 2016-08-31 24427, 2016

      • kartikgupta0909
        If you have time could you merge the two branches
      • 2016-08-31 24447, 2016

      • alastairp
        yep
      • 2016-08-31 24452, 2016

      • alastairp
      • 2016-08-31 24403, 2016

      • alastairp
        what happens if the ID is not for a valid job?
      • 2016-08-31 24415, 2016

      • alastairp
        We have public and private datasets. I wonder what our behaviour should be if the dataset is private but we have a job for it
      • 2016-08-31 24445, 2016

      • alastairp
        I know that we require datasets to be public before we submit them to create a job. However, I don't know if that'll change in the future
      • 2016-08-31 24437, 2016

      • kartikgupta0909
        Yeah, but I guess a job ID will always be known only to the author of the dataset
      • 2016-08-31 24442, 2016

      • kartikgupta0909
        so we could skip that case
      • 2016-08-31 24456, 2016

      • alastairp
        not always
      • 2016-08-31 24405, 2016

      • alastairp
        someone can go and browse the website and get job ids
      • 2016-08-31 24426, 2016

      • alastairp
        https://acousticbrainz.org/datasets/list there is a list of datasets there
      • 2016-08-31 24442, 2016

      • kartikgupta0909
        ah yes,
      • 2016-08-31 24444, 2016

      • alastairp
        (however, all of these are public, so that's back to the first point)
      • 2016-08-31 24405, 2016

      • kartikgupta0909
        but for private, only authors would know the job id right?
      • 2016-08-31 24417, 2016

      • kartikgupta0909
        in case we allow public datasets to have jobs?
      • 2016-08-31 24421, 2016

      • kartikgupta0909
        *private
      • 2016-08-31 24436, 2016

      • alastairp
        yeah. With UUIDs it's a bit less of a problem
      • 2016-08-31 24453, 2016

      • alastairp
        since the probability of generating the same uuid is almost 0
      • 2016-08-31 24413, 2016

      • kartikgupta0909
        yes
      • 2016-08-31 24430, 2016

      • kartikgupta0909
        so should I change something and add some kind of error handling, or is it fine this way?
      • 2016-08-31 24435, 2016

      • alastairp
        for example, if we had integer ids for our job ids, we would want protection to make sure that no one iterated through all the ids
      • 2016-08-31 24450, 2016
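The point about enumerable IDs can be sketched in a few lines of Python (a hypothetical illustration, not AcousticBrainz code): sequential integer job ids can be walked exhaustively by an attacker, while random UUID4 ids are practically unguessable and collision-free.

```python
import uuid

# With sequential integer ids, an attacker can simply walk the id space:
sequential_ids = [1, 2, 3, 4, 5]
guessed = [i for i in range(1, 6)]  # trivially enumerates every job
assert guessed == sequential_ids

# With UUID4 ids there are 2**122 possibilities, so guessing an existing
# id (or generating a colliding one) is practically impossible.
job_ids = {uuid.uuid4() for _ in range(1000)}
assert len(job_ids) == 1000  # no collisions in practice
```

This is why alastairp notes that UUIDs make the private-dataset question "a bit less of a problem": obscurity of the id itself is a reasonable (if not complete) barrier.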

      • alastairp
        the reason that I was thinking about private datasets is this:
      • 2016-08-31 24452, 2016

      • kartikgupta0909
        yes that would have been a problem
      • 2016-08-31 24434, 2016

      • alastairp
        for now it's not a problem, but if we ever change this decision, we have to remember all the parts of the code which access datasets, and add this check
      • 2016-08-31 24453, 2016

      • alastairp
        I can imagine that we perhaps forget all the places where this could happen, and so we end up with a bug
      • 2016-08-31 24423, 2016

      • alastairp
        however, if we do it now, it doesn't matter if we make the change in the future
      • 2016-08-31 24449, 2016

      • alastairp
        the downside is that we have additional complexity here, which will stay forever
      • 2016-08-31 24452, 2016

      • alastairp
        hmm.
      • 2016-08-31 24459, 2016

      • alastairp
        Gentlecat: got any thoughts on that?
      • 2016-08-31 24419, 2016

      • Gentlecat
        huh?
      • 2016-08-31 24422, 2016

      • Gentlecat reads
      • 2016-08-31 24433, 2016

      • ruaok
        alastairp: got a sec?
      • 2016-08-31 24435, 2016

      • kartikgupta0909
        I don't see a problem even if someone tries to access a job of a private dataset
      • 2016-08-31 24448, 2016

      • kartikgupta0909
        since this API endpoint doesn't demand the dataset to be private
      • 2016-08-31 24402, 2016

      • kartikgupta0909
        it's simply retrieving the job from the dataset_eval_jobs table
      • 2016-08-31 24406, 2016

      • Nyanko-sensei has quit
      • 2016-08-31 24407, 2016

      • alastairp
        right. I'm not talking about the case as it is at the moment
      • 2016-08-31 24414, 2016

      • alastairp
        ruaok: what's up?
      • 2016-08-31 24420, 2016

      • kartikgupta0909
        ah
      • 2016-08-31 24430, 2016

      • alastairp
        because in this case we know that all jobs are for public datasets
      • 2016-08-31 24433, 2016

      • D4RK-PH0ENiX joined the channel
      • 2016-08-31 24433, 2016

      • Gentlecat
        alastairp: I think I made a change to allow submission of private datasets because of challenges
      • 2016-08-31 24437, 2016

      • ruaok
        I'm not ready to put up a PR for the big-query stuff yet, but I wanted to catch you up.
      • 2016-08-31 24440, 2016

      • Gentlecat
        that might be in my own branch though
      • 2016-08-31 24444, 2016

      • ruaok
        it's been an interesting set of challenges.
      • 2016-08-31 24446, 2016

      • alastairp
        Gentlecat: ah, interesting!
      • 2016-08-31 24401, 2016

      • alastairp
        that makes my crazy ranting valid!
      • 2016-08-31 24413, 2016

      • Gentlecat
        so yeah, need to make sure that user owns the dataset before retrieving it
      • 2016-08-31 24444, 2016

      • ruaok rewinds the LB story a bit.
      • 2016-08-31 24413, 2016

      • alastairp
        in fact, we already have https://github.com/metabrainz/acousticbrainz-serv… as a helper
      • 2016-08-31 24415, 2016

      • ruaok
        alastairp: you know how I got rid of kafka/zookeeper, right? it was being a royal pain in the ass, never behaving.
      • 2016-08-31 24421, 2016

      • alastairp
        which does it all for us
      • 2016-08-31 24422, 2016

      • alastairp
        ruaok: right
      • 2016-08-31 24427, 2016

      • ruaok
        I'm glad I did that.
      • 2016-08-31 24434, 2016

      • alastairp
        but...
      • 2016-08-31 24451, 2016

      • ruaok
        I discovered the pubsub command set in redis, but that isn't really good enough for our needs.
      • 2016-08-31 24400, 2016

      • ruaok
        disconnected clients can't catch up on messages.
      • 2016-08-31 24402, 2016
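The limitation ruaok describes can be modelled in plain Python (an illustrative sketch, not the actual ListenBrainz code): Redis PUB/SUB is fire-and-forget, so anything published while a client is disconnected is simply lost, whereas a list used as a queue (LPUSH/BRPOP style) retains messages until a consumer pops them.

```python
# Fire-and-forget pub/sub: messages sent while a client is
# disconnected never reach it.
class PubSub:
    def __init__(self):
        self.subscribers = []

    def publish(self, msg):
        for queue in self.subscribers:
            queue.append(msg)

    def subscribe(self):
        queue = []
        self.subscribers.append(queue)
        return queue

pubsub = PubSub()
pubsub.publish("listen-1")      # no one is connected yet
inbox = pubsub.subscribe()      # client connects afterwards
pubsub.publish("listen-2")
assert inbox == ["listen-2"]    # "listen-1" is gone for good

# A list used as a queue keeps messages until they are consumed,
# so a reconnecting client can catch up on its backlog.
backlog = []
backlog.insert(0, "listen-1")   # LPUSH before any consumer exists
backlog.insert(0, "listen-2")
consumed = [backlog.pop(), backlog.pop()]   # BRPOP twice
assert consumed == ["listen-1", "listen-2"]
```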

      • Gentlecat
        yeah, that's why I added it
      • 2016-08-31 24409, 2016

      • ruaok
      • 2016-08-31 24416, 2016

      • alastairp
        kartikgupta0909: OK, so this is less complex than I thought it would be
      • 2016-08-31 24433, 2016

      • ruaok
        so, I googled a bit and then made my own, to meet our criteria. redis is awesome.
      • 2016-08-31 24433, 2016

      • alastairp
        use this helper method instead of db.dataset.get, and it takes care of all permission checking
      • 2016-08-31 24447, 2016

      • alastairp
        but also make sure you check if the jobid is valid or not
      • 2016-08-31 24400, 2016

      • kartikgupta0909
        ah okay
      • 2016-08-31 24444, 2016

      • ruaok
        then I wrote the big-query writer, which took some doing. google's docs are too verbose and confusing. :(
      • 2016-08-31 24459, 2016

      • ruaok
        but then I hit a wall: big query does not have unique column constraints.
      • 2016-08-31 24400, 2016

      • ruaok
        shit.
      • 2016-08-31 24419, 2016

      • ruaok
        just shoveling listens to BQ isn't going to work.
      • 2016-08-31 24431, 2016

      • alastairp
        this is where you want to unique on user/date/song ?
      • 2016-08-31 24438, 2016

      • kartikgupta0909
        I think checking the validity of the job id will have to be in the db files right?
      • 2016-08-31 24446, 2016

      • ruaok
        no, more like repeated imports not causing duplicates.
      • 2016-08-31 24454, 2016

      • alastairp
        right
      • 2016-08-31 24401, 2016

      • alastairp
        (by uniquing on user/date/song) :)
      • 2016-08-31 24416, 2016

      • ruaok
        zas has been playing with influx db, which is a time series data store.
      • 2016-08-31 24425, 2016

      • ruaok
        and listens are effectively that -- time series data.
      • 2016-08-31 24444, 2016

      • ruaok
        and influx does uniquing by default, so that is a good match as well.
      • 2016-08-31 24459, 2016

      • ruaok
        so I wrote a way to store data in influx db for deduping.
      • 2016-08-31 24415, 2016

      • alastairp
        kartikgupta0909: see that helper method
      • 2016-08-31 24422, 2016

      • alastairp
        it's almost exactly the same as what you need
      • 2016-08-31 24442, 2016

      • ruaok
        so now there is a pubsub for incoming listens. the influx writer listens to this and dedupes the stream and then puts the uniques onto another pubsub.
      • 2016-08-31 24444, 2016

      • alastairp
      • 2016-08-31 24402, 2016

      • ruaok
        the bigquery writer then listens to that and writes those to big query.
      • 2016-08-31 24406, 2016
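The two-stage pipeline ruaok describes (listens pubsub → influx writer dedupes → uniques pubsub → bigquery writer) could be sketched roughly like this, with in-memory stand-ins for the Redis pubsubs; all names here are hypothetical:

```python
# Stage 1: the "influx writer" dedupes the incoming listen stream by a
# (user, timestamp, recording) key, which is what Influx's default
# uniquing gives you for free.
def dedupe(listens, seen):
    unique = []
    for listen in listens:
        key = (listen["user"], listen["ts"], listen["recording"])
        if key not in seen:
            seen.add(key)
            unique.append(listen)
    return unique

# Stage 2: the "bigquery writer" only ever sees unique listens, so
# BigQuery's lack of unique-column constraints no longer matters.
incoming = [
    {"user": "rob", "ts": 100, "recording": "a"},
    {"user": "rob", "ts": 100, "recording": "a"},  # duplicate import
    {"user": "rob", "ts": 101, "recording": "b"},
]
seen = set()
bigquery_rows = dedupe(incoming, seen)
assert len(bigquery_rows) == 2

# A repeated import of the same batch writes nothing new:
assert dedupe(incoming, seen) == []
```

The design point is that dedup state lives in one place (Influx), so downstream writers stay dumb and idempotent imports fall out naturally.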

      • alastairp
        ok
      • 2016-08-31 24450, 2016

      • kartikgupta0909
        I did, but it's only for the second query. For checking the validity of the job id?
      • 2016-08-31 24453, 2016

      • ruaok
        so, with this lead-up, I would like a sanity check on "schemas" for both influx and bigquery
      • 2016-08-31 24455, 2016

      • ruaok
      • 2016-08-31 24400, 2016

      • kartikgupta0909
        I'll have to add a db function, right?
      • 2016-08-31 24404, 2016

      • Gentlecat
        let's stick with NoDataFoundException
      • 2016-08-31 24408, 2016

      • alastairp
        Gentlecat: 👍
      • 2016-08-31 24411, 2016

      • kartikgupta0909
        yepp
      • 2016-08-31 24419, 2016

      • ruaok
      • 2016-08-31 24433, 2016

      • ruaok
        which are effectively the same layout.
      • 2016-08-31 24443, 2016

      • alastairp
        kartikgupta0909: right, but the get_job method already returns None if there is no item with that ID
      • 2016-08-31 24449, 2016

      • ruaok
        in influx tags are indexed, fields are not -- which is the most important thing to know.
      • 2016-08-31 24406, 2016
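The tags-vs-fields distinction maps directly onto InfluxDB's line protocol: tags (indexed, always strings) are appended to the measurement name, fields (unindexed) come after the space, string field values double-quoted. A rough sketch of building such a line for a listen (measurement and key names are made up for illustration):

```python
def to_line_protocol(measurement, tags, fields, timestamp):
    # Tags are indexed and joined onto the measurement, comma-separated;
    # fields are not indexed; string field values must be double-quoted.
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f'{k}="{v}"' for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {timestamp}"

line = to_line_protocol(
    "listen",
    tags={"user_name": "rob"},          # queries filter on this: index it
    fields={"track_name": "Paranoid"},  # payload only: keep it a field
    timestamp=1472601600000000000,
)
assert line == 'listen,user_name=rob track_name="Paranoid" 1472601600000000000'
```

The practical upshot of "tags are indexed, fields are not": anything you filter or group by (user) belongs in a tag, while high-cardinality payload (track names) belongs in fields.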

      • ruaok
        so, when you get a chance, I'd love a second set of eyes on that.
      • 2016-08-31 24412, 2016

      • alastairp
        and the flask decorator checks that it's a valid uuid
      • 2016-08-31 24422, 2016

      • Gentlecat
        I would add checking of ID validity into db.dataset_eval.get_job
      • 2016-08-31 24428, 2016

      • alastairp
        so the error you need to check for is here: https://github.com/metabrainz/acousticbrainz-serv…
      • 2016-08-31 24443, 2016

      • Gentlecat
        and if it's not valid raise one of exceptions from db package
      • 2016-08-31 24403, 2016

      • kartikgupta0909
        Oh okay, so then I don't need to do anything for the job id part, only for getting the dataset
      • 2016-08-31 24405, 2016

      • alastairp
        If the id isn't a valid job id, `job` will be None and you will get an error trying to access job['dataset_id']
      • 2016-08-31 24425, 2016
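The failure mode alastairp points out, plus Gentlecat's suggested fix, can be sketched like this (hypothetical stand-ins for db.dataset_eval.get_job and the db package's NoDataFoundException):

```python
class NoDataFoundException(Exception):
    """Stand-in for the exception from the db package."""

FAKE_JOBS = {"a-valid-uuid": {"dataset_id": "some-dataset"}}

def get_job(job_id):
    # As written, an unknown id silently yields None...
    return FAKE_JOBS.get(job_id)

def get_job_checked(job_id):
    # ...so the fix is to raise inside the db layer, before any caller
    # hits a TypeError trying job["dataset_id"] on None.
    job = get_job(job_id)
    if job is None:
        raise NoDataFoundException(f"No evaluation job: {job_id}")
    return job

handled = False
try:
    get_job_checked("not-a-real-job")
except NoDataFoundException:
    handled = True
assert handled
assert get_job_checked("a-valid-uuid")["dataset_id"] == "some-dataset"
```

Raising in the db layer keeps every API endpoint that fetches a job on the same error path, which is exactly why Gentlecat suggests putting the check in get_job itself.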

      • kartikgupta0909
        which will be handled by the get_check_dataset function
      • 2016-08-31 24427, 2016

      • alastairp
        can you see that?