#metabrainz

      • heyoni has quit
      • 2017-12-22 35653, 2017

      • ruaok
        if it is, then we're safe. but still may not be ideal.
      • 2017-12-22 35659, 2017

      • heyoni joined the channel
      • 2017-12-22 35615, 2017

      • ruaok
        (every time I've made decisions like this one, I've had people laughing at me 10 years later)
      • 2017-12-22 35611, 2017

      • ruaok
        if this person starts recording listens, no one on windows can open up our dataset.
      • 2017-12-22 35638, 2017

      • alastairp
        :D
      • 2017-12-22 35647, 2017

      • iliekcomputers
        (I'm very inexperienced and I might have to endure laughter 10 years later too) :D
      • 2017-12-22 35657, 2017

      • ruaok
        iliekcomputers: I'm trying to save you. :)
      • 2017-12-22 35641, 2017

      • iliekcomputers
        :)
      • 2017-12-22 35657, 2017

      • iliekcomputers
        so could just use a filename-sanitization lib and keep the username in the json explicitly
      • 2017-12-22 35659, 2017

      • iliekcomputers
        ?
      • 2017-12-22 35606, 2017

      • alastairp
        do we need to use usernames as filenames?
      • 2017-12-22 35613, 2017

      • alastairp
        database ids? uuids?
      • 2017-12-22 35636, 2017

      • ruaok
        it certainly is more user friendly for casual users.
      • 2017-12-22 35644, 2017

      • iliekcomputers
        I was thinking of readability
      • 2017-12-22 35653, 2017

      • ruaok
        yeah.
      • 2017-12-22 35658, 2017

      • alastairp
        yeah
      • 2017-12-22 35605, 2017

      • alastairp
        is this a full dump of all listens?
      • 2017-12-22 35608, 2017

      • ruaok
        yes
      • 2017-12-22 35613, 2017

      • alastairp
        right
      • 2017-12-22 35626, 2017

      • alastairp
        so, what would the solution to user aux be anyway?
      • 2017-12-22 35629, 2017
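
The "user aux" problem above comes from Windows reserving certain device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) as file names, with or without an extension. A minimal sketch of a check for such usernames follows; the function name `is_unsafe_filename` is hypothetical and is not taken from the ListenBrainz code base.

```python
import re

# Device names that Windows reserves as file names, regardless of extension.
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_unsafe_filename(username: str) -> bool:
    """Return True if `username` cannot be used verbatim as a Windows file name.

    Hypothetical helper for illustration only.
    """
    if username == "":
        return True
    # "aux.json" is as unusable as "aux": only the part before the first dot matters.
    stem = username.split(".", 1)[0].rstrip(" ").upper()
    if stem in WINDOWS_RESERVED:
        return True
    # Characters Windows forbids in file names, plus ASCII control characters.
    if re.search(r'[<>:"/\\|?*\x00-\x1f]', username):
        return True
    # Windows does not allow names that end in a dot or a space.
    return username.endswith((".", " "))
```

A check like this only detects the problem; any renaming scheme built on top of it still needs the username stored somewhere else, which is the point made in the discussion that follows.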

      • heyoni has quit
      • 2017-12-22 35632, 2017

      • alastairp
        you can't quote that
      • 2017-12-22 35645, 2017

      • alastairp
        you have to then tell people that there is /some/ process to do (for some usernames)
      • 2017-12-22 35651, 2017

      • ruaok
        delete them from the database.
      • 2017-12-22 35659, 2017

      • iliekcomputers
        lolol
      • 2017-12-22 35600, 2017

      • ruaok
        serves them right for choosing such bad names.
      • 2017-12-22 35601, 2017

      • alastairp
        if they have to do a process, why not just make it mandatory for all usernames
      • 2017-12-22 35632, 2017

      • alastairp
        I do agree that you lose the niceness of "open the folder of the username whose listens you want to see"
      • 2017-12-22 35649, 2017

      • ruaok
        what if we split the difference and make UUID filenames and then have a JSON "index" file that makes a simple cross reference?
      • 2017-12-22 35612, 2017

      • alastairp
        I was going to suggest a crossreference file
      • 2017-12-22 35613, 2017

      • ruaok
        UUID or rowid or something. something easy.
      • 2017-12-22 35629, 2017

      • ruaok
        iliekcomputers: what do you think about the xref file?
      • 2017-12-22 35630, 2017

      • alastairp
        if it's uuids, you should ask musicbrainz [oauth master] for those :-P
      • 2017-12-22 35638, 2017

      • ruaok
        should be cake to put out there, no?
      • 2017-12-22 35644, 2017

      • alastairp
        in which case, I'd suggest rowid
      • 2017-12-22 35651, 2017

      • ruaok
        lol, yes. :)
      • 2017-12-22 35657, 2017

      • alastairp
        also consider number of items per directory
      • 2017-12-22 35602, 2017

      • alastairp
        please keep it under 10,000
      • 2017-12-22 35613, 2017

      • alastairp
        so, data/uid%1000/uid/data
      • 2017-12-22 35614, 2017

      • alastairp
        or something
      • 2017-12-22 35615, 2017

      • ruaok
        even less than that.
      • 2017-12-22 35619, 2017

      • ruaok
        yeah, good point.
      • 2017-12-22 35632, 2017

      • iliekcomputers
        xref file sounds good
      • 2017-12-22 35658, 2017
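
The cross-reference ("xref") file proposed above could look roughly like the sketch below: a JSON object that maps each username to the row id used in the dump file name. The function name `build_xref` and the sample users and ids are made up for illustration; the actual index layout was not specified in the discussion.

```python
import json


def build_xref(users):
    """Build the JSON text of a username -> row id index.

    `users` is an iterable of (row_id, username) pairs, for example rows
    read from a user table. Hypothetical helper for illustration only.
    """
    index = {username: row_id for row_id, username in users}
    # sort_keys keeps the output stable from one dump to the next.
    return json.dumps(index, indent=2, sort_keys=True)


# Sample data: the ids below are invented.
xref_json = build_xref([(1, "ruaok"), (2, "aux"), (3, "iliekcomputers")])
print(xref_json)
```

Because usernames appear only as JSON strings and never as file names, a name such as "aux" needs no special handling.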

      • iliekcomputers
        the number of items per directory thing is something I didn't think of, I'll do that too.
      • 2017-12-22 35600, 2017

      • ruaok
        I would also go with user-%06d.listens for a filename pattern.
      • 2017-12-22 35619, 2017

      • alastairp
        so pessimistic
      • 2017-12-22 35625, 2017

      • alastairp
        only 1 million users?
      • 2017-12-22 35627, 2017

      • ruaok
        %24d ?
      • 2017-12-22 35627, 2017

      • Fanshawe
        ruaok: Error: "24d" is not a valid command.
      • 2017-12-22 35602, 2017

      • ruaok
        %dafuq?
      • 2017-12-22 35602, 2017

      • Fanshawe
        ruaok: Error: "dafuq?" is not a valid command.
      • 2017-12-22 35634, 2017

      • alastairp
        (in fact, there are already 2m editors in MB)
      • 2017-12-22 35640, 2017

      • alastairp
        10 digits, perhaps?
      • 2017-12-22 35652, 2017

      • iliekcomputers
        Bots we didn't even know were listening😌🙄
      • 2017-12-22 35654, 2017

      • alastairp
        I guess it doesn't really matter, just 0-pad to a point, and if we go over they go longer?
      • 2017-12-22 35619, 2017

      • ruaok
        future proofing can really get silly fast, no?
      • 2017-12-22 35601, 2017

      • ruaok
        alastairp: if we go over 10M users, we'll have had to restructure these dumps.
      • 2017-12-22 35627, 2017

      • alastairp
        good point
      • 2017-12-22 35630, 2017

      • ruaok
        let's go with %06d
      • 2017-12-22 35640, 2017

      • ruaok
        more than 1M users and this isn't going to work.
      • 2017-12-22 35613, 2017

      • ruaok
        right then, now that we've got that sorted. what color should the dumps be?
      • 2017-12-22 35620, 2017

      • iliekcomputers
        i vote pink
      • 2017-12-22 35623, 2017

      • alastairp
        <3
      • 2017-12-22 35627, 2017

      • ruaok
        mauve? periwinkle? bunny-dust?
      • 2017-12-22 35631, 2017

      • alastairp
        I know lots of things about dumping files
      • 2017-12-22 35638, 2017

      • alastairp
        uh, I mean to say
      • 2017-12-22 35639, 2017

      • ruaok
        ass-gasket-grey?
      • 2017-12-22 35651, 2017

      • iliekcomputers
        so final structure is data/uid%1000/user-%06d.listens, yes?
      • 2017-12-22 35652, 2017

      • alastairp
        I know nothing about dumping files, but it seems like something that I could knock out in a weekend
      • 2017-12-22 35655, 2017

      • alastairp
        listen to me
      • 2017-12-22 35613, 2017

      • alastairp
        that'll give us 1000 items in 1000 directories
      • 2017-12-22 35615, 2017

      • alastairp
        I like it
      • 2017-12-22 35615, 2017
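
The structure settled on above, data/uid%1000/user-%06d.listens, can be sketched as a small path helper. The name `dump_path`, the `root` default, and the three-digit zero padding of the directory name are assumptions for illustration.

```python
from pathlib import PurePosixPath


def dump_path(user_id: int, root: str = "data") -> str:
    """Return the relative dump path for a user row id.

    Hypothetical helper for illustration only.
    """
    # The remainder picks one of 1000 directories; padding to 3 digits is an assumption.
    bucket = f"{user_id % 1000:03d}"
    # %06d pads to six digits but never truncates, so larger ids simply get longer names.
    filename = f"user-{user_id:06d}.listens"
    return str(PurePosixPath(root) / bucket / filename)


print(dump_path(7))
print(dump_path(1042))
```

With one million sequential row ids this yields 1000 directories of 1000 files each, which matches the estimate given above.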

      • ruaok
        alastairp: lol
      • 2017-12-22 35620, 2017

      • iliekcomputers
        alastairp: next is AB dumps :D
      • 2017-12-22 35624, 2017

      • alastairp
        shhhh
      • 2017-12-22 35635, 2017

      • ruaok
        team of smartasses.
      • 2017-12-22 35639, 2017

      • alastairp
        iliekcomputers: don't get your hopes up, I'm on holiday until the 8th
      • 2017-12-22 35657, 2017

      • iliekcomputers
        soon...
      • 2017-12-22 35659, 2017

      • iliekcomputers
        :D
      • 2017-12-22 35604, 2017

      • iliekcomputers
        ruaok: :D
      • 2017-12-22 35627, 2017

      • ruaok
        hmmm, how do you hash filenames with small numbers?
      • 2017-12-22 35656, 2017

      • ruaok
        UUIDs hash really well.
      • 2017-12-22 35610, 2017

      • alastairp
        what do you mean "hash"?
      • 2017-12-22 35628, 2017

      • naiveai
        finally!
      • 2017-12-22 35631, 2017

      • ruaok
        hash into subdirectories that have roughly even numbers of items in them.
      • 2017-12-22 35633, 2017

      • naiveai
        visualization website is up!
      • 2017-12-22 35643, 2017

      • naiveai
        LordSputnik, Leftmost: ^
      • 2017-12-22 35654, 2017

      • alastairp
        you use remainder instead of selecting a character of the number as a string
      • 2017-12-22 35600, 2017

      • alastairp
        so 1-999 go in the directory 0000
      • 2017-12-22 35625, 2017

      • alastairp
        wait, I don't think that's exactly right
      • 2017-12-22 35629, 2017

      • alastairp
        no, ignore me
      • 2017-12-22 35638, 2017

      • ruaok
        naiveai: well done!
      • 2017-12-22 35650, 2017

      • ruaok hopes alastairp will continue
      • 2017-12-22 35656, 2017

      • alastairp
        ruaok: you 0-pad your remainders
      • 2017-12-22 35615, 2017

      • alastairp
        so, %1000 gives you remainders from 0 to 999
      • 2017-12-22 35634, 2017

      • ruaok
        so, the last 4 digits of the filename, sans extension, basically?
      • 2017-12-22 35634, 2017

      • alastairp
        which are your top level directories, padded to 3 or 4 characters
      • 2017-12-22 35659, 2017

      • alastairp
        right
      • 2017-12-22 35601, 2017
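
To illustrate why the remainder spreads small sequential ids evenly while picking a character of the number as a string does not, here is a minimal sketch. The id range of 1 to 25000 and the helper name `bucket` are arbitrary choices for the example.

```python
from collections import Counter


def bucket(user_id: int, buckets: int = 1000) -> str:
    """Zero-padded remainder used as the top-level directory name (hypothetical helper)."""
    return f"{user_id % buckets:03d}"


ids = range(1, 25001)

# Remainder: consecutive ids cycle through all 1000 directories in turn.
by_remainder = Counter(bucket(uid) for uid in ids)

# Leading character of the id as a string: low digits are heavily over-represented.
by_first_char = Counter(str(uid)[0] for uid in ids)

print(len(by_remainder), min(by_remainder.values()), max(by_remainder.values()))
print(sorted(by_first_char.items()))
```

Note that ids 1 to 999 land in directories 001 to 999 here, one per directory; putting them all into a single directory would be the result of integer division rather than the remainder.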

      • github joined the channel
      • 2017-12-22 35601, 2017

      • github
        [sir] samj1912 opened pull request #72: SOLR-77: Fix the way FK rels were handled (master...fixfk) https://git.io/vbHeh
      • 2017-12-22 35601, 2017

      • github has left the channel
      • 2017-12-22 35630, 2017

      • samj1912
        and that was the last bug from current batch of testing \o/
      • 2017-12-22 35629, 2017

      • alastairp
        I hope we remember that exposing user row-ids leaks internal structure. However, since the data is public I don't think that we have any possible attack vectors
      • 2017-12-22 35602, 2017

      • ruaok
        famous last words, samj1912
      • 2017-12-22 35619, 2017

      • samj1912
        lol :P
      • 2017-12-22 35635, 2017

      • alastairp
        yeah, really sounds to me like editors should have their own uuids
      • 2017-12-22 35636, 2017

      • ruaok
        alastairp: in that case we can use the Auth token UUIDs. We've already got them.
      • 2017-12-22 35640, 2017

      • ruaok snickers
      • 2017-12-22 35653, 2017

      • alastairp
        ruaok: sounds like you dropped the ball 15 years ago
      • 2017-12-22 35658, 2017

      • alastairp
        didn't think that one through
      • 2017-12-22 35621, 2017

      • ruaok
        my current record on being wrong on something is 20 years.
      • 2017-12-22 35624, 2017

      • ruaok is proud
      • 2017-12-22 35620, 2017

      • ruaok
        more seriously, the UUID provided from oauth -- can we use that safely or is that a security problem?
      • 2017-12-22 35630, 2017

      • ruaok
        I don't want to generate yet another number.
      • 2017-12-22 35653, 2017

      • ruaok
        then we'd have to store them, otherwise the ids won't be consistent between dumps.
      • 2017-12-22 35609, 2017

      • alastairp
        what do you mean? we make MB give an oauth when we auth and use that?
      • 2017-12-22 35620, 2017

      • alastairp
        that's kind of a solution to the deleted user stuff that we went through
      • 2017-12-22 35627, 2017

      • ruaok
        yep. I think it is already in the DB.
      • 2017-12-22 35638, 2017

      • alastairp
        if it is, then I think that's a pretty nice idea
      • 2017-12-22 35646, 2017

      • alastairp
        not sure if it's exposed though
      • 2017-12-22 35649, 2017

      • iliekcomputers
        MB provides a uuid during oauth?
      • 2017-12-22 35653, 2017

      • xps2 has quit
      • 2017-12-22 35657, 2017

      • alastairp
        (I'm pretty sure it doesn't)
      • 2017-12-22 35603, 2017

      • alastairp
        but check with bitmap
      • 2017-12-22 35639, 2017

      • ruaok
        yeah, no. we certainly don't store it.
      • 2017-12-22 35649, 2017

      • ruaok
        so, unless we go and give each user a UUID and store it, this won't work.
      • 2017-12-22 35601, 2017

      • ruaok
        we've identified a lot of things that don't work so far. go us.
      • 2017-12-22 35634, 2017

      • iliekcomputers
        anyways, if we were just gonna use uuids, using them for measurement names would have been a good idea too
      • 2017-12-22 35652, 2017

      • ruaok
        saved you some sanity, for sure.
      • 2017-12-22 35600, 2017

      • alastairp
        oh yeah, heh
      • 2017-12-22 35609, 2017

      • iliekcomputers
        :)
      • 2017-12-22 35615, 2017

      • alastairp
        anyway, listenbrainz NGS can use them
      • 2017-12-22 35618, 2017

      • ruaok
        so, what is our conclusion
      • 2017-12-22 35619, 2017

      • ruaok
        ?
      • 2017-12-22 35622, 2017

      • Leftmost
        LordSputnik, as it is, it makes no sense to me how the identifier editor populates existing rows.
      • 2017-12-22 35606, 2017

      • alastairp
        use row ids, multiple directories, consider adding uuids when we move oauth to metabrainz.org
      • 2017-12-22 35644, 2017

      • ruaok
        alastairp: +1
      • 2017-12-22 35648, 2017

      • ruaok
        iliekcomputers: bake it so?
      • 2017-12-22 35652, 2017

      • iliekcomputers
        works for me
      • 2017-12-22 35645, 2017

      • iliekcomputers
        gonna close gh:LB#325 for now
      • 2017-12-22 35645, 2017

      • BrainzBot
        LB-268: Escape usernames before creating files in dumps: https://github.com/metabrainz/listenbrainz-server…
      • 2017-12-22 35639, 2017

      • github joined the channel
      • 2017-12-22 35639, 2017

      • github
        [listenbrainz-server] paramsingh closed pull request #325: LB-268: Escape usernames before creating files in dumps (master...escape-username-dumps) https://git.io/vb9xN
      • 2017-12-22 35639, 2017

      • github has left the channel
      • 2017-12-22 35603, 2017

      • samj1912
        yvanzo: bitmap do we have any FK constraints in MBDB which are made of more than one column?
      • 2017-12-22 35609, 2017

      • samj1912
        apparently not, nvm