#musicbrainz-devel

      • ruaok
        there are two people who I never bank on to be awake. or asleep. :)
      • 2014-01-08 00813, 2014

      • ianmcorvidae
        heh
      • 2014-01-08 00814, 2014

      • ruaok
        let's chat about geordi for a sec.
      • 2014-01-08 00826, 2014

      • nikki
        me and ian? :P
      • 2014-01-08 00828, 2014

      • ruaok
        I know you're unhappy with some of your design decisions.
      • 2014-01-08 00831, 2014

      • ruaok
        nikki: duh. :)
      • 2014-01-08 00859, 2014

      • ruaok
        I'm thinking it might be nice to spend time on geordi between now and chi-town meeting.
      • 2014-01-08 00817, 2014

      • ruaok
        I wonder if we can get geordi updated and loaded with ninjatune data.
      • 2014-01-08 00840, 2014

      • ruaok
        and then leave your schedule to be free to focus on the things we decide on in chicago.
      • 2014-01-08 00802, 2014

      • ruaok
        because there are people willing to give us data, but we have no place to put said data.
      • 2014-01-08 00819, 2014

      • ruaok
        and that stops all forward motion for a large chunk of my efforts.
      • 2014-01-08 00827, 2014

      • ianmcorvidae
        yeah.
      • 2014-01-08 00800, 2014

      • ruaok
        how much effort would it be to bring geordi to a minimally viable product?
      • 2014-01-08 00815, 2014

      • ianmcorvidae
        I don't disagree; I made an attempt to work on it some in december and didn't have much success, mostly because I wasn't doing well focusing
      • 2014-01-08 00829, 2014

      • Nyanko-sensei joined the channel
      • 2014-01-08 00830, 2014

      • ianmcorvidae
        I'm not sure I know the answer to that, though :)
      • 2014-01-08 00835, 2014

      • ruaok
        is that something I can help with?
      • 2014-01-08 00859, 2014

      • ruaok
        ah, is it a sort of morass that you don't want to deal with, so you find ways to procrastinate?
      • 2014-01-08 00811, 2014

      • ianmcorvidae
        yeah, that
      • 2014-01-08 00805, 2014

      • ianmcorvidae
        I don't have a good idea of how to do it in pieces, I guess
      • 2014-01-08 00828, 2014

      • ianmcorvidae
        nikki and I were talking the other month and came up with a rough notion at http://lmfao.org.uk/geordipriorities.png
      • 2014-01-08 00847, 2014

      • ianmcorvidae
        but the dependencies are more complicated than that suggests
      • 2014-01-08 00832, 2014

      • ianmcorvidae
        I did start a branch, and have some thinking-on-paper at https://github.com/metabrainz/geordi/blob/big-ref…
      • 2014-01-08 00837, 2014

      • ianmcorvidae
        heh, so that's why he wasn't responding
      • 2014-01-08 00843, 2014

      • nikki
        I should dig out the userscript I was working on and make it stop crashing opera :P
      • 2014-01-08 00851, 2014

      • ruaok joined the channel
      • 2014-01-08 00854, 2014

      • ruaok
        feh
      • 2014-01-08 00855, 2014

      • ruaok
        [2:06pm] ruaok: regarding #1. is that for importing into MB or into Geordi?
      • 2014-01-08 00804, 2014

      • nikki wonders what's up with ruaok's connection lately
      • 2014-01-08 00810, 2014

      • ianmcorvidae
        into MB
      • 2014-01-08 00823, 2014

      • ruaok
        nikki: it's the week before the trimester here at UPF.
      • 2014-01-08 00834, 2014

      • ruaok
        and not all staff is here. including the people who kick routers and things.
      • 2014-01-08 00849, 2014

      • ianmcorvidae
        the main thing that is more clear to me is that it's best to consider geordi a souped-up importer tool, which at least got us as far as that diagram
      • 2014-01-08 00811, 2014

      • ianmcorvidae
        after you left I mentioned that I did start a branch and some thinking on paper at https://github.com/metabrainz/geordi/blob/big-ref…
      • 2014-01-08 00835, 2014

      • ianmcorvidae
        most of which is starting with 2/3 which are fairly related
      • 2014-01-08 00832, 2014

      • ianmcorvidae
        I haven't quite figured out what amount of API needs to exist, but I think that geordi having access to MB's data that isn't through our crappy webservice is prerequisite for useful things like search by MBID
      • 2014-01-08 00842, 2014

      • ianmcorvidae
        which is sort of why I'm starting with 3
      • 2014-01-08 00817, 2014

      • ianmcorvidae
        but I'm sure you can see why it's feeling like a morass, if 2 and 1 in that thing depend on 3
      • 2014-01-08 00828, 2014

      • ianmcorvidae
        anyway, I think that if I ignore other things (other than shipping some patches periodically, since I think I'm the one who has the appropriate access for that) I can devote time to working on it
      • 2014-01-08 00818, 2014

      • ianmcorvidae
        bah
      • 2014-01-08 00850, 2014

      • ruaok_ joined the channel
      • 2014-01-08 00855, 2014

      • ruaok_
        grrr.
      • 2014-01-08 00803, 2014

      • ianmcorvidae
      • 2014-01-08 00805, 2014

      • ruaok_ uses the tether on his phone.
      • 2014-01-08 00809, 2014

      • reosarevok
        ruaok_: use tet
      • 2014-01-08 00812, 2014

      • reosarevok
        hahaha
      • 2014-01-08 00814, 2014

      • reosarevok
        Good
      • 2014-01-08 00828, 2014

      • ianmcorvidae
        and the main question for you is just: what do you need from geordi for it to stop blocking you, because that's probably the most important thing
      • 2014-01-08 00844, 2014

      • ianmcorvidae
        and maybe that'll give me a way to focus exactly how I work on things
      • 2014-01-08 00806, 2014

      • ruaok_
        the thing I care about most is importing existing data into geordi.
      • 2014-01-08 00818, 2014

      • ruaok_
        such as the ninjatune data and soon other data.
      • 2014-01-08 00851, 2014

      • ianmcorvidae
        okay -- to push the envelope, I assume you don't mean just to "visible as JSON"
      • 2014-01-08 00851, 2014

      • ruaok_
        when you say new DB, are you intending to go relational?
      • 2014-01-08 00857, 2014

      • ianmcorvidae
        partially
      • 2014-01-08 00821, 2014

      • ruaok_
        yeah, I want the data to be queryable at minimum.
      • 2014-01-08 00834, 2014

      • ianmcorvidae
        for things that are of a fairly static structure, like users, matches, that sort of thing
      • 2014-01-08 00844, 2014

      • ruaok_
        and that would drive people to give us feedback for next steps
      • 2014-01-08 00845, 2014

      • ianmcorvidae
        https://github.com/metabrainz/geordi/blob/big-ref… has the rough idea of what would exist
      • 2014-01-08 00827, 2014

      • ruaok_
        ok, that looks sane.
      • 2014-01-08 00841, 2014

      • ianmcorvidae
        so, queryable, but that doesn't include displaying as anything but JSON -- does it need to render things, or can I put that off at first and just dump pretty-printed JSON on a page?
      • 2014-01-08 00847, 2014

      • ruaok_
        that still makes it easy for us to import new data without much hassle
      • 2014-01-08 00801, 2014

      • ruaok_
        PP json is fine.
      • 2014-01-08 00821, 2014

      • ianmcorvidae
        okay. that means I can ignore mapping things at first
      • 2014-01-08 00843, 2014

      • ruaok_
        sure.
      • 2014-01-08 00856, 2014

      • ianmcorvidae
        how queryable/matchable do you care about? is it important to have links between things in geordi and between geordi and MB, or are isolated documents okay?
      • 2014-01-08 00830, 2014

      • ruaok_
        so then a priority list includes: 1. new DB, 2. Improved (non-PP json) display 3. mappings 4. import. Does that sound sane?
      • 2014-01-08 00856, 2014

      • ianmcorvidae
        yes, but I'm wondering if we can't trim down/split up 1. a bit first
      • 2014-01-08 00817, 2014

      • ianmcorvidae
        and 2/3 on that are the same, you need to map something before you can display it as anything but raw JSON :)
      • 2014-01-08 00826, 2014

      • ruaok_
        I think linking inside geordi is less important than just being able to import a release.
      • 2014-01-08 00834, 2014

      • reosarevok assumed "mappings" meant to MB
      • 2014-01-08 00839, 2014

      • ianmcorvidae
        heh, sorry
      • 2014-01-08 00839, 2014

      • reosarevok
        But maybe not :)
      • 2014-01-08 00851, 2014

      • ianmcorvidae
        mappings = from fields in geordi to fields in some sort of common structure used for display
      • 2014-01-08 00856, 2014

      • ianmcorvidae
        matchings = geordi to MB
      • 2014-01-08 00858, 2014

      • ianmcorvidae
        is the way I use the terms
      • 2014-01-08 00805, 2014
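
The two terms defined above can be illustrated with a minimal Python sketch. This is not geordi's actual code: the field names, the `NINJATUNE_MAPPING` table, and the `Matching` class are all hypothetical, chosen only to show that a mapping projects raw fields onto a common display structure, while a matching records that a geordi item is the same thing as an MB entity.

```python
from dataclasses import dataclass

# Hypothetical mapping for one index: raw document key -> common display field.
NINJATUNE_MAPPING = {"title": "release_title", "cat_no": "catalog_number"}


def apply_mapping(raw: dict, mapping: dict) -> dict:
    """Project a raw document onto the common display structure.

    Raw keys that the mapping does not mention are dropped.
    """
    return {common: raw[src] for src, common in mapping.items() if src in raw}


@dataclass(frozen=True)
class Matching:
    """Records that an item in a geordi index is the same as an MB entity."""
    index: str    # geordi index name, e.g. "ninjatune" (hypothetical)
    item_id: str  # identifier of the item within that index
    mbid: str     # MusicBrainz identifier it was matched to
```

A mapping only changes how a document is presented; a matching is a separate record that can be created by hand or by an automatic process.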

      • ruaok_
        ah
      • 2014-01-08 00817, 2014

      • ianmcorvidae
        I have an old document that explained that but I forget that others aren't as over-their-heads in this stuff as I am :)
      • 2014-01-08 00818, 2014

      • rvedotrc joined the channel
      • 2014-01-08 00828, 2014

      • ruaok_
        db, mappings/display, matchings, import. that order?
      • 2014-01-08 00843, 2014

      • ianmcorvidae
        unless you think matchings should go even lower than import
      • 2014-01-08 00852, 2014

      • reosarevok
        IMO yes, since without matching you can't indicate you've imported
      • 2014-01-08 00857, 2014

      • ianmcorvidae
        (note that mappings are also prerequisite for import, so those are certainly second)
      • 2014-01-08 00821, 2014

      • ruaok_
        maybe I am not thinking of matchings in the right context.
      • 2014-01-08 00840, 2014

      • ruaok_
        as in automated matchings between a geordi data store and MB?
      • 2014-01-08 00846, 2014

      • ruaok_
        I think that ought to be last.
      • 2014-01-08 00849, 2014

      • ianmcorvidae
        matchings are marking that something in geordi is the same as something in MB
      • 2014-01-08 00852, 2014

      • ianmcorvidae
        whether manual or automatic
      • 2014-01-08 00806, 2014

      • ruaok_
        ok, then that should be near the bottom
      • 2014-01-08 00809, 2014

      • ianmcorvidae
        the big complication with those is that some matchings are only in geordi (wcd) and some are really derived from MB
      • 2014-01-08 00814, 2014

      • nikki
        what needs to change with the display?
      • 2014-01-08 00814, 2014

      • reosarevok
        So if I import something, I want to be able to tell geordi "this is here in MB"
      • 2014-01-08 00821, 2014

      • reosarevok
        Since that lets others not import it
      • 2014-01-08 00830, 2014

      • reosarevok
        (and not worry about it basically)
      • 2014-01-08 00842, 2014

      • reosarevok
        Without that, I'm likely to forget *myself* what I've added and not
      • 2014-01-08 00844, 2014

      • ianmcorvidae
        (where derived-from-MB is things like having a discogs URL in MB)
      • 2014-01-08 00844, 2014

      • reosarevok
        :/
      • 2014-01-08 00801, 2014

      • ianmcorvidae
        (or with ninjatune, it would presumably be something with the right label/catno)
      • 2014-01-08 00815, 2014

      • ruaok_
        would it be enough to keep a paper trail for now and then improve the history of what has been imported later?
      • 2014-01-08 00842, 2014

      • ianmcorvidae
        that would be writing manual matching stuff
      • 2014-01-08 00848, 2014

      • ianmcorvidae
        but ignoring the MB-side matches
      • 2014-01-08 00854, 2014

      • reosarevok
        That'd be fine for me
      • 2014-01-08 00808, 2014

      • reosarevok
        (what ian said, not just throwing it at a wiki :p)
      • 2014-01-08 00840, 2014

      • ianmcorvidae
        that's essentially what current geordi does, except I'd throw out the idea of automatic matches until we can do them right
      • 2014-01-08 00858, 2014

      • ruaok_ nods at the automatic matches
      • 2014-01-08 00842, 2014

      • reosarevok
        Seems sensible
      • 2014-01-08 00844, 2014

      • ianmcorvidae
        my plan for newgeordi has actually always been to throw out the external automatic match thing anyway, and have any automatic process be part of geordi
      • 2014-01-08 00854, 2014

      • ianmcorvidae
        but the so-called "MB-side" matches are intermediate
      • 2014-01-08 00801, 2014

      • ianmcorvidae
        but they can still wait until later
      • 2014-01-08 00822, 2014

      • ruaok_
        not sure I quite grok the "mb-side" match stuff. what does that entail?
      • 2014-01-08 00825, 2014

      • ruaok_
        MB knowing about geordi?
      • 2014-01-08 00830, 2014

      • ianmcorvidae
        so: db, mappings, display of extracted info, basic manual matching, import of extracted info, reconvene
      • 2014-01-08 00833, 2014

      • ianmcorvidae
        no
      • 2014-01-08 00836, 2014

      • ianmcorvidae
        so
      • 2014-01-08 00843, 2014

      • ianmcorvidae
        say with discogs
      • 2014-01-08 00807, 2014

      • ianmcorvidae
        what a match in geordi from the discogs index is saying is, id XYZ in discogs is MBID ZYX in MB
      • 2014-01-08 00825, 2014

      • ianmcorvidae
        however, in MB we have a relationship that says MBID ZYX in MB is id XYZ in discogs (via a URL relationship)
      • 2014-01-08 00843, 2014

      • ianmcorvidae
        in current geordi we synchronize these by a really ridiculously hacky script that uses geordi's automatic matching system
      • 2014-01-08 00859, 2014

      • ianmcorvidae
        the better way to do it would be to have a replicated DB and just query them :P
      • 2014-01-08 00809, 2014
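
A rough Python sketch of the "query a replicated DB" idea for MB-side matches. Everything here is an assumption made for illustration: the input is taken to be (MBID, URL) pairs already read from URL relationships, the Discogs URL pattern is simplified, and `derive_matches` is a made-up name rather than anything in geordi or MusicBrainz.

```python
import re

# Simplified pattern for a Discogs release URL (assumption, not an exhaustive match).
DISCOGS_RELEASE = re.compile(r"discogs\.com/release/(\d+)")


def derive_matches(url_rows):
    """Turn (mbid, url) pairs into {discogs_id: set of mbids}.

    Rows whose URL is not a Discogs release URL are ignored.
    """
    matches = {}
    for mbid, url in url_rows:
        found = DISCOGS_RELEASE.search(url)
        if found:
            matches.setdefault(found.group(1), set()).add(mbid)
    return matches
```

Deriving the matches this way means the URL relationship in MB stays the single source of truth, so nothing needs to be synchronized back into geordi.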

      • reosarevok
        mhmh
      • 2014-01-08 00828, 2014

      • ianmcorvidae
        since the ultimate goal of geordi is to get info into MB, the notion here is that it's better to store as many matchings (in the geordi sense) as relationships in MB as we can
      • 2014-01-08 00842, 2014

      • ianmcorvidae
        rather than storing them in geordi where they're not much use to anyone who isn't trying to import things from geordi
      • 2014-01-08 00858, 2014

      • ruaok_
        yeah, that should be considered during the reconvene.
      • 2014-01-08 00810, 2014

      • ianmcorvidae
        yeah.
      • 2014-01-08 00820, 2014
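
The work order settled on above (db, mappings, display, basic manual matching, import, reconvene) can be checked against the dependencies mentioned in the discussion with a small Python sketch. The dependency table is a reading of the conversation, not an official plan: "mappings before display" and "mappings before import" come from ianmcorvidae, "matching before import" from reosarevok, and the remaining edges are assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# step -> set of steps it depends on (partly assumed, see the note above)
DEPENDS_ON = {
    "db": set(),
    "mappings": {"db"},
    "display": {"mappings"},
    "manual matching": {"db"},
    "import": {"mappings", "manual matching"},
    "reconvene": {"display", "import"},
}


def work_order(deps):
    """Return one ordering of the steps that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())
```

Several orderings satisfy these constraints, so the result is one valid order rather than the only one.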

      • ianmcorvidae
        the other thing I'll do is ignore the wcd index at first, I think
      • 2014-01-08 00826, 2014

      • ruaok_ nods at ianmcorvidae
      • 2014-01-08 00838, 2014

      • ruaok_
        agreed.
      • 2014-01-08 00842, 2014

      • ianmcorvidae
        since it has complications of embedded matches from the IA's matching process, and is weird data in general
      • 2014-01-08 00850, 2014

      • ruaok_
        personally I think ninjatune ought to be your first data set to work with.
      • 2014-01-08 00854, 2014

      • ianmcorvidae
        and, well, designing based on the wcd data is basically what got us where we are
      • 2014-01-08 00804, 2014

      • ianmcorvidae
        discogs. because it has concepts of more than one type of entity
      • 2014-01-08 00806, 2014

      • ruaok_
        oh, ouch.
      • 2014-01-08 00818, 2014

      • reosarevok
        ruaok_: ninjatune and discogs IMO (as in, the design should ensure it works fine for both kinds of data)
      • 2014-01-08 00825, 2014

      • ianmcorvidae
        I mean, the 'designing' (i.e. me) is more at fault than the data, but still :P
      • 2014-01-08 00835, 2014

      • reosarevok
        Label info and discogs are probably going to be the two main sources of data after all
      • 2014-01-08 00835, 2014

      • ruaok_
        reosarevok: yeah, agreed. let's make sure it works for both of those.
      • 2014-01-08 00835, 2014

      • ianmcorvidae
        I think both is the right answer anyway, yeah
      • 2014-01-08 00850, 2014

      • ruaok_
        but I say ninjatune because this is an industry relationship that we're lagging on.
      • 2014-01-08 00815, 2014

      • ianmcorvidae
        the lucky thing here is that ninjatune is super easy when we exclude the mb-side matches part :)
      • 2014-01-08 00823, 2014

      • ianmcorvidae
        which we have
      • 2014-01-08 00839, 2014

      • reosarevok
        :)
      • 2014-01-08 00842, 2014

      • ruaok_
        super easy? even with needing a new DB?
      • 2014-01-08 00848, 2014

      • ruaok_
        or are you talking about the overall matching process?
      • 2014-01-08 00800, 2014

      • ruaok_
        granted, there is not much data and we have large chunks of it.
      • 2014-01-08 00805, 2014

      • ianmcorvidae
        super easy in terms of if it works for discogs stuff it'll work for ninjatune
      • 2014-01-08 00811, 2014

      • ruaok_
        I see it more from a political perspective.
      • 2014-01-08 00817, 2014

      • ruaok_
        got it.
      • 2014-01-08 00828, 2014

      • ianmcorvidae
        where if we were doing the mb-side matches those would require ninjatune-specific implementation
      • 2014-01-08 00853, 2014

      • ruaok_
        ok, I think we have a rough idea where we want to go to from here.
      • 2014-01-08 00803, 2014

      • ruaok_
        let's see how much of this stuff we can get done before chicago.
      • 2014-01-08 00811, 2014

      • ruaok_
        and by we, I mean you. ;)