#musicbrainz-devel

13:00 PM
ruaok

there are two people who I never bank on to be awake. or asleep. :)

2014-01-08 00813, 2014

13:00 PM
ianmcorvidae

heh

2014-01-08 00814, 2014

13:00 PM
ruaok

let's chat about geordi for a sec.

2014-01-08 00826, 2014

13:00 PM
nikki

me and ian? :P

2014-01-08 00828, 2014

13:00 PM
ruaok

I know you're unhappy with some of your design decisions.

2014-01-08 00831, 2014

13:00 PM
ruaok

nikki: duh. :)

2014-01-08 00859, 2014

13:00 PM
ruaok

I'm thinking it might be nice to spend time on geordi between now and chi-town meeting.

2014-01-08 00817, 2014

13:01 PM
ruaok

I wonder if we can get geordi updated and loaded with ninjatune data.

2014-01-08 00840, 2014

13:01 PM
ruaok

and then leave your schedule to be free to focus on the things we decide on in chicago.

2014-01-08 00802, 2014

13:02 PM
ruaok

because there are people willing to give us data, but we have no place to put said data.

2014-01-08 00819, 2014

13:02 PM
ruaok

and that stops all forward motion for a large chunk of my efforts.

2014-01-08 00827, 2014

13:02 PM
ianmcorvidae

yeah.

2014-01-08 00800, 2014

13:03 PM
ruaok

how much effort would it be to bring geordi to a minimally viable product?

2014-01-08 00815, 2014

13:03 PM
ianmcorvidae

I don't disagree; I made an attempt to work on it some in december and didn't have much success, mostly because I wasn't doing well focusing

2014-01-08 00829, 2014

13:03 PM
Nyanko-sensei joined the channel

2014-01-08 00830, 2014

13:03 PM
ianmcorvidae

I'm not sure I know the answer to that, though :)

2014-01-08 00835, 2014

13:03 PM
ruaok

is that something I can help with?

2014-01-08 00859, 2014

13:03 PM
ruaok

ah, is it a sort of morass that you don't want to deal with, so you find ways to procrastinate?

2014-01-08 00811, 2014

13:04 PM
ianmcorvidae

yeah, that

2014-01-08 00805, 2014

13:05 PM
ianmcorvidae

I don't have a good idea of how to do it in pieces, I guess

2014-01-08 00828, 2014

13:05 PM
ianmcorvidae

nikki and I were talking the other month and came up with a rough notion at http://lmfao.org.uk/geordipriorities.png

2014-01-08 00847, 2014

13:05 PM
ianmcorvidae

but the dependencies are more complicated than that suggests

2014-01-08 00832, 2014

13:08 PM
ianmcorvidae

I did start a branch, and have some thinking-on-paper at https://github.com/metabrainz/geordi/blob/big-ref…

2014-01-08 00837, 2014

13:08 PM
ianmcorvidae

heh, so that's why he wasn't responding

2014-01-08 00843, 2014

13:08 PM
nikki

I should dig out the userscript I was working on and make it stop crashing opera :P

2014-01-08 00851, 2014

13:08 PM
ruaok joined the channel

2014-01-08 00854, 2014

13:08 PM
ruaok

feh

2014-01-08 00855, 2014

13:08 PM
ruaok

[2:06pm] ruaok: regarding #1. is that for importing into MB or into Geordi?

2014-01-08 00804, 2014

13:09 PM
nikki wonders what's up with ruaok's connection lately

2014-01-08 00810, 2014

13:09 PM
ianmcorvidae

into MB

2014-01-08 00823, 2014

13:09 PM
ruaok

nikki: it's the week before the trimester here at UPF.

2014-01-08 00834, 2014

13:09 PM
ruaok

and not all staff is here. including the people who kick routers and things.

2014-01-08 00849, 2014

13:09 PM
ianmcorvidae

the main thing that is more clear to me is that it's best to consider geordi a souped-up importer tool, which at least got us as far as that diagram

2014-01-08 00811, 2014

13:10 PM
ianmcorvidae

after you left I mentioned that I did start a branch and some thinking on paper at https://github.com/metabrainz/geordi/blob/big-ref…

2014-01-08 00835, 2014

13:10 PM
ianmcorvidae

most of which is starting with 2/3 which are fairly related

2014-01-08 00832, 2014

13:11 PM
ianmcorvidae

I haven't quite figured out what amount of API needs to exist, but I think that geordi having access to MB's data that isn't through our crappy webservice is prerequisite for useful things like search by MBID

2014-01-08 00842, 2014

13:11 PM
ianmcorvidae

which is sort of why I'm starting with 3

2014-01-08 00817, 2014

13:12 PM
ianmcorvidae

but I'm sure you can see why it's feeling like a morass, if 2 and 1 in that thing depend on 3

2014-01-08 00828, 2014

13:13 PM
ianmcorvidae

anyway, I think that if I ignore other things (other than shipping some patches periodically, since I think I'm the one who has the appropriate access for that) I can devote time to working on it

2014-01-08 00818, 2014

13:14 PM
ianmcorvidae

bah

2014-01-08 00850, 2014

13:20 PM
ruaok_ joined the channel

2014-01-08 00855, 2014

13:20 PM
ruaok_

grrr.

2014-01-08 00803, 2014

13:21 PM
ianmcorvidae

http://chatlogs.musicbrainz.org/beta/%23musicbrai…

2014-01-08 00805, 2014

13:21 PM
ruaok_ uses the tether on his phone.

2014-01-08 00809, 2014

13:21 PM
reosarevok

ruaok_: use tet

2014-01-08 00812, 2014

13:21 PM
reosarevok

hahaha

2014-01-08 00814, 2014

13:21 PM
reosarevok

Good

2014-01-08 00828, 2014

13:21 PM
ianmcorvidae

and the main question for you is just: what do you need from geordi for it to stop blocking you, because that's probably the most important thing

2014-01-08 00844, 2014

13:21 PM
ianmcorvidae

and maybe that'll give me a way to focus exactly how I work on things

2014-01-08 00806, 2014

13:22 PM
ruaok_

the thing I care about most is importing existing data into geordi.

2014-01-08 00818, 2014

13:22 PM
ruaok_

such as the ninjatune data and soon other data.

2014-01-08 00851, 2014

13:22 PM
ianmcorvidae

okay -- to push the envelope, I assume you don't mean just to "visible as JSON"

2014-01-08 00851, 2014

13:22 PM
ruaok_

when you say new DB, are you intending to go relational?

2014-01-08 00857, 2014

13:22 PM
ianmcorvidae

partially

2014-01-08 00821, 2014

13:23 PM
ruaok_

yeah, I want the data to be queryable at minimum.

2014-01-08 00834, 2014

13:23 PM
ianmcorvidae

for things that are of a fairly static structure, like users, matches, that sort of thing

2014-01-08 00844, 2014

13:23 PM
ruaok_

and that would drive people to give us feedback for next steps

2014-01-08 00845, 2014

13:23 PM
ianmcorvidae

https://github.com/metabrainz/geordi/blob/big-ref… has the rough idea of what would exist

2014-01-08 00827, 2014

13:24 PM
ruaok_

ok, that looks sane.

2014-01-08 00841, 2014

13:24 PM
ianmcorvidae

so, queryable, but that doesn't include displaying as anything but JSON -- does it need to render things, or can I put that off at first and just dump pretty-printed JSON on a page?

2014-01-08 00847, 2014

13:24 PM
ruaok_

that still makes it easy for us to import new data without much hassle

2014-01-08 00801, 2014

13:25 PM
ruaok_

PP json is fine.

2014-01-08 00821, 2014

13:25 PM
ianmcorvidae

okay. that means I can ignore mapping things at first

2014-01-08 00843, 2014

13:25 PM
ruaok_

sure.

2014-01-08 00856, 2014

13:25 PM
ianmcorvidae

how queryable/matchable do you care about? is it important to have links between things in geordi and between geordi and MB, or are isolated documents okay?

2014-01-08 00830, 2014

13:26 PM
ruaok_

so then a priority list includes: 1. new DB, 2. Improved (non-PP json) display 3. mappings 4. import. Does that sound sane?

2014-01-08 00856, 2014

13:26 PM
ianmcorvidae

yes, but I'm wondering if we can't trim down/split up 1. a bit first

2014-01-08 00817, 2014

13:27 PM
ianmcorvidae

and 2/3 on that are the same, you need to map something before you can display it as anything but raw JSON :)

2014-01-08 00826, 2014

13:27 PM
ruaok_

I think linking inside geordi is less important than just being able to import a release.

2014-01-08 00834, 2014

13:27 PM
reosarevok assumed "mappings" meant to MB

2014-01-08 00839, 2014

13:27 PM
ianmcorvidae

heh, sorry

2014-01-08 00839, 2014

13:27 PM
reosarevok

But maybe not :)

2014-01-08 00851, 2014

13:27 PM
ianmcorvidae

mappings = from fields in geordi to fields in some sort of common structure used for display

2014-01-08 00856, 2014

13:27 PM
ianmcorvidae

matchings = geordi to MB

2014-01-08 00858, 2014

13:27 PM
ianmcorvidae

is the way I use the terms

2014-01-08 00805, 2014
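
A minimal sketch of the mappings/matchings distinction as ianmcorvidae describes it. Everything here is hypothetical and for illustration only: the field names, the `NINJATUNE_MAPPING` table, `apply_mapping`, and the `Matching` class are assumptions, not geordi's actual code or schema.

```python
from dataclasses import dataclass

# Hypothetical mapping: raw field name in a source document -> field name
# in a common structure used for display. The names are made up.
NINJATUNE_MAPPING = {
    "title": "release_name",
    "cat_no": "catalog_number",
}


def apply_mapping(document: dict, mapping: dict) -> dict:
    """Extract the common display fields from one raw document."""
    return {
        common: document[raw]
        for raw, common in mapping.items()
        if raw in document
    }


@dataclass(frozen=True)
class Matching:
    """A statement that an item in geordi is the same thing as an MB entity."""
    index: str        # which data set the item came from, e.g. "ninjatune"
    item_id: str      # identifier of the item inside geordi
    mbid: str         # MusicBrainz identifier it was matched to
    automatic: bool = False  # manual unless stated otherwise
```

A mapping only reshapes data inside geordi for display; a matching is a separate record linking a geordi item to MB, whether it was made by hand or by a script.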

13:28 PM
ruaok_

ah

2014-01-08 00817, 2014

13:28 PM
ianmcorvidae

I have an old document that explained that but I forget that others aren't as over-their-heads in this stuff as I am :)

2014-01-08 00818, 2014

13:28 PM
rvedotrc joined the channel

2014-01-08 00828, 2014

13:28 PM
ruaok_

db, mappings/display, matchings, import. that order?

2014-01-08 00843, 2014

13:28 PM
ianmcorvidae

unless you think matchings should go even lower than import

2014-01-08 00852, 2014

13:28 PM
reosarevok

IMO yes, since without matching you can't indicate you've imported

2014-01-08 00857, 2014

13:28 PM
ianmcorvidae

(note that mappings are also prerequisite for import, so those are certainly second)

2014-01-08 00821, 2014
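
The ordering being worked out here can be sketched as a small dependency graph. This is only an illustration: the task names and `work_order` are hypothetical, and the edges are a reading of what has been said so far (db first, mappings before display and import, matchings before import), with "matchings need the db" added as an assumption.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks it depends on.
DEPENDS_ON = {
    "db": set(),
    "mappings": {"db"},
    "display": {"mappings"},           # can't display non-raw JSON unmapped
    "matchings": {"db"},               # assumption, not stated in the log
    "import": {"mappings", "matchings"},
}


def work_order(depends_on: dict) -> list:
    """Return one valid order in which the tasks could be tackled."""
    return list(TopologicalSorter(depends_on).static_order())
```

Several orders satisfy these constraints, so the sketch only guarantees that every task comes after the tasks it depends on.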

13:29 PM
ruaok_

maybe I am not thinking of matchings in the right context.

2014-01-08 00840, 2014

13:29 PM
ruaok_

as in automated matchings between a geordi data store and MB?

2014-01-08 00846, 2014

13:29 PM
ruaok_

I think that ought to be last.

2014-01-08 00849, 2014

13:29 PM
ianmcorvidae

matchings are marking that something in geordi is the same as something in MB

2014-01-08 00852, 2014

13:29 PM
ianmcorvidae

whether manual or automatic

2014-01-08 00806, 2014

13:30 PM
ruaok_

ok, then that should be near the bottom

2014-01-08 00809, 2014

13:30 PM
ianmcorvidae

the big complication with those is that some matchings are only in geordi (wcd) and some are really derived from MB

2014-01-08 00814, 2014

13:30 PM
nikki

what needs to change with the display?

2014-01-08 00814, 2014

13:30 PM
reosarevok

So if I import something, I want to be able to tell geordi "this is here in MB"

2014-01-08 00821, 2014

13:30 PM
reosarevok

Since that lets others not import it

2014-01-08 00830, 2014

13:30 PM
reosarevok

(and not worry about it basically)

2014-01-08 00842, 2014

13:30 PM
reosarevok

Without that, I'm likely to forget *myself* what I've added and not

2014-01-08 00844, 2014

13:30 PM
ianmcorvidae

(where derived-from-MB is things like having a discogs URL in MB)

2014-01-08 00844, 2014

13:30 PM
reosarevok

:/

2014-01-08 00801, 2014

13:31 PM
ianmcorvidae

(or with ninjatune, it would presumably be something with the right label/catno)

2014-01-08 00815, 2014

13:31 PM
ruaok_

would it be enough to keep a paper trail for now and then improve the history of what has been imported later?

2014-01-08 00842, 2014

13:32 PM
ianmcorvidae

that would be writing manual matching stuff

2014-01-08 00848, 2014

13:32 PM
ianmcorvidae

but ignoring the MB-side matches

2014-01-08 00854, 2014

13:32 PM
reosarevok

That'd be fine for me

2014-01-08 00808, 2014

13:33 PM
reosarevok

(what ian said, not just throwing it at a wiki :p)

2014-01-08 00840, 2014

13:33 PM
ianmcorvidae

that's essentially what current geordi does, except I'd throw out the idea of automatic matches until we can do them right

2014-01-08 00858, 2014

13:33 PM
ruaok_ nods at the automatic matches

2014-01-08 00842, 2014

13:34 PM
reosarevok

Seems sensible

2014-01-08 00844, 2014

13:34 PM
ianmcorvidae

my plan for newgeordi has actually always been to throw out the external automatic match thing anyway, and have any automatic process be part of geordi

2014-01-08 00854, 2014

13:34 PM
ianmcorvidae

but the so-called "MB-side" matches are intermediate

2014-01-08 00801, 2014

13:35 PM
ianmcorvidae

but they can still wait until later

2014-01-08 00822, 2014

13:35 PM
ruaok_

not sure I quite grok the "mb-side" match stuff. what does that entail?

2014-01-08 00825, 2014

13:35 PM
ruaok_

MB knowing about geordi?

2014-01-08 00830, 2014

13:35 PM
ianmcorvidae

so: db, mappings, display of extracted info, basic manual matching, import of extracted info, reconvene

2014-01-08 00833, 2014

13:35 PM
ianmcorvidae

no

2014-01-08 00836, 2014

13:35 PM
ianmcorvidae

so

2014-01-08 00843, 2014

13:35 PM
ianmcorvidae

say with discogs

2014-01-08 00807, 2014

13:36 PM
ianmcorvidae

what a match in geordi from the discogs index is saying is, id XYZ in discogs is MBID ZYX in MB

2014-01-08 00825, 2014

13:36 PM
ianmcorvidae

however, in MB we have a relationship that says MBID ZYX in MB is id XYZ in discogs (via a URL relationship)

2014-01-08 00843, 2014

13:36 PM
ianmcorvidae

in current geordi we synchronize these by a really ridiculously hacky script that uses geordi's automatic matching system

2014-01-08 00859, 2014

13:36 PM
ianmcorvidae

the better way to do it would be to have a replicated DB and just query them :P

2014-01-08 00809, 2014
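
A rough sketch of the "just query them" idea: deriving discogs-to-MBID matches from URL relationships that already exist in MB, rather than syncing them through a separate script. The input shape (pairs of MBID and URL), the URL pattern, and the `derive_matches` helper are all assumptions for illustration; the real replicated schema and query are not shown in this log.

```python
import re

# Assumed shape of a discogs release URL as stored in a URL relationship.
DISCOGS_RELEASE_URL = re.compile(r"discogs\.com/release/(\d+)")


def derive_matches(url_relationships):
    """Map discogs release id -> set of MBIDs linked to that release.

    url_relationships is an iterable of (mbid, url) pairs, e.g. rows
    returned by a query against a replicated MB database (hypothetical).
    """
    matches = {}
    for mbid, url in url_relationships:
        found = DISCOGS_RELEASE_URL.search(url)
        if found:
            matches.setdefault(found.group(1), set()).add(mbid)
    return matches
```

With something like this, the "id XYZ in discogs is MBID ZYX" statement is computed from MB's own data each time instead of being stored a second time in geordi.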

13:37 PM
reosarevok

mhmh

2014-01-08 00828, 2014

13:37 PM
ianmcorvidae

since the ultimate goal of geordi is to get info into MB, the notion here is that it's better to store as many matchings (in the geordi sense) as relationships in MB as we can

2014-01-08 00842, 2014

13:37 PM
ianmcorvidae

rather than storing them in geordi where they're not much use to anyone who isn't trying to import things from geordi

2014-01-08 00858, 2014

13:37 PM
ruaok_

yeah, that should be considered during the reconvene.

2014-01-08 00810, 2014

13:38 PM
ianmcorvidae

yeah.

2014-01-08 00820, 2014

13:38 PM
ianmcorvidae

the other thing I'll do is ignore the wcd index at first, I think

2014-01-08 00826, 2014

13:38 PM
ruaok_ nods at ianmcorvidae

2014-01-08 00838, 2014

13:38 PM
ruaok_

agreed.

2014-01-08 00842, 2014

13:38 PM
ianmcorvidae

since it has complications of embedded matches from the IA's matching process, and is weird data in general

2014-01-08 00850, 2014

13:38 PM
ruaok_

personally I think ninjatune ought to be your first data set to work with.

2014-01-08 00854, 2014

13:38 PM
ianmcorvidae

and, well, designing based on the wcd data is basically what got us where we are

2014-01-08 00804, 2014

13:39 PM
ianmcorvidae

discogs. because it has concepts of more than one type of entity

2014-01-08 00806, 2014

13:39 PM
ruaok_

oh, ouch.

2014-01-08 00818, 2014

13:39 PM
reosarevok

ruaok_: ninjatune and discogs IMO (as in, the design should ensure it works fine for both kinds of data)

2014-01-08 00825, 2014

13:39 PM
ianmcorvidae

I mean, the 'designing' (i.e. me) is more at fault than the data, but still :P

2014-01-08 00835, 2014

13:39 PM
reosarevok

Label info and discogs are probably going to be the two main sources of data after all

2014-01-08 00835, 2014

13:39 PM
ruaok_

reosarevok: yeah, agreed. let's make sure it works for both of those.

2014-01-08 00835, 2014

13:39 PM
ianmcorvidae

I think both is the right answer anyway, yeah

2014-01-08 00850, 2014

13:39 PM
ruaok_

but I say ninjatune because this is an industry relationship that we're lagging on.

2014-01-08 00815, 2014

13:40 PM
ianmcorvidae

the lucky thing here is that ninjatune is super easy when we exclude the mb-side matches part :)

2014-01-08 00823, 2014

13:40 PM
ianmcorvidae

which we have

2014-01-08 00839, 2014

13:40 PM
reosarevok

:)

2014-01-08 00842, 2014

13:40 PM
ruaok_

super easy? even with needing a new DB?

2014-01-08 00848, 2014

13:40 PM
ruaok_

or are you talking about the overall matching process?

2014-01-08 00800, 2014

13:41 PM
ruaok_

granted, there is not much data and we have large chunks of it.

2014-01-08 00805, 2014

13:41 PM
ianmcorvidae

super easy in terms of if it works for discogs stuff it'll work for ninjatune

2014-01-08 00811, 2014

13:41 PM
ruaok_

I see it more from a political perspective.

2014-01-08 00817, 2014

13:41 PM
ruaok_

got it.

2014-01-08 00828, 2014

13:41 PM
ianmcorvidae

where if we were doing the mb-side matches those would require ninjatune-specific implementation

2014-01-08 00853, 2014

13:41 PM
ruaok_

ok, I think we have a rough idea where we want to go to from here.

2014-01-08 00803, 2014

13:42 PM
ruaok_

let's see how much of this stuff we can get done before chicago.

2014-01-08 00811, 2014

13:42 PM
ruaok_

and by we, I mean you. ;)