#musicbrainz-devel


      • ocharles
        damnit, why has no one done this work for me!
      • warp: go ahead
      • warp
        ocharles: great :D
      • kepstin-work
        the systemd timer stuff is annoying for things like logrotate on laptops that don't have long uptimes - if you have the timer set to every 12h starting from boot time and never leave it running for 12h, the timer will never trigger.
      • so cron is still useful.
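(Later systemd releases added calendar timers with Persistent=, which address exactly the laptop case described above. A minimal sketch of a logrotate.timer unit, assuming a matching logrotate.service exists; a missed run fires at the next boot instead of never:)

```ini
# logrotate.timer : a sketch only; assumes a matching logrotate.service unit.
[Unit]
Description=Rotate logs twice a day

[Timer]
# Wall-clock schedule instead of "every 12h since boot"...
OnCalendar=*-*-* 00/12:00:00
# ...and fire a missed run at the next boot, so short uptimes still rotate.
Persistent=true

[Install]
WantedBy=timers.target
```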
      • warp
        winter :(
      • it feels as if it's 20:00 but it isn't even 18:00 yet.
      • kepstin-work
        apparently the DST change is this weekend here.
      • warp nods.
      • ocharles
        kepstin-work: I suppose another option is to be super cloudy, and use one process that is fired on the hour to find open edits and send messages to a message queue
      • and another process that waits on the message queue
      • kepstin-work
        dunno if that'll make it better or worse :)
      • ocharles
        then you can have modbot farms if we suddenly need to process millions of edits an hour
      • :P
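(A rough sketch of the shape ocharles is describing, using Python's in-process queue purely for illustration; a real deployment would put a broker between two separate processes. find_expired_edit_ids and close_edit are hypothetical placeholders, and keying the queue on unique edit ids anticipates the duplicate-message concern raised just below:)

```python
import queue
import threading
import time

# In-process sketch of the "one process enqueues, another consumes" shape;
# a real deployment would put a broker between two separate processes so
# extra modbot workers can be added when the backlog grows.

edit_queue = queue.Queue()
queued_ids = set()              # track what is already queued so a backed-up
queued_lock = threading.Lock()  # queue never holds the same edit twice


def find_expired_edit_ids():
    """Hypothetical hourly query for open edits that are past their expiry."""
    return []


def close_edit(edit_id):
    """Hypothetical stand-in for whatever ModBot does to apply or reject an edit."""
    print("closing edit", edit_id)


def producer():
    # Fired "on the hour" (here just a sleep loop) to push expired edit ids.
    while True:
        for edit_id in find_expired_edit_ids():
            with queued_lock:
                if edit_id not in queued_ids:
                    queued_ids.add(edit_id)
                    edit_queue.put(edit_id)
        time.sleep(3600)


def modbot_worker():
    # Any number of these can run; this is the "modbot farm" idea.
    while True:
        edit_id = edit_queue.get()
        try:
            close_edit(edit_id)
        finally:
            with queued_lock:
                queued_ids.discard(edit_id)
            edit_queue.task_done()
```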
      • warp
        also I feel old and grumpy just for noticing + complaining about it :)
      • ocharles
        warp: it's 4:50pm here and dark outside :(
      • kepstin-work
        ocharles: heh, but that would act strangely if the queue gets backed up - edits will be in the queue multiple times.
      • i suppose just making the queue drop messages that can't be processed would be ok, tho.
      • ocharles
        kepstin-work: depending on how your queue is built
      • luks
        what's wrong with closing edits as they expire?
      • ocharles
        you can have a queue that has a concept of unique ids
      • luks: nothing, that's what this modbot does
      • kepstin-work
        luks: implementing that is hard, because edits don't tell you when they expire
      • luks
        yeah, but I think most people find it confusing
      • ocharles
        but if you mean the second a user enters a vote, then I wouldn't want that, because that loads the web server
      • kepstin-work
        luks: so you have to check all the edits to see if they have expired yet
      • luks
        kepstin-work: if you have database triggers and a message queue, you can know exactly when they expire
      • kepstin-work
        ocharles: web server could send it to the expired edits queue :)
      • ocharles
        oh, true
      • luks: did you end up using pg_amqp for anything in the end?
      • I haven't used it for anything but hobby stuff myself
      • luks
        nope
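(A sketch of the trigger-plus-queue idea luks mentions, using plain Postgres LISTEN/NOTIFY through psycopg2 rather than pg_amqp; the table and column names (vote, edit) are invented for illustration, and "any new vote" is a deliberately simplified expiry condition:)

```python
import select
import psycopg2

# Illustration only: table/column names are made up, and this uses plain
# Postgres LISTEN/NOTIFY via psycopg2 instead of the pg_amqp extension.

SETUP_SQL = """
-- Run once against the database: notify as soon as something happens that
-- can decide an edit (here, simplistically, any new vote).
CREATE OR REPLACE FUNCTION notify_edit_expiry() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('expired_edits', NEW.edit::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER vote_expires_edit
    AFTER INSERT ON vote
    FOR EACH ROW EXECUTE PROCEDURE notify_edit_expiry();
"""


def listen_for_expired_edits(dsn):
    conn = psycopg2.connect(dsn)
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute("LISTEN expired_edits;")
    while True:
        # Block until the backend has something for us, then drain it.
        if select.select([conn], [], [], 60) == ([], [], []):
            continue  # timed out; loop and wait again
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop(0)
            print("edit ready to close:", notify.payload)
```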
      • kepstin-work
        luks: basically, expiring edits immediately means having a process that knows when all the upcoming edit expirations are going to be, and just sleeps until the next one before firing a notification of some sort - it should be doable, i think...
      • ocharles
        expiring edits immediately just needs a queue, a modbot, and something pushing messages when something happens that expires an edit
      • as was said before
      • djce joined the channel
      • fsvo "immediately", anyway
      • luks
        there could be still some "grace period"
      • but it would be a fixed amount of time
      • not waiting until the next hour
      • ocharles
        sounds like something to discuss :)
      • kepstin-work
        no gaming the system if it always expires an edit exactly 1h after the 3rd yes vote, eh :)
      • luks
        does anybody know what exactly is the plan with the "ingestr"?
      • ruaok
        luks: there isn't an exact plan yet.
      • ocharles
        as I understand it, to be something that can take arbitrary data and index it, such that people can later view the data and say "this path is the artist name and this path is the release name", so please open that in the release editor
      • ruaok nods at ocharles
      • hawke_
        Why is it spelled all stupid-like?
      • ruaok
        the first step is to expose data and make it searchable.
      • and create very simple import steps
      • ocharles
        hawke_: a play on flickr
      • Freso
        ocharles: O RLY?
      • ruaok
        then to let the community give us feedback on whether it's useful and how to make it better.
      • so since we have IA data, the point is to use that to expose it.
      • and then we need to work on matching tools for matching foreign data sets to MB.
      • luks: I saw that you offered to do some matching work for Brewster.
      • do you have a plan for that yet?
      • kepstin-work
        what kinds of input formats would ingestr be looking at supporting? just stuff like XML? pluggable frontends?
      • luks
        ruaok: I'm running the matching script already
      • ruaok
        luks: cool.
      • is the source somewhere where we can play with it and use it for other data sets?
      • luks
        which is related to my question, I have albums which probably match, but I'd like somebody to review it
      • and I'm not sure if a web app for that conflicts with the ingestr somehow
      • ruaok
        luks: yep, that is the next step. we need to find a way to manage that
      • for more than one data set.
      • luks
      • but I'm working on it as it runs
      • ruaok
        and I honestly don't have a good way laid out in my mind on how to do that.
      • thanks!
      • luks
        I'm doing fuzzy matching based on track lengths and then validating track titles to make sure I don't have false positives
      • which works great, but sometimes the titles are just too different to be sure it's a positive match
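(A minimal sketch of the kind of heuristic luks describes: match on track lengths within a tolerance, then validate titles to weed out false positives. The tolerance, threshold, and data shapes here are invented:)

```python
from difflib import SequenceMatcher

# Rough sketch of the heuristic described above; the tolerance and the
# similarity threshold are invented, and a "track" here is just a
# (title, length_in_ms) pair.

LENGTH_TOLERANCE_MS = 3000
TITLE_SIMILARITY_THRESHOLD = 0.8


def title_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def tracklists_match(candidate, mb_release):
    if len(candidate) != len(mb_release):
        return False
    for (title_a, length_a), (title_b, length_b) in zip(candidate, mb_release):
        # First the cheap fuzzy check on durations...
        if abs(length_a - length_b) > LENGTH_TOLERANCE_MS:
            return False
        # ...then validate titles to catch false positives.  Pairs that fail
        # only this check are the ones worth sending to a human reviewer.
        if title_similarity(title_a, title_b) < TITLE_SIMILARITY_THRESHOLD:
            return False
    return True
```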
      • ruaok
        great. I think that approach makes sense.
      • luks
        so I'm thinking of creating a simple app that displays a random album from that category and asks the user to verify it
      • there aren't that many such matches, but it's more than I can handle personally :)
      • ruaok
        I like that.
      • kepstin-work
        this sounds somewhat similar to what the matching code in picard does.
      • ruaok
        I think there is a possibility of lots of such matches. not all incoming data sets will be as clean as what we're getting from the archive.
      • luks
        I've been thinking about indexing discogs as well and if I don't find a match in MB at all, but I do find a match in discogs, offer the user to import it from discogs
      • reosarevok
        luks: wouldn't that auto-match a release to its remaster or something?
      • luks
        *maybe* even import it automatically from discogs
      • ruaok
        luks: yep.
      • luks
        reosarevok: yes, probably
      • ruaok
        luks: I think if we make the matching *REALLY* conservative, and people review the results then maybe we can do just that.
      • luks
        reosarevok: but if I want to deal with that, I can forget about this kind of matching
      • reosarevok
        I imagine that's not in the interests of the IA - I imagine they'd want a copy of both original and remaster and to know which one is which
      • But I don't know if there's anything in their data that can allow us to know
      • ruaok
        luks: there are lots of people literally throwing data sets at us.
      • reosarevok
        (say barcodes or catnos or something)
      • ruaok
        and we really need a comprehensive solution for dealing with them. exposing the data, importing clean data and all that.
      • oh, and I think we can also harvest data from these data sets.
      • luks
        reosarevok: people generally do not keep that kind of information in tags
      • ruaok
        such as picking barcodes from these data sets.
      • luks: maybe we should store matched data in ingestr too. (this record matches to MBID blah)
      • then we can pluck extra data out automatically via a bot.
      • reosarevok
        luks: obviously for the ones they digitise themselves, that's basic stuff to include
      • So I trust the IA will include it. But for the rest, yes, dunno
      • luks
        the only way to get that kind of information, if it's there, is parsing the textual descriptions
      • ruaok
        and I would love ingestr to be in python. :(
      • luks
        IA's indexing is pretty primitive regarding metadata
      • ruaok
        maybe we should start over with a more clear goal in mind. :)
      • reosarevok
        ruaok: the goal seems clear? "manage different sets of metadata, automatically find matches between them and with MusicBrainz, and provide users an easy way to confirm them or import the ones that do not match"
      • What seems hard is the execution :p
      • luks
        that's too broad a goal, IMO
      • ruaok
        for ingestr, yes.
      • luks
        each data set will be different
      • ruaok
        there needs to be a web app component too.
      • reosarevok
        luks: that's the point
      • ruaok
        luks: our plan is to keep mappings between different data sets in ingestr.
      • reosarevok
        Isn't the idea to turn each of the data sets into something compatible with all the rest?
      • ruaok
        reosarevok: that is my idea, yes.
      • I see this working as a two step process.
      • 1. ingest data and show unstructured results.
      • 2. let the community look at it and figure out a mapping for the new data.
      • 3. install the mapping
      • 4. import data.
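(A hypothetical illustration of steps 2-4: a community-supplied mapping from paths in a foreign record to MusicBrainz-style fields, applied to each ingested record; every path and field name here is invented:)

```python
# Hypothetical illustration: the community writes a mapping for a new data
# set (step 2-3), and import (step 4) applies it to each ingested record.

INTERNET_ARCHIVE_MAPPING = {
    "metadata/creator": "artist_name",
    "metadata/title": "release_title",
    "files/*/title": "track_titles",
}


def get_path(record, path):
    """Resolve a slash-separated path in a nested dict; '*' fans out over a list."""
    head, _, rest = path.partition("/")
    if head == "*":
        return [get_path(item, rest) for item in record]
    value = record.get(head)
    if not rest or value is None:
        return value
    return get_path(value, rest)


def apply_mapping(record, mapping):
    return {field: get_path(record, path) for path, field in mapping.items()}
```

(Running apply_mapping(record, INTERNET_ARCHIVE_MAPPING) over each incoming record would give the import step something close enough to MB's shape to feed a matching or release-editor tool.)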
      • luks
        making different data sets "compatible" is not going to work
      • ruaok
        (and you get two extra steps for free!)
      • Freso
        - he says and lists 4 steps.
      • :p
      • reosarevok
        luks: how is it not going to work?
      • luks
        that's a huge amount of work, in the cases where it's even possible
      • ruaok
        luks: I don't think that is the goal.
      • reosarevok
        I mean, basically it's mapping each set to MB's approach
      • (which makes them compatible-through-MB)
      • luks
        in different data sets you have different information and you can use that information primarily for matching to MB
      • ruaok
        I view it as picking matching data components out.
      • luks
        sometimes you don't have that, but you have something else, that the other data set doesn't
      • reosarevok
        But when you know what matches where, you can also look for places where the datasets match in their MB-matching
      • ruaok
      • and making it easier to import, but in a lot of cases direct import won't work well.
      • reosarevok
        And be like "huh, these might be the same thing!"
      • luks
        reosarevok: what's the point of keeping the data after you match them to MB?
      • ocharles
        luks: to send that data back to the source
      • reosarevok
        luks: for the ones which match to MB, dunno
      • ocharles
        was one argument, anyway
      • reosarevok
        I care about the ones which do *not*
      • (say, we get info from two different sources but the release has the same cat# or EAN or track times)
      • ruaok
        it might also be useful for saying: this data is BAD. don't want.
      • prevent repeat import attempts
      • reosarevok
        Precisely because, as you said, different sets bring different info, it's useful to try to put it all together
      • (same as I, manually, might search site X's data to find a barcode and then use that barcode to find a release on amazon to find a back cover to find more data - just bottily :p)
      • luks
        I don't think it will every cherry-pick data like that
      • *ever
      • reosarevok
        Well, that's the obvious use of having multiple datasets coming in
      • So it would be a bit sad not to try to take advantage of it
      • luks
        realistically, you will be happy if you get album/title/artist
      • which is not even good for automatic import
      • reosarevok
        heh
      • I guess we're used to different data sources :p
      • reosarevok has been playing with the 70k albums in the naxos music library, and those are both fairly complete and fairly easy to match to other datasets for extra info
      • Of course, someone would have to ask them for the data, but I imagine that might actually work
      • I haven't seen label data for pop so I don't know how much that sucks