the systemd timer stuff is annoying for things like logrotate on laptops that don't have long uptimes - if the timer is set to fire every 12h starting from boot time and the machine never stays up for 12h, the timer will never trigger.
2012-11-02 30750, 2012
kepstin-work
so cron is still useful.
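[For reference: the behaviour kepstin-work describes is that of purely monotonic timer specs (OnBootSec=/OnUnitActiveSec=), which only count time while the machine is up. Later systemd releases (>= 212) added Persistent=, which runs a missed OnCalendar= job as soon as the machine is next awake. A hypothetical logrotate.timer sketch:]

```ini
# hypothetical /etc/systemd/system/logrotate.timer
[Unit]
Description=Rotate logs twice a day

[Timer]
# Monotonic specs like these only count time while the machine is up,
# so a laptop that never stays up 12h never fires:
#OnBootSec=12h
#OnUnitActiveSec=12h

# A calendar spec with Persistent=true instead runs the missed job
# at the next opportunity (e.g. right after the next boot):
OnCalendar=*-*-* 00/12:00:00
Persistent=true

[Install]
WantedBy=timers.target
```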
2012-11-02 30754, 2012
warp
winter :(
2012-11-02 30714, 2012
warp
it feels as if it's 20:00 but it isn't even 18:00 yet.
2012-11-02 30735, 2012
kepstin-work
apparently the DST change is this weekend here.
2012-11-02 30748, 2012
warp nods.
2012-11-02 30702, 2012
ocharles
kepstin-work: I suppose another option is to be super cloudy, and use one process that is fired on the hour to find open edits and send messages to a message queue
2012-11-02 30706, 2012
ocharles
and another process that waits on the message queue
2012-11-02 30709, 2012
kepstin-work
dunno if that'll make it better or worse :)
2012-11-02 30718, 2012
ocharles
then you can have modbot farms if we suddenly need to process millions of edits an hour
2012-11-02 30719, 2012
ocharles
:P
2012-11-02 30719, 2012
warp
also I feel old and grumpy just for noticing + complaining about it :)
2012-11-02 30733, 2012
ocharles
warp: it's 4:50pm here and dark outside :(
2012-11-02 30754, 2012
kepstin-work
ocharles: heh, but that would act strangely if the queue gets backed up - edits will be in the queue multiple times.
2012-11-02 30712, 2012
kepstin-work
i suppose just making the queue drop messages that can't be processed would be ok, tho.
2012-11-02 30716, 2012
ocharles
kepstin-work: depending on how your queue is built
2012-11-02 30724, 2012
luks
what's wrong with closing edits as they expire?
2012-11-02 30728, 2012
ocharles
you can have a queue that has a concept of unique ids
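[Such a queue with unique ids might look like this minimal Python sketch - class and method names are invented for illustration. Pushing an edit that is already pending is a no-op, which is the "drop messages that can't be processed twice" behaviour discussed above:]

```python
from collections import deque

class DedupQueue:
    """A queue that ignores messages whose id is already enqueued.

    Hypothetical sketch; 'edit_id' stands in for whatever unique key
    the real queue would carry, not an actual MusicBrainz schema.
    """
    def __init__(self):
        self._queue = deque()
        self._pending = set()

    def push(self, edit_id):
        # Drop the message if this edit is already waiting to be processed,
        # so a backed-up queue never holds the same edit twice.
        if edit_id in self._pending:
            return False
        self._pending.add(edit_id)
        self._queue.append(edit_id)
        return True

    def pop(self):
        # Once an edit leaves the queue it may be re-enqueued later.
        edit_id = self._queue.popleft()
        self._pending.discard(edit_id)
        return edit_id
```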
2012-11-02 30745, 2012
ocharles
luks: nothing, that's what this modbot does
2012-11-02 30750, 2012
kepstin-work
luks: implementing that is hard, because edits don't tell you when they expire
2012-11-02 30702, 2012
luks
yeah, but I think most people find it confusing
2012-11-02 30705, 2012
ocharles
but if you mean the second a user enters a vote, then I wouldn't want that, because that loads the web server
2012-11-02 30705, 2012
kepstin-work
luks: so you have to check all the edits to see if they have expired yet
2012-11-02 30721, 2012
luks
kepstin-work: if you have database triggers and a message queue, you can know exactly when they expire
2012-11-02 30723, 2012
kepstin-work
ocharles: web server could send it to the expired edits queue :)
2012-11-02 30730, 2012
ocharles
oh, true
2012-11-02 30703, 2012
ocharles
luks: did you end up using pg_amqp for anything in the end?
2012-11-02 30712, 2012
ocharles
I haven't used it for anything but hobby stuff myself
2012-11-02 30740, 2012
luks
nope
2012-11-02 30758, 2012
kepstin-work
luks: basically, expiring edits immediately means having a process that knows when all the upcoming edit expirations are going to be, and just sleeps until the next one before firing a notification of some sort - it should be doable, I think...
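[kepstin-work's sleep-until-the-next-expiration process can be sketched with a min-heap; `next_wakeup` below is a hypothetical helper the daemon loop would call, not anything from the MusicBrainz codebase:]

```python
import heapq

def next_wakeup(expirations, now):
    """Given (expire_time, edit_id) pairs, return the ids that are already
    due and how long to sleep until the next one (None if nothing is left).

    A sketch of the idea above: the surrounding daemon would close the
    due edits, then sleep for the returned interval before checking again.
    """
    heap = list(expirations)
    heapq.heapify(heap)  # order by expire_time, earliest first
    due = []
    while heap and heap[0][0] <= now:
        due.append(heapq.heappop(heap)[1])
    sleep_for = heap[0][0] - now if heap else None
    return due, sleep_for
```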
2012-11-02 30728, 2012
ocharles
expiring edits immediately just needs a queue, a modbot, and something pushing messages when something happens that expires an edit
2012-11-02 30735, 2012
ocharles
as was said before
2012-11-02 30752, 2012
djce joined the channel
2012-11-02 30703, 2012
ocharles
fsvo "immediately", anyway
2012-11-02 30712, 2012
luks
there could be still some "grace period"
2012-11-02 30721, 2012
luks
but it would be a fixed amount of time
2012-11-02 30732, 2012
luks
not waiting to the next hour
2012-11-02 30758, 2012
ocharles
sounds like something to discuss :)
2012-11-02 30706, 2012
kepstin-work
no gaming the system if it always expires an edit exactly 1h after the 3rd yes vote, eh :)
2012-11-02 30745, 2012
luks
does anybody know what exactly is the plan with the "ingestr"?
2012-11-02 30743, 2012
ruaok
luks: there isn't an exact plan yet.
2012-11-02 30752, 2012
ocharles
as I understand it, to be something that can take arbitrary data and index it, such that people can later view the data and say "this path is the artist name and this path is the release name", so please open that in the release editor
2012-11-02 30715, 2012
ruaok nods at ocharles
2012-11-02 30718, 2012
hawke_
Why is it spelled all stupid-like?
2012-11-02 30733, 2012
ruaok
the first step is to expose data and make it searchable.
2012-11-02 30740, 2012
ruaok
and create very simple import steps
2012-11-02 30741, 2012
ocharles
hawke_: a play on flickr
2012-11-02 30751, 2012
Freso
ocharles: O RLY?
2012-11-02 30758, 2012
ruaok
then to let the community give us feedback on whether it's useful and how to make it better.
2012-11-02 30728, 2012
ruaok
so since we have IA data, the point is to use that to expose it.
2012-11-02 30741, 2012
ruaok
and then we need to work on matching tools for matching foreign data sets to MB.
2012-11-02 30751, 2012
ruaok
luks: I saw that you offered to do some matching work for Brewster.
2012-11-02 30755, 2012
ruaok
do you have a plan for that yet?
2012-11-02 30707, 2012
kepstin-work
what kinds of input formats would ingestr be looking at supporting? just stuff like xml? pluggable frontends?
2012-11-02 30732, 2012
luks
ruaok: I'm running the matching script already
2012-11-02 30745, 2012
ruaok
luks: cool.
2012-11-02 30758, 2012
ruaok
is the source somewhere where we can play with it and use it for other data sets?
2012-11-02 30707, 2012
luks
which is related to my question, I have albums which probably match, but I'd like somebody to review it
2012-11-02 30721, 2012
luks
and I'm not sure if a web app for that conflicts with the ingestr somehow
2012-11-02 30728, 2012
ruaok
luks: yep, that is the next step. we need to find a way to manage that
and I honestly don't have a good way laid out in my mind for how to do that.
2012-11-02 30753, 2012
ruaok
thanks!
2012-11-02 30747, 2012
luks
I'm doing fuzzy matching based on track lengths and then validating track titles to make sure I don't have false positives
2012-11-02 30703, 2012
luks
which works great, but sometimes the titles are just too different to be sure it's a positive match
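[luks's approach - fuzzy length match first, then title validation to reject false positives - might look roughly like this; the tolerances and record layout are illustrative guesses, not his actual script:]

```python
from difflib import SequenceMatcher

def tracks_match(candidate, reference, length_tolerance=3.0, min_title_similarity=0.6):
    """Fuzzy-match two track lists of (title, length_in_seconds) pairs.

    Track lengths must agree within `length_tolerance` seconds, and each
    title pair must clear a similarity threshold - catching the case luks
    mentions where titles are too different to be a confident match.
    """
    if len(candidate) != len(reference):
        return False
    for (cand_title, cand_len), (ref_title, ref_len) in zip(candidate, reference):
        if abs(cand_len - ref_len) > length_tolerance:
            return False
        similarity = SequenceMatcher(None, cand_title.lower(), ref_title.lower()).ratio()
        if similarity < min_title_similarity:
            return False
    return True
```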
2012-11-02 30707, 2012
ruaok
great. I think that approach makes sense.
2012-11-02 30729, 2012
luks
so I'm thinking of creating a simple app that displays a random album from that category and asks the user to verify it
2012-11-02 30747, 2012
luks
there aren't that many such matches there, but it's more than I can handle personally :)
2012-11-02 30750, 2012
ruaok
I like that.
2012-11-02 30758, 2012
kepstin-work
this sounds somewhat similar to what the matching code in picard does.
2012-11-02 30720, 2012
ruaok
I think there is a possibility of lots of such matches. not all incoming data sets will be as clean as what we're getting from the archive.
2012-11-02 30715, 2012
luks
I've been thinking about indexing discogs as well, and if I don't find a match in MB at all but do find a match in discogs, offering the user the option to import it from discogs
2012-11-02 30720, 2012
reosarevok
luks: wouldn't that auto-match a release to its remaster or something?
2012-11-02 30726, 2012
luks
*maybe* even import it automatically from discogs
2012-11-02 30731, 2012
ruaok
luks: yep.
2012-11-02 30748, 2012
luks
reosarevok: yes, probably
2012-11-02 30753, 2012
ruaok
luks: I think if we make the matching *REALLY* conservative, and people review the results then maybe we can do just that.
2012-11-02 30704, 2012
luks
reosarevok: but if I want to deal with that, I can forget about this kind of matching
2012-11-02 30736, 2012
reosarevok
I imagine that's not in the interests of the IA - I imagine they'd want a copy of both original and remaster and to know which one is which
2012-11-02 30751, 2012
reosarevok
But I don't know if there's anything in their data that can allow us to know
2012-11-02 30753, 2012
ruaok
luks: there are lots of people literally throwing data sets at us.
2012-11-02 30707, 2012
reosarevok
(say barcodes or catnos or something)
2012-11-02 30710, 2012
ruaok
and we really need a comprehensive solution for dealing with them. exposing the data, importing clean data and all that.
2012-11-02 30726, 2012
ruaok
oh, and I think we can also harvest data from these data sets.
2012-11-02 30726, 2012
luks
reosarevok: people generally do not keep that kind of information in tags
2012-11-02 30734, 2012
ruaok
such as picking barcodes from these data sets.
2012-11-02 30757, 2012
ruaok
luks: maybe we should store matched data in ingestr too. (this record matches MBID blah)
2012-11-02 30707, 2012
ruaok
then we can pluck extra data out automatically via a bot.
2012-11-02 30735, 2012
reosarevok
luks: obviously for the ones they digitise themselves, that's basic stuff to include
2012-11-02 30751, 2012
reosarevok
So I trust the IA will include it. But for the rest, yes, dunno
2012-11-02 30729, 2012
luks
the only way to get that kind of information, if it's there, is parsing the textual descriptions
2012-11-02 30745, 2012
ruaok
and I would love ingestr to be in python. :(
2012-11-02 30754, 2012
luks
IA's indexing is pretty primitive regarding metadata
2012-11-02 30755, 2012
ruaok
maybe we should start over with a more clear goal in mind. :)
2012-11-02 30753, 2012
reosarevok
ruaok: the goal seems clear? "manage different sets of metadata, automatically find matches between them and with MusicBrainz, and provide users an easy way to confirm them or import the ones that do not match"
2012-11-02 30759, 2012
reosarevok
What seems hard is the execution :p
2012-11-02 30710, 2012
luks
that's too broad a goal, IMO
2012-11-02 30729, 2012
ruaok
for ingestr, yes.
2012-11-02 30733, 2012
luks
each data set will be different
2012-11-02 30739, 2012
ruaok
there needs to be a web app component too.
2012-11-02 30740, 2012
reosarevok
luks: that's the point
2012-11-02 30751, 2012
ruaok
luks: our plan is to keep mappings between different data sets in ingestr.
2012-11-02 30754, 2012
reosarevok
Isn't the idea to turn each of the data sets into something compatible with all the rest?
2012-11-02 30705, 2012
ruaok
reosarevok: that is my idea, yes.
2012-11-02 30718, 2012
ruaok
I see this working as a two step process.
2012-11-02 30729, 2012
ruaok
1. ingest data and show unstructured results.
2012-11-02 30743, 2012
ruaok
2. let the community look at it and figure out a mapping for the new data.
2012-11-02 30747, 2012
ruaok
3. install the mapping
2012-11-02 30750, 2012
ruaok
4. import data.
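[Steps 2-3 - a community-authored mapping applied to raw ingested records - could be as simple as this Python sketch; the slash-path notation and field names are hypothetical, not ingestr's actual format:]

```python
def apply_mapping(record, mapping):
    """Apply an installed path mapping to one raw ingested record.

    `mapping` says which path in the unstructured data feeds each MB
    field, e.g. {"artist": "album/artist"} - the "this path is the
    artist name" step from the discussion above. Missing paths map
    to None rather than raising, since foreign data is often sparse.
    """
    def lookup(data, path):
        for key in path.split("/"):
            if not isinstance(data, dict) or key not in data:
                return None
            data = data[key]
        return data

    return {mb_field: lookup(record, path) for mb_field, path in mapping.items()}
```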
2012-11-02 30757, 2012
luks
making different data sets "compatible" is not going to work
2012-11-02 30758, 2012
ruaok
(and you get two extra steps for free!)
2012-11-02 30700, 2012
Freso
- he says and lists 4 steps.
2012-11-02 30704, 2012
Freso
:p
2012-11-02 30712, 2012
reosarevok
luks: how is it not going to work?
2012-11-02 30713, 2012
luks
that's a huge amount of work, in the cases where it's even possible
2012-11-02 30726, 2012
ruaok
luks: I dont think that is the goal.
2012-11-02 30727, 2012
reosarevok
I mean, basically it's mapping each set to MB's approach
2012-11-02 30737, 2012
reosarevok
(which makes them compatible-through-MB)
2012-11-02 30738, 2012
luks
in different data sets you have different information and you can use that information primarily for matching to MB
2012-11-02 30738, 2012
ruaok
I view it as picking matching data components out.
2012-11-02 30757, 2012
luks
sometimes you don't have that, but you have something else that the other data set doesn't
2012-11-02 30700, 2012
reosarevok
But when you know what matches where, you can also look for places where the datasets match in their MB-matching
2012-11-02 30702, 2012
ruaok
and making it easier to import, but in a lot of cases direct import wont work well.
2012-11-02 30714, 2012
reosarevok
And be like "huh, these might be the same thing!"
2012-11-02 30739, 2012
luks
reosarevok: what's the point of keeping the data after you match them to MB?
2012-11-02 30703, 2012
ocharles
luks: to send that data back to the source
2012-11-02 30707, 2012
reosarevok
luks: for the ones which match to MB, dunno
2012-11-02 30710, 2012
ocharles
was one argument, anyway
2012-11-02 30711, 2012
reosarevok
I care about the ones which do *not*
2012-11-02 30747, 2012
reosarevok
(say, we get info from two different sources but the release has the same cat# or EAN or track times)
2012-11-02 30759, 2012
ruaok
it might also be useful for saying: this data is BAD. don't want.
2012-11-02 30707, 2012
ruaok
prevent repeat import attempts
2012-11-02 30710, 2012
reosarevok
Precisely because, as you said, different sets bring different info, it's useful to try to put it all together
2012-11-02 30759, 2012
reosarevok
(same as I, manually, might search site X's data to find a barcode and then use that barcode to find a release on amazon to find a back cover to find more data - just bottily :p)
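[reosarevok's cross-dataset lookup - use a shared identifier like a barcode or cat# to pull extra fields from another source - could be sketched as a join; the record layout here is invented for illustration:]

```python
def cross_match(dataset_a, dataset_b, key="barcode"):
    """Join two foreign data sets on a shared identifier.

    Records from dataset_a that share a key value with a record in
    dataset_b are merged, so fields missing (None) in one set can be
    filled from the other - the bot-assisted version of manually
    chasing a barcode across sites.
    """
    # Index the second set by identifier, skipping records without one.
    index = {rec[key]: rec for rec in dataset_b if rec.get(key)}
    merged = []
    for rec in dataset_a:
        other = index.get(rec.get(key))
        if other:
            combined = dict(other)
            # dataset_a's own non-empty fields win over dataset_b's.
            combined.update({k: v for k, v in rec.items() if v is not None})
            merged.append(combined)
    return merged
```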
2012-11-02 30721, 2012
luks
I don't think it will ever cherry-pick data like that
2012-11-02 30739, 2012
reosarevok
Well, that's the obvious use of having multiple datasets coming in
2012-11-02 30709, 2012
reosarevok
So it would be a bit sad not to try to take advantage of it
2012-11-02 30711, 2012
luks
realistically, you will be happy if you get album/title/artist
2012-11-02 30721, 2012
luks
which is not even good for automatic import
2012-11-02 30725, 2012
reosarevok
heh
2012-11-02 30734, 2012
reosarevok
I guess we're used to different data sources :p
2012-11-02 30722, 2012
reosarevok has been playing with the 70k albums in the naxos music library, and those are both fairly complete and fairly easy to match to other datasets for extra info
2012-11-02 30742, 2012
reosarevok
Of course, someone would have to ask them for the data, but I imagine that might actually work
2012-11-02 30701, 2012
reosarevok
I haven't seen label data for pop so I don't know how much that sucks