the systemd timer stuff is annoying for things like logrotate on laptops that don't have long uptimes - if the timer is set to fire every 12h counted from boot time and the machine never stays up for 12h, it will never trigger.
so cron is still useful.
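[For reference, the kind of monotonic timer warp is describing looks roughly like this - a minimal sketch with illustrative unit names; OnBootSec and OnUnitActiveSec count uptime rather than wall-clock time, which is exactly why a short-uptime laptop never reaches the trigger:]

    # logrotate.timer (illustrative)
    [Unit]
    Description=Rotate logs every 12h of uptime

    [Timer]
    # both settings are monotonic: they count uptime, not wall-clock
    # time, so the 12h mark is never reached on a short-uptime machine
    OnBootSec=12h
    OnUnitActiveSec=12h

    [Install]
    WantedBy=timers.target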
warp
winter :(
it feels as if it's 20:00 but it isn't even 18:00 yet.
kepstin-work
apparently the DST change is this weekend here.
warp nods.
ocharles
kepstin-work: I suppose another option is to be super cloudy, and use one process that is fired on the hour to find open edits and send messages to a message queue
and another process that waits on the message queue
kepstin-work
dunno if that'll make it better or worse :)
ocharles
then you can have modbot farms if we suddenly need to process millions of edits an hour
:P
warp
also I feel old and grumpy just for noticing + complaining about it :)
ocharles
warp: it's 4:50pm here and dark outside :(
kepstin-work
ocharles: heh, but that would act strangely if the queue gets backed up - edits will be in the queue multiple times.
i suppose just making the queue drop messages that can't be processed would be ok, tho.
ocharles
kepstin-work: depending on how your queue is built
luks
what's wrong with closing edits as they expire?
ocharles
you can have a queue that has a concept of unique ids
luks: nothing, that's what this modbot does
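[A minimal in-process sketch of the pipeline being discussed: an on-the-hour producer pushes open-edit ids onto a queue with unique-id semantics, so a backed-up queue drops duplicates, and a modbot consumes from it. The in-memory queue and helper names are stand-ins for a real broker:]

    import queue
    import threading
    import time

    class UniqueQueue:
        """A queue with a concept of unique ids: an edit id already
        waiting in the queue is silently dropped on re-enqueue."""
        def __init__(self):
            self._q = queue.Queue()
            self._pending = set()
            self._lock = threading.Lock()

        def put(self, edit_id):
            with self._lock:
                if edit_id in self._pending:
                    return  # duplicate while still queued: drop it
                self._pending.add(edit_id)
            self._q.put(edit_id)

        def get(self):
            edit_id = self._q.get()
            with self._lock:
                self._pending.discard(edit_id)
            return edit_id

    def hourly_producer(q, find_open_edits):
        # fired on the hour (cron or a timer): enqueue every open edit
        for edit_id in find_open_edits():
            q.put(edit_id)

    def modbot(q, process_edit):
        # waits on the queue; run more of these for a "modbot farm"
        while True:
            process_edit(q.get())

    # toy run: the second 2 is dropped because it is still queued
    q = UniqueQueue()
    hourly_producer(q, lambda: [1, 2, 2, 3])
    threading.Thread(target=modbot, args=(q, print), daemon=True).start()
    time.sleep(0.1)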
kepstin-work
luks: implementing that is hard, because edits don't tell you when they expire
luks
yeah, but I think most people find it confusing
ocharles
but if you mean the second a user enters a vote, then I wouldn't want that, because that loads the web server
kepstin-work
luks: so you have to check all the edits to see if they have expired yet
luks
kepstin-work: if you have database triggers and a message queue, you can know exactly when they expire
kepstin-work
ocharles: web server could send it to the expired edits queue :)
ocharles
oh, true
luks: did you end up using pg_amqp for anything in the end?
I haven't used it for anything but hobby stuff myself
luks
nope
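[A sketch of the trigger-plus-queue idea, using core PostgreSQL NOTIFY/LISTEN as a stand-in for pg_amqp; the table, column, and channel names are hypothetical:]

    import select
    import psycopg2
    import psycopg2.extensions

    # hypothetical trigger: each new vote announces its edit id on the
    # 'expired_edits' channel so a listener can check whether it can close
    DDL = """
    CREATE OR REPLACE FUNCTION queue_edit_check() RETURNS trigger AS $$
    BEGIN
        PERFORM pg_notify('expired_edits', NEW.edit_id::text);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER vote_added
        AFTER INSERT ON vote
        FOR EACH ROW EXECUTE PROCEDURE queue_edit_check();
    """

    def listen(conn, close_edit_if_expired):
        conn.set_isolation_level(
            psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
        cur = conn.cursor()
        cur.execute("LISTEN expired_edits;")
        while True:
            # block until the server has a notification for us
            if select.select([conn], [], [], 60) == ([], [], []):
                continue  # timed out; wait again
            conn.poll()
            while conn.notifies:
                note = conn.notifies.pop(0)
                close_edit_if_expired(int(note.payload))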
kepstin-work
luks: basically, expiring edits immediately means having a process that knows when all the upcoming edit expirations are going to be, and just sleeps until the next one before firing a notification of some sort - it should be doable, i think...
ocharles
expiring edits immediately just needs a queue, a modbot, and something pushing messages when something happens that expires an edit
as was said before
djce joined the channel
fsvo "immediately", anyway
luks
there could be still some "grace period"
but it would be fixed amount of time
not waiting to the next hour
ocharles
sounds like something to discuss :)
kepstin-work
no gaming the system if it always expires an edit exactly 1h after the 3rd yes vote, eh :)
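[And a sketch of the sleep-until-the-next-expiration process kepstin-work describes, with luks's fixed grace period - 1h after the closing vote, per the joke above; the heap contents are illustrative, and a real version would interrupt the sleep when a sooner expiration gets pushed:]

    import heapq
    import time

    GRACE = 3600  # fixed grace period: 1h after the 3rd yes vote

    def expiry_loop(heap, close_edit):
        """heap holds (expires_at, edit_id) pairs; sleep until the
        soonest one is due, close it, repeat."""
        while heap:
            expires_at, edit_id = heap[0]
            delay = expires_at - time.time()
            if delay > 0:
                time.sleep(delay)
                continue  # re-peek in case the heap changed
            heapq.heappop(heap)
            close_edit(edit_id)

    # a closing vote at time t schedules the edit for t + GRACE:
    heap = []
    heapq.heappush(heap, (time.time() + GRACE, 12345))  # illustrative id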
luks
does anybody know what exactly is the plan with the "ingestr"?
ruaok
luks: there isn't an exact plan yet.
ocharles
as I understand it, to be something that can take arbitrary data and index it, such that people can later view the data and say "this path is the artist name and this path is the release name", so please open that in the release editor
ruaok nods at ocharles
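[A toy version of that path-mapping idea - the paths, field names, and sample record are all made up:]

    def extract(record, path):
        """Walk a dotted path into a nested record."""
        for key in path.split("."):
            record = record[key]
        return record

    def apply_mapping(record, mapping):
        """mapping: dotted source path -> MB field name."""
        return {field: extract(record, path)
                for path, field in mapping.items()}

    # e.g. someone eyeballs an indexed item and declares:
    mapping = {"metadata.creator": "artist", "metadata.title": "release"}
    item = {"metadata": {"creator": "Some Artist", "title": "Some Release"}}
    print(apply_mapping(item, mapping))
    # -> {'artist': 'Some Artist', 'release': 'Some Release'}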
hawke_
Why is it spelled all stupid-like?
ruaok
the first step is to expose data and make it searchable.
and create very simple import steps
ocharles
hawke_: a play on flickr
Freso
ocharles: O RLY?
ruaok
then to let the community give us feedback on whether it's useful and how to make it better.
so since we have the IA data, the point is to start by exposing that.
and then we need to work on matching tools for matching foreign data sets to MB.
luks: I saw that you offered to do some matching work for Brewster.
do you have a plan for that yet?
kepstin-work
what kinds of input formats would ingestr be looking at supporting? just stuff like xml? pluggable frontends?
luks
ruaok: I'm running the matching script already
ruaok
luks: cool.
is the source somewhere where we can play with it and use it for other data sets?
luks
which is related to my question: I have albums which probably match, but I'd like somebody to review them
and I'm not sure if a web app for that conflicts with the ingestr somehow
ruaok
luks: yep, that is the next step. we need to find a way to manage that
and I honestly don't have a good way laid out in my mind for how to do that.
thanks!
luks
I'm doing fuzzy matching based on track lengths and then validating track titles to make sure I don't have false positives
which works great, but sometimes the titles are just too different to be sure it's a positive match
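[A rough sketch of that two-stage match; the tolerance and threshold values are guesses, not the numbers luks's script actually uses:]

    import difflib

    def lengths_match(a, b, tolerance_ms=3000):
        """Stage 1: same track count, every track length within tolerance."""
        return len(a) == len(b) and all(
            abs(x - y) <= tolerance_ms for x, y in zip(a, b))

    def titles_confirm(a, b, threshold=0.8):
        """Stage 2: fuzzy title similarity to weed out false positives;
        anything below the threshold goes to human review instead."""
        ratios = [difflib.SequenceMatcher(None, x.lower(), y.lower()).ratio()
                  for x, y in zip(a, b)]
        return sum(ratios) / len(ratios) >= threshold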
ruaok
great. I think that approach makes sense.
luks
so I'm thinking of creating a simple app that displays a random album from that category and asks the user to verify it
there aren't that many such matches, but it's more than I can handle personally :)
ruaok
I like that.
kepstin-work
this sounds somewhat similar to what the matching code in picard does.
ruaok
I think there is a possibility of lots of such matches. not all incoming data sets will be as clean as what we're getting from the archive.
luks
I've been thinking about indexing discogs as well - if I don't find a match in MB at all but do find one in discogs, offer the user an import from discogs
reosarevok
luks: wouldn't that auto-match a release to its remaster or something?
luks
*maybe* even import it automatically from discogs
ruaok
luks: yep.
luks
reosarevok: yes, probably
ruaok
luks: I think if we make the matching *REALLY* conservative, and people review the results then maybe we can do just that.
luks
reosarevok: but if I want to deal with that, I can forget about this kind of matching
reosarevok
I imagine that's not in the interests of the IA - they'd want a copy of both the original and the remaster, and to know which one is which
But I don't know if there's anything in their data that can allow us to know
ruaok
luks: there are lots of people literally throwing data sets at us.
reosarevok
(say barcodes or catnos or something)
ruaok
and we really need a comprehensive solution for dealing with them. exposing the data, importing clean data and all that.
oh, and I think we can also harvest data from these data sets.
luks
reosarevok: people generally do not keep that kind of information in tags
ruaok
such as picking barcodes from these data sets.
luks: maybe we should store matched data in ingestr too. (this record matches to MBID blah)
then we can pluck extra data out automatically via a bot.
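[A toy of that harvesting idea: stored matches pair a foreign record with an MBID, and a bot plucks fields MB is missing; the ids and barcode here are made up:]

    # ingestr-side store: foreign record id -> matched MBID (fake data)
    matches = {"ia:some-item": "mbid-goes-here"}
    foreign = {"ia:some-item": {"barcode": "0000000000000"}}

    def harvest_barcodes(matches, foreign, mb_has_barcode):
        """Yield (mbid, barcode) pairs worth turning into edits."""
        for source_id, mbid in matches.items():
            barcode = foreign[source_id].get("barcode")
            if barcode and not mb_has_barcode(mbid):
                yield mbid, barcode

    for mbid, barcode in harvest_barcodes(matches, foreign, lambda m: False):
        print(mbid, barcode)  # candidate edit for review or a bot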
reosarevok
luks: obviously for the ones they digitise themselves, that's basic stuff to include
So I trust the IA will include it. But for the rest, yes, dunno
luks
the only way to get that kind of information, if it's there, is parsing the textual descriptions
ruaok
and I would love ingestr to be in python. :(
luks
IA's indexing is pretty primitive regarding metadata
ruaok
maybe we should start over with a more clear goal in mind. :)
reosarevok
ruaok: the goal seems clear? "manage different sets of metadata, automatically find matches between them and with MusicBrainz, and provide users an easy way to confirm them or import the ones that do not match"
What seems hard is the execution :p
luks
that's too broad a goal, IMO
ruaok
for ingestr, yes.
luks
each data set will be different
ruaok
there needs to be a web app component too.
reosarevok
luks: that's the point
ruaok
luks: our plan is to keep mappings between different data sets in ingestr.
reosarevok
Isn't the idea to turn each of the data sets into something compatible with all the rest?
ruaok
reosarevok: that is my idea, yes.
I see this working as a two step process.
1. ingest data and show unstructured results.
2. let the community look at it and figure out a mapping for the new data.
3. install the mapping
4. import data.
luks
making different data sets "compatible" is not going to work
ruaok
(and you get two extra steps for free!)
Freso
- he says and lists 4 steps.
:p
reosarevok
luks: how is it not going to work?
luks
that's a huge amount of work, in the cases where it's even possible
ruaok
luks: I don't think that is the goal.
reosarevok
I mean, basically it's mapping each set to MB's approach
(which makes them compatible-through-MB)
luks
in different data sets you have different information and you can use that information primarily for matching to MB
ruaok
I view it as picking matching data components out.
luks
sometimes you don't have that, but you have something else that the other data set doesn't
reosarevok
But when you know what matches where, you can also look for places where the datasets match in their MB-matching
ruaok
and making it easier to import, but in a lot of cases direct import won't work well.
reosarevok
And be like "huh, these might be the same thing!"
luks
reosarevok: what's the point of keeping the data after you match them to MB?
ocharles
luks: to send that data back to the source
reosarevok
luks: for the ones which match to MB, dunno
ocharles
was one argument, anyway
reosarevok
I care about the ones which do *not*
(say, we get info from two different sources but the release has the same cat# or EAN or track times)
ruaok
it might also be useful for saying: this data is BAD. don't want.
prevent repeat import attempts
reosarevok
Precisely because, as you said, different sets bring different info, it's useful to try to put it all together
(same as I, manually, might search site X's data to find a barcode and then use that barcode to find a release on amazon to find a back cover to find more data - just bottily :p)
luks
I don't think it will ever cherry-pick data like that
reosarevok
Well, that's the obvious use of having multiple datasets coming in
So it would be a bit sad not to try to take advantage of it
luks
realistically, you will be happy if you get album/title/artist
which is not even good for automatic import
reosarevok
heh
I guess we're used to different data sources :p
reosarevok has been playing with the 70k albums in the naxos music library, and those are both fairly complete and fairly easy to match to other datasets for extra info
Of course, someone would have to ask them for the data, but I imagine that might actually work
I haven't seen label data for pop so I don't know how much that sucks