#musicbrainz-devel


      • ocharles
        damnit, why has no one done this work for me!
      • warp: go ahead
      • warp
        ocharles: great :D
      • kepstin-work
        the systemd timer stuff is annoying for things like logrotate on laptops that don't have long uptimes - if you have the timer set to every 12h starting from boot time and never leave it running for 12h, the timer will never trigger.
      • so cron is still useful.
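(Later systemd releases added calendar timers with Persistent=, which address exactly the laptop case described above. A minimal sketch of a logrotate.timer unit, assuming a matching logrotate.service exists; a missed run fires at the next boot instead of never:)

```ini
# logrotate.timer : a sketch only; assumes a matching logrotate.service unit.
[Unit]
Description=Rotate logs twice a day

[Timer]
# Wall-clock schedule instead of "every 12h since boot"...
OnCalendar=*-*-* 00/12:00:00
# ...and fire a missed run at the next boot, so short uptimes still rotate.
Persistent=true

[Install]
WantedBy=timers.target
```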
      • warp
        winter :(
      • it feels as if it's 20:00 but it isn't even 18:00 yet.
      • kepstin-work
        apparently the DST change is this weekend here.
      • warp nods.
      • ocharles
        kepstin-work: I suppose another option is to be super cloudy, and use one process that is fired on the hour to find open edits and send messages to a message queue
      • and another process that waits on the message queue
      • kepstin-work
        dunno if that'll make it better or worse :)
      • ocharles
        then you can have modbot farms if we suddenly need to process millions of edits an hour
      • :P
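(A rough sketch of the shape ocharles is describing, using Python's in-process queue purely for illustration; a real deployment would put a broker between two separate processes. find_expired_edit_ids and close_edit are hypothetical placeholders, and keying the queue on unique edit ids anticipates the duplicate-message concern raised just below:)

```python
import queue
import threading
import time

# In-process sketch of the "one process enqueues, another consumes" shape;
# a real deployment would put a broker between two separate processes so
# extra modbot workers can be added when the backlog grows.

edit_queue = queue.Queue()
queued_ids = set()              # track what is already queued so a backed-up
queued_lock = threading.Lock()  # queue never holds the same edit twice


def find_expired_edit_ids():
    """Hypothetical hourly query for open edits that are past their expiry."""
    return []


def close_edit(edit_id):
    """Hypothetical stand-in for whatever ModBot does to apply or reject an edit."""
    print("closing edit", edit_id)


def producer():
    # Fired "on the hour" (here just a sleep loop) to push expired edit ids.
    while True:
        for edit_id in find_expired_edit_ids():
            with queued_lock:
                if edit_id not in queued_ids:
                    queued_ids.add(edit_id)
                    edit_queue.put(edit_id)
        time.sleep(3600)


def modbot_worker():
    # Any number of these can run; this is the "modbot farm" idea.
    while True:
        edit_id = edit_queue.get()
        try:
            close_edit(edit_id)
        finally:
            with queued_lock:
                queued_ids.discard(edit_id)
            edit_queue.task_done()
```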
      • warp
        also I feel old and grumpy just for noticing + complaining about it :)
      • ocharles
        warp: it's 4:50pm here and dark outside :(
      • kepstin-work
        ocharles: heh, but that would act strangely if the queue gets backed up - edits will be in the queue multiple times.
      • i suppose just making the queue drop messages that can't be processed would be ok, tho.
      • ocharles
        kepstin-work: depending on how your queue is built
      • luks
        what's wrong with closing edits as they expire?
      • ocharles
        you can have a queue that has a concept of unique ids
      • luks: nothing, that's what this modbot does
      • kepstin-work
        luks: implementing that is hard, because edits don't tell you when they expire
      • luks
        yeah, but I think most people find it confusing
      • ocharles
        but if you mean the second a user enters a vote, then I wouldn't want that, because that loads the web server
      • kepstin-work
        luks: so you have to check all the edits to see if they have expired yet
      • luks
        kepstin-work: if you have database triggers and a message queue, you can know exactly when they expire
      • kepstin-work
        ocharles: web server could send it to the expired edits queue :)
      • ocharles
        oh, true
      • luks: did you end up using pg_amqp for anything in the end?
      • I haven't used it for anything but hobby stuff myself
      • luks
        nope
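(A sketch of the trigger-plus-queue idea luks mentions, using plain Postgres LISTEN/NOTIFY through psycopg2 rather than pg_amqp; the table and column names (vote, edit) are invented for illustration, and "any new vote" is a deliberately simplified expiry condition:)

```python
import select
import psycopg2

# Illustration only: table/column names are made up, and this uses plain
# Postgres LISTEN/NOTIFY via psycopg2 instead of the pg_amqp extension.

SETUP_SQL = """
-- Run once against the database: notify as soon as something happens that
-- can decide an edit (here, simplistically, any new vote).
CREATE OR REPLACE FUNCTION notify_edit_expiry() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('expired_edits', NEW.edit::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER vote_expires_edit
    AFTER INSERT ON vote
    FOR EACH ROW EXECUTE PROCEDURE notify_edit_expiry();
"""


def listen_for_expired_edits(dsn):
    conn = psycopg2.connect(dsn)
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute("LISTEN expired_edits;")
    while True:
        # Block until the backend has something for us, then drain it.
        if select.select([conn], [], [], 60) == ([], [], []):
            continue  # timed out; loop and wait again
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop(0)
            print("edit ready to close:", notify.payload)
```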
      • kepstin-work
        luks: basically, expiring edits immediately means having a process that knows when all the upcoming edit expirations are going to be, and just sleeps until the next one before firing a notification of some sort - it should be doable, i think...
      • ocharles
        expiring edits immediately just needs a queue, a modbot, and something pushing messages when something happens that expires an edit
      • as was said before
      • djce joined the channel
      • fsvo "immediately", anyway
      • luks
        there could be still some "grace period"
      • but it would be a fixed amount of time
      • not waiting until the next hour
      • ocharles
        sounds like something to discuss :)
      • kepstin-work
        no gaming the system if it always expires an edit exactly 1h after the 3rd yes vote, eh :)
      • luks
        does anybody know what exactly is the plan with the "ingestr"?
      • ruaok
        luks: there isn't an exact plan yet.
      • ocharles
        as I understand it, to be something that can take arbitrary data and index it, such that people can later view the data and say "this path is the artist name and this path is the release name", so please open that in the release editor
      • ruaok nods at ocharles
      • hawke_
        Why is it spelled all stupid-like?
      • ruaok
        the first step is to expose data and make it searchable.
      • and create very simple import steps
      • ocharles
        hawke_: a play on flickr
      • Freso
        ocharles: O RLY?
      • ruaok
        then to let the community give us feedback on whether it's useful and how to make it better.
      • so since we have IA data, the point is to use that to expose it.
      • and then we need to work on matching tools for matching foreign data sets to MB.
      • luks: I saw that you offered to do some matching work for Brewster.
      • do you have a plan for that yet?
      • kepstin-work
        what kinds of input formats would ingestr be looking at supporting? just stuff like XML? pluggable frontends?
      • luks
        ruaok: I'm running the matching script already
      • ruaok
        luks: cool.
      • is the source somewhere where we can play with it and use it for other data sets?
      • luks
        which is related to my question, I have albums which probably match, but I'd like somebody to review it
      • and I'm not sure if a web app for that conflicts with the ingestr somehow
      • ruaok
        luks: yep, that is the next step. we need to find a way to manage that
      • for more than one data set.
      • luks
      • but I'm working on it as it runs
      • ruaok
        and I honestly don't have a good way laid out in my mind on how to do that.
      • thanks!
      • luks
        I'm doing fuzzy matching based on track lengths and then validating track titles to make sure I don't have false positives
      • which works great, but sometimes the titles are just too different to be sure it's a positive match
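(A minimal sketch of the kind of heuristic luks describes: match on track lengths within a tolerance, then validate titles to weed out false positives. The tolerance, threshold, and data shapes here are invented:)

```python
from difflib import SequenceMatcher

# Rough sketch of the heuristic described above; the tolerance and the
# similarity threshold are invented, and a "track" here is just a
# (title, length_in_ms) pair.

LENGTH_TOLERANCE_MS = 3000
TITLE_SIMILARITY_THRESHOLD = 0.8


def title_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def tracklists_match(candidate, mb_release):
    if len(candidate) != len(mb_release):
        return False
    for (title_a, length_a), (title_b, length_b) in zip(candidate, mb_release):
        # First the cheap fuzzy check on durations...
        if abs(length_a - length_b) > LENGTH_TOLERANCE_MS:
            return False
        # ...then validate titles to catch false positives.  Pairs that fail
        # only this check are the ones worth sending to a human reviewer.
        if title_similarity(title_a, title_b) < TITLE_SIMILARITY_THRESHOLD:
            return False
    return True
```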
      • ruaok
        great. I think that approach makes sense.
      • luks
        so I'm thinking of creating a simple app that displays a random album from that category and asks the user to verify it
      • there aren't that many such matches, but it's more than I can handle personally :)
      • ruaok
        I like that.
      • kepstin-work
        this sounds somewhat similar to what the matching code in picard does.
      • ruaok
        I think there is a possibility of lots of such matches. not all incoming data sets will be as clean as what we're getting from the archive.
      • luks
        I've been thinking about indexing discogs as well and if I don't find a match in MB at all, but I do find a match in discogs, offer the user to import it from discogs
      • reosarevok
        luks: wouldn't that auto-match a release to its remaster or something?
      • luks
        *maybe* even import it automatically from discogs
      • ruaok
        luks: yep.
      • luks
        reosarevok: yes, probably
      • ruaok
        luks: I think if we make the matching *REALLY* conservative, and people review the results then maybe we can do just that.
      • luks
        reosarevok: but if I want to deal with that, I can forget about this kind of matching
      • reosarevok
        I imagine that's not in the interests of the IA - I imagine they'd want a copy of both original and remaster and to know which one is which
      • But I don't know if there's anything in their data that can allow us to know
      • ruaok
        luks: there are lots of people literally throwing data sets at us.
      • reosarevok
        (say barcodes or catnos or something)
      • ruaok
        and we really need a comprehensive solution for dealing with them. exposing the data, importing clean data and all that.
      • oh, and I think we can also harvest data from these data sets.
      • luks
        reosarevok: people generally do not keep that kind of information in tags
      • ruaok
        such as picking barcodes from these data sets.
      • luks: maybe we should store matched data in ingestr too. (this record matches to MBID blah)
      • then we can pluck extra data out automatically via a bot.
      • reosarevok
        luks: obviously for the ones they digitise themselves, that's basic stuff to include
      • So I trust the IA will include it. But for the rest, yes, dunno
      • luks
        the only way to get that kind of information, if it's there, is parsing the textual descriptions
      • ruaok
        and I would love ingestr to be in python. :(
      • luks
        IA's indexing is pretty primitive regarding metadata
      • ruaok
        maybe we should start over with a more clear goal in mind. :)
      • reosarevok
        ruaok: the goal seems clear? "manage different sets of metadata, automatically find matches between them and with MusicBrainz, and provide users an easy way to confirm them or import the ones that do not match"
      • What seems hard is the execution :p
      • luks
        that's too broad a goal, IMO
      • ruaok
        for ingestr, yes.
      • luks
        each data set will be different
      • ruaok
        there needs to be a web app component too.
      • reosarevok
        luks: that's the point
      • ruaok
        luks: our plan is to keep mappings between different data sets in ingestr.
      • reosarevok
        Isn't the idea to turn each of the data sets into something compatible with all the rest?
      • ruaok
        reosarevok: that is my idea, yes.
      • I see this working as a two step process.
      • 1. ingest data and show unstructured results.
      • 2. let the community look at it and figure out a mapping for the new data.
      • 3. install the mapping
      • 4. import data.
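(A hypothetical illustration of steps 2-4: a community-supplied mapping from paths in a foreign record to MusicBrainz-style fields, applied to each ingested record; every path and field name here is invented:)

```python
# Hypothetical illustration: the community writes a mapping for a new data
# set (step 2-3), and import (step 4) applies it to each ingested record.

INTERNET_ARCHIVE_MAPPING = {
    "metadata/creator": "artist_name",
    "metadata/title": "release_title",
    "files/*/title": "track_titles",
}


def get_path(record, path):
    """Resolve a slash-separated path in a nested dict; '*' fans out over a list."""
    head, _, rest = path.partition("/")
    if head == "*":
        return [get_path(item, rest) for item in record]
    value = record.get(head)
    if not rest or value is None:
        return value
    return get_path(value, rest)


def apply_mapping(record, mapping):
    return {field: get_path(record, path) for path, field in mapping.items()}
```

(Running apply_mapping(record, INTERNET_ARCHIVE_MAPPING) over each incoming record would give the import step something close enough to MB's shape to feed a matching or release-editor tool.)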
      • luks
        making different data sets "compatible" is not going to work
      • ruaok
        (and you get two extra steps for free!)
      • Freso
        - he says and lists 4 steps.
      • :p
      • reosarevok
        luks: how is it not going to work?
      • luks
        that's a huge amount of work, in the cases where it's even possible
      • ruaok
        luks: I don't think that is the goal.
      • reosarevok
        I mean, basically it's mapping each set to MB's approach
      • (which makes them compatible-through-MB)
      • luks
        in different data sets you have different information and you can use that information primarily for matching to MB
      • ruaok
        I view it as picking matching data components out.
      • luks
        sometimes you don't have that, but you have something else, that the other data set doesn't
      • reosarevok
        But when you know what matches where, you can also look for places where the datasets match in their MB-matching
      • ruaok
      • and making it easier to import, but in a lot of cases direct import won't work well.
      • reosarevok
        And be like "huh, these might be the same thing!"
      • luks
        reosarevok: what's the point of keeping the data after you match them to MB?
      • ocharles
        luks: to send that data back to the source
      • reosarevok
        luks: for the ones which match to MB, dunno
      • ocharles
        was one argument, anyway
      • reosarevok
        I care about the ones which do *not*
      • (say, we get info from two different sources but the release has the same cat# or EAN or track times)
      • ruaok
        it might also be useful for saying: this data is BAD. don't want.
      • prevent repeat import attempts
      • reosarevok
        Precisely because, as you said, different sets bring different info, it's useful to try to put it all together
      • (same as I, manually, might search site X's data to find a barcode and then use that barcode to find a release on amazon to find a back cover to find more data - just bottily :p)
      • luks
        I don't think it will every cherry-pick data like that
      • *ever
      • reosarevok
        Well, that's the obvious use of having multiple datasets coming in
      • So it would be a bit sad not to try to take advantage of it
      • luks
        realistically, you will be happy if you get album/title/artist
      • which is not even good for automatic import
      • reosarevok
        heh
      • I guess we're used to different data sources :p
      • reosarevok has been playing with the 70k albums in the naxos music library, and those are both fairly complete and fairly easy to match to other datasets for extra info
      • Of course, someone would have to ask them for the data, but I imagine that might actually work
      • I haven't seen label data for pop so I don't know how much that sucks