#musicbrainz

      • ruaok
        (not filter stop words, I guess?)
      • 2004-10-27 30102, 2004

      • ]Thread[
        all depends on what you're searching or how your document is divided up. I've found that if you have a huge blob of text as one of the elements in your document, Analyzers are good. If your document is well defined and has atomic fields (could almost go in a db) then it could be bad.
      • 2004-10-27 30135, 2004

      • canidae has quit
      • 2004-10-27 30143, 2004

      • ruaok
        heh.
      • 2004-10-27 30156, 2004

      • ruaok
        well, I'm indexing MB's data that comes out of a DB and I'm still impressed.
      • 2004-10-27 30102, 2004

      • ruaok
        did you look at search.musicbrainz.org ??
      • 2004-10-27 30121, 2004

      • ruaok
        my documents are basically this:
      • 2004-10-27 30129, 2004

      • ruaok
        artist:U2
      • 2004-10-27 30134, 2004

      • ruaok
        album:October
      • 2004-10-27 30137, 2004

      • ruaok
        track:gloria
      • 2004-10-27 30142, 2004

      • ruaok
        trackNum:1
      • 2004-10-27 30145, 2004

      • ruaok
        duration:343453
      • 2004-10-27 30156, 2004

      • ruaok
        then some ids, but that is mostly it.
      • 2004-10-27 30107, 2004
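
To make the document layout ruaok describes concrete, here is a minimal sketch of a fielded index in plain Python. It is not the Lucene or PyLucene API: `TinyFieldIndex` and its methods are hypothetical names invented for illustration, and the second sample document is made up. It only shows how small atomic fields such as `artist`, `album`, and `track` can each get their own postings, so that a query can require a match in specific fields.

```python
from collections import defaultdict


class TinyFieldIndex:
    """Hypothetical in-memory stand-in for a fielded search index (not Lucene)."""

    def __init__(self):
        self.docs = []
        # Maps (field, term) to the set of ids of documents containing it.
        self.postings = defaultdict(set)

    def add(self, doc):
        """Store a document (a dict of field -> value) and index every field."""
        doc_id = len(self.docs)
        self.docs.append(doc)
        for field, value in doc.items():
            # Lowercase and split on whitespace; no stop words are filtered.
            for term in str(value).lower().split():
                self.postings[(field, term)].add(doc_id)
        return doc_id

    def search(self, **criteria):
        """Return documents matching every field=term criterion (AND)."""
        result = None
        for field, value in criteria.items():
            for term in str(value).lower().split():
                ids = self.postings.get((field, term), set())
                result = ids if result is None else result & ids
        return [self.docs[i] for i in sorted(result or ())]


index = TinyFieldIndex()
# Fields and values taken from the chat above.
index.add({"artist": "U2", "album": "October", "track": "gloria",
           "trackNum": 1, "duration": 343453})
# Made-up document, present only to show that the field filter matters.
index.add({"artist": "Example Artist", "album": "Example Album",
           "track": "gloria", "trackNum": 7, "duration": 200000})

hits = index.search(artist="u2", track="Gloria")
print(hits)
```

Searching on `track="gloria"` alone would return both documents; adding the `artist` criterion narrows the result to the U2 entry.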

      • ruaok
        Probably not the best indexing setup for lucene...
      • 2004-10-27 30132, 2004

      • ]Thread[
        yeah. It's pretty cool.
      • 2004-10-27 30159, 2004

      • canidae joined the channel
      • 2004-10-27 30111, 2004

      • ]Thread[
        I'd really like to mix up Lucene with a Memex or Soundex type algorithm
      • 2004-10-27 30136, 2004

      • ruaok
        rj_ wants to do exactly that -- metaphone2, I believe.
      • 2004-10-27 30132, 2004

      • ]Thread[
        using python with Lucene eh.
      • 2004-10-27 30138, 2004

      • ruaok
        yep.
      • 2004-10-27 30145, 2004

      • ruaok
        I was mainly just testing.
      • 2004-10-27 30103, 2004

      • ]Thread[
        some guys are reimplementing the google desktop in lucene
      • 2004-10-27 30110, 2004

      • ]Thread[
        so far they are saying it's faster.
      • 2004-10-27 30115, 2004

      • ruaok
        I'll be using pylucene on the client side for track identification and standard lucene in a servlet inside tomcat on the server.
      • 2004-10-27 30123, 2004

      • ruaok
        hehehehhe
      • 2004-10-27 30138, 2004

      • ]Thread[
        but who cares if it's faster. BetaMax VHS :)
      • 2004-10-27 30117, 2004

      • ]Thread[
      • 2004-10-27 30127, 2004

      • ]Thread[
        may be easier to implement in lucene than I thought.
      • 2004-10-27 30148, 2004

      • ]Thread[
        Soundex, Metaphone, Refined Soundex, and Double Metaphone
      • 2004-10-27 30123, 2004
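
For readers who have not met the phonetic encoders named above, here is a minimal sketch of classic American Soundex in Python. It is a standalone illustration, not the Java routine ]Thread[ is referring to and not part of Lucene; the function name `soundex` is just a label chosen here. The idea is that names which sound alike collapse to the same four-character code, which can then be indexed as an extra field.

```python
# Letter-to-digit table for American Soundex.
_CODES = {}
for _digit, _letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                         "4": "L", "5": "MN", "6": "R"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code for name, or '' if it has no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter's own code suppresses an identical code right after it.
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        # Vowels (and Y) set prev to "", so the same code after a vowel counts again.
        prev = code
    # Pad with zeros and truncate to one letter plus three digits.
    return (first + "".join(digits) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))   # both map to the same code
print(soundex("Gloria"), soundex("Glorya"))   # a misspelled track title still matches
```

Metaphone and Double Metaphone follow the same "encode, then compare codes" pattern but use much larger rule sets, so they are not sketched here.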

      • ruaok
        bitchen!
      • 2004-10-27 30132, 2004

      • ruaok
        I'll probably make use of that!
      • 2004-10-27 30135, 2004

      • ]Thread[
        encode it in a java routine and look up in perl?
      • 2004-10-27 30151, 2004

      • ruaok
        naw.
      • 2004-10-27 30102, 2004

      • ruaok
        remember the lucene web service I was just talking about?
      • 2004-10-27 30109, 2004

      • ruaok
        once it's done it's going to be open sourced.
      • 2004-10-27 30110, 2004

      • ]Thread[
        yeah
      • 2004-10-27 30117, 2004

      • ]Thread[
        cool
      • 2004-10-27 30127, 2004

      • ruaok
        so I will use it and have perl make calls to it to do searches.
      • 2004-10-27 30138, 2004

      • ruaok
        that way i can deploy it on its own standalone box.
      • 2004-10-27 30102, 2004

      • ruaok
        I'll be looking for a home for the lucene web service and I don't want to go to SF.
      • 2004-10-27 30112, 2004

      • ruaok
        any suggestions?
      • 2004-10-27 30112, 2004

      • ]Thread[
        There has been talk of a Lucene server
      • 2004-10-27 30130, 2004

      • ruaok
        this is just a servlet, but should be of some use to someone.
      • 2004-10-27 30141, 2004

      • ruaok
        i've got it mostly working now.
      • 2004-10-27 30152, 2004

      • ]Thread[
        SF sucks?
      • 2004-10-27 30157, 2004

      • ruaok
        I gotta polish off some sharp edges today and then I'll let people poke at it tomorrow.
      • 2004-10-27 30105, 2004

      • ruaok
        SF sucks.
      • 2004-10-27 30132, 2004

      • ]Thread[
        talking about SF. Have you seen gforge?
      • 2004-10-27 30157, 2004

      • ruaok
        I haven't looked in ages.
      • 2004-10-27 30128, 2004

      • ruaok
        do you think that gforge sucks less than SF?
      • 2004-10-27 30141, 2004

      • ]Thread[
        latest is pretty good. I run it at my job to manage the developers, since I can't afford SF's or CollabNet's service fees.
      • 2004-10-27 30128, 2004

      • ]Thread[
        well. It's got its quirks. But at least you're not sharing the server with 20 thousand other orgs and companies
      • 2004-10-27 30107, 2004

      • ruaok
        does gforge host projects too?
      • 2004-10-27 30118, 2004

      • ]Thread[
        well gforge is sourceforge open source
      • 2004-10-27 30123, 2004

      • ]Thread[
        you just grab the php files
      • 2004-10-27 30127, 2004

      • ]Thread[
        and setup the db
      • 2004-10-27 30129, 2004

      • ]Thread[
        on postgres
      • 2004-10-27 30131, 2004

      • ]Thread[
        and host it yourself
      • 2004-10-27 30132, 2004

      • ruaok
        right.
      • 2004-10-27 30136, 2004

      • ]Thread[
        yeah
      • 2004-10-27 30146, 2004

      • ruaok
        I certainly don't want to host it myself.
      • 2004-10-27 30153, 2004

      • ruaok
        I just want a place to put the lucene-ws.
      • 2004-10-27 30155, 2004

      • ]Thread[
        they don't host projects.
      • 2004-10-27 30101, 2004

      • ruaok
        I guess SF it is.
      • 2004-10-27 30106, 2004

      • ]Thread[
        :(
      • 2004-10-27 30116, 2004

      • wheels joined the channel
      • 2004-10-27 30142, 2004

      • MacIntire joined the channel
      • 2004-10-27 30106, 2004

      • Nacho_ has quit
      • 2004-10-27 30145, 2004

      • ruaok
        wow. the TRM server must be dragging butt today.
      • 2004-10-27 30153, 2004

      • ruaok
        We got $90 in donations today.
      • 2004-10-27 30104, 2004

      • ruaok
        And thanks to _thom_ for making up $50 of that!
      • 2004-10-27 30106, 2004

      • ]Thread[ has quit
      • 2004-10-27 30112, 2004

      • _thom_ waves
      • 2004-10-27 30154, 2004

      • _thom_
        It started because I was asking Jay Tuley to link against libmad instead of going through iTunes, so he could write the MBID directly into the file
      • 2004-10-27 30129, 2004

      • _thom_
        then I figured if I was going to do that, I ought to give him some beer money or something. Then I decided that since the whole thing depends on MB, I ought to make a matching donation :D
      • 2004-10-27 30142, 2004

      • ruaok
        cool!
      • 2004-10-27 30150, 2004

      • ruaok
        it's really appreciated!
      • 2004-10-27 30154, 2004

      • _thom_
        sure, np
      • 2004-10-27 30101, 2004

      • _thom_
        So how is Picard coming, in the meantime ;)
      • 2004-10-27 30111, 2004

      • ruaok
        on hold for this week.
      • 2004-10-27 30119, 2004

      • ruaok
        I'm earning money to pay the bills. :-(
      • 2004-10-27 30122, 2004

      • _thom_
        Did that one kid end up doing anything to make it work on OS X?
      • 2004-10-27 30127, 2004

      • _thom_
        I hear that
      • 2004-10-27 30135, 2004

      • ruaok
        nothing came from that yet.
      • 2004-10-27 30151, 2004

      • _thom_
        has he been around since, what, a week or so ago? That's when I talked to him last
      • 2004-10-27 30154, 2004

      • ruaok
        I'm hoping to work on picard more next week.
      • 2004-10-27 30121, 2004

      • _thom_
        Did you check out that thing I mentioned re: unicode support, though?
      • 2004-10-27 30122, 2004

      • ruaok
        I remember the conversation, but I don't remember the nick.
      • 2004-10-27 30126, 2004

      • _thom_ either
      • 2004-10-27 30139, 2004

      • ruaok
        remind me on the unicode thing.
      • 2004-10-27 30104, 2004

      • _thom_
        Well, I went to the SF page for wxWidgets and basically made a feature request (that unicode support work for python bindings on OS X)
      • 2004-10-27 30116, 2004

      • cikkolata perks up at the mention of unicode
      • 2004-10-27 30122, 2004

      • _thom_
        and someone replied with a patch, but I compared it and it looked like it was already in the latest in CVS
      • 2004-10-27 30147, 2004

      • _thom_
        which I'd downloaded and built with all the required libraries, but it still failed the check for unicode in picard
      • 2004-10-27 30120, 2004

      • ruaok
        oh yes.
      • 2004-10-27 30130, 2004

      • _thom_
        for the heck of it I tried commenting it out. the app launched and I could pick files, etc, but it wouldn't identify tracks. it would, however, see the MB TRM info in files I'd done using tp_tagger, though
      • 2004-10-27 30141, 2004

      • ruaok
        Hmm.
      • 2004-10-27 30143, 2004

      • ruaok
        weird.
      • 2004-10-27 30154, 2004

      • ruaok
        not sure what to tell you...
      • 2004-10-27 30112, 2004

      • _thom_
        I guess I don't really care *what* interface I use, as long as it runs on a mac. The tp_tagger is just a little too clunky (gee, given that it's a demo, not surprising, right? :) to use to tag ALL of my music files
      • 2004-10-27 30154, 2004

      • _thom_
        that's why I like iEatBrainz, it has a decent enough UI (and works already) -- it just can't write to files directly, yet. That's why I hope he'll link to libmad
      • 2004-10-27 30142, 2004

      • _thom_
        At that point I hope to write some wonky PHP + MySQL thing people could use to help catalog their music (PHP from command line), find dupes, tiebreaker based on encoding method or bit rate or etc etc
      • 2004-10-27 30145, 2004

      • ruaok
        cool.
      • 2004-10-27 30109, 2004

      • ruaok
        and picard will mature in time. it's dicey now, since it depends on so much unstable stuff.
      • 2004-10-27 30149, 2004

      • _thom_
        We'll see what he says. If he's too busy to work on iEB, maybe I'll be back looking for some way to make tp_tagger a little more robust
      • 2004-10-27 30117, 2004

      • ruaok
        I can't believe people are using tp_tagger. :-)
      • 2004-10-27 30118, 2004

      • _thom_
        but for dealing with, you know, thousands of files -- most of which already have pretty decent track info -- ehh
      • 2004-10-27 30124, 2004

      • ruaok
        it's a test harness for crying out loud!
      • 2004-10-27 30127, 2004

      • _thom_
        exactly, heh
      • 2004-10-27 30131, 2004

      • ruaok waves at canidae
      • 2004-10-27 30144, 2004

      • _thom_
        I had this idea I wanted to ask you about
      • 2004-10-27 30155, 2004

      • ruaok
        See, picard excels at tagging files that have mostly clean data!
      • 2004-10-27 30157, 2004

      • _thom_
        from the *client* side, it's more cycles to take an audio fingerprint, right?
      • 2004-10-27 30104, 2004

      • ruaok
        yup.
      • 2004-10-27 30114, 2004

      • ruaok
        than not taking one, yes.
      • 2004-10-27 30115, 2004

      • _thom_
        but from the *server* side, it's more cycles to do a lookup based on artist name, album name, or track name, right?
      • 2004-10-27 30119, 2004

      • _thom_
        heh
      • 2004-10-27 30130, 2004

      • _thom_
        you see the tradeoff I'm trying to make, this comparison
      • 2004-10-27 30136, 2004

      • ruaok
        i do see.
      • 2004-10-27 30140, 2004

      • ruaok
        the answer is not as easy.
      • 2004-10-27 30143, 2004

      • _thom_
        Or is it equally difficult (or more so?) to search for a fingerprint ID
      • 2004-10-27 30147, 2004

      • _thom_
        on the server side
      • 2004-10-27 30154, 2004

      • ruaok
        One single TRM lookup on the server side is not expensive.
      • 2004-10-27 30106, 2004

      • ruaok
        But, we have a machine with 5GB of RAM to do that.
      • 2004-10-27 30110, 2004

      • _thom_
        heh
      • 2004-10-27 30116, 2004

      • ruaok
        And we can't replicate the TRM lookup onto new machines.
      • 2004-10-27 30128, 2004

      • ruaok
        since it's closed source and the damn thing eats ram for lunch.
      • 2004-10-27 30151, 2004

      • ruaok
        so, a metadata only lookup is preferred, since it uses *our* open source tools that can be replicated.
      • 2004-10-27 30152, 2004

      • _thom_
        so your license restricts you to only using the actual 'black box' code on one machine?
      • 2004-10-27 30100, 2004

      • ruaok
        yup. :-(
      • 2004-10-27 30107, 2004

      • ruaok
        the answer to this is lucene.
      • 2004-10-27 30112, 2004

      • _thom_
        Oh! Got ya. So in other words, my lookup idea would actually be *really good* by comparison
      • 2004-10-27 30120, 2004

      • ruaok
        I've got a picard enabled tagger that uses lucene -- it kicks ASS!
      • 2004-10-27 30123, 2004

      • _thom_
        Yeah, I've overheard you guys talking about it
      • 2004-10-27 30132, 2004

      • _thom_
        but in the meantime, until it's fit for public consumption
      • 2004-10-27 30137, 2004

      • ruaok
        problem is that it requires a 650MB search index.
      • 2004-10-27 30145, 2004

      • _thom_
        heh
      • 2004-10-27 30152, 2004

      • _thom_
        Do you mean, on the *client* side?
      • 2004-10-27 30103, 2004

      • ruaok
        yup. :-)
      • 2004-10-27 30107, 2004

      • _thom_
        BitTorrent, baby
      • 2004-10-27 30124, 2004

      • ruaok
      • 2004-10-27 30135, 2004

      • _thom_
        So here is my crazy idea. Many free players can use CDDB or something like it, right?
      • 2004-10-27 30143, 2004

      • ruaok
        yup.
      • 2004-10-27 30114, 2004

      • _thom_
        Well, what if my php id3 reading program could look for those -- and if it found some evidence that this file had been ripped from CD, and looked up via a hash of a CD's index track...
      • 2004-10-27 30131, 2004

      • _thom_
        that would be much more conclusive proof that this was probably a track worth looking up via metadata instead of fingerprinting
      • 2004-10-27 30146, 2004
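
The heuristic _thom_ is proposing can be sketched roughly as follows. This is an illustration only: the tag key `cddb_disc_id` is a hypothetical name for whatever trace a CDDB-style lookup leaves in a file, and falling back to fingerprinting when that trace is missing is an assumption, since the chat does not say what should happen in that case.

```python
def choose_lookup(tags):
    """Pick 'metadata' or 'fingerprint' for one file, given its tags as a dict.

    Assumed rule: a file that carries artist, album, and track names plus
    evidence of a CD-based lookup is trusted enough for a metadata search.
    """
    has_names = all(
        str(tags.get(key, "")).strip() for key in ("artist", "album", "track")
    )
    # Hypothetical tag key standing in for "this was ripped and looked up by disc id".
    has_disc_id = bool(str(tags.get("cddb_disc_id", "")).strip())
    if has_names and has_disc_id:
        return "metadata"      # cheap for the client, served by replicable tools
    return "fingerprint"       # assumed fallback when the tags are not trusted


# Placeholder disc id value, not a real identifier.
ripped = {"artist": "U2", "album": "October", "track": "gloria",
          "cddb_disc_id": "example-disc-id"}
unknown = {"artist": "U2", "track": "gloria"}

print(choose_lookup(ripped))
print(choose_lookup(unknown))
```

A cataloguing script could run this per file and only spend client cycles on fingerprints for the files that come back as `fingerprint`.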

      • _thom_
        Besides, have you ever heard The Gourds' version of Gin & Juice?
      • 2004-10-27 30154, 2004

      • _thom_
        or King Missile's 'Detachable Penis'?
      • 2004-10-27 30113, 2004

      • _thom_
        Two songs which come readily to mind, which people mis-attribute ALL THE FRICKING TIME to the wrong bands
      • 2004-10-27 30124, 2004

      • ruaok
        :-)
      • 2004-10-27 30132, 2004

      • ruaok
        I've heard an acoustic version of G&J