#musicbrainz

      • ruaok
        (not filter stop words, I guess?)
      • ]Thread[
        all depends on what you're searching or how your document is divided up. I've found that if you have a huge blob of text as one of the elements in your document, Analyzers are good. If your document is well defined and has atomic fields (could almost go in a db) then it could be bad.
      • canidae has quit
      • ruaok
        heh.
      • well, I'm indexing MB's data that comes out of a DB and I'm still impressed.
      • did you look at search.musicbrainz.org ??
      • my documents are basically this:
      • artist:U2
      • album:October
      • track:gloria
      • trackNum:1
      • duration:343453
      • then some ids, but that is mostly it.
      • Probably not the best indexing setup for lucene...
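
The flat, atomic-field document ruaok describes above can be sketched in plain Python. This is only an illustration of fielded indexing and lookup, not Lucene's or PyLucene's API: `build_index`, `search`, and the second sample record are made up for the example, and only the U2 record comes from the log.

```python
from collections import defaultdict

# First record is the one quoted in the log; the second is a placeholder.
DOCS = [
    {"artist": "U2", "album": "October", "track": "gloria",
     "trackNum": 1, "duration": 343453},
    {"artist": "Example Artist", "album": "Example Album",
     "track": "Example Track", "trackNum": 2, "duration": 1000},
]


def build_index(docs):
    """Map (field, lowercased token) -> set of document ids."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for field, value in doc.items():
            for token in str(value).lower().split():
                index[(field, token)].add(doc_id)
    return index


def search(index, **terms):
    """Return sorted ids of documents matching every field=term pair."""
    result = None
    for field, term in terms.items():
        hits = index.get((field, str(term).lower()), set())
        result = hits if result is None else result & hits
    return sorted(result or [])


index = build_index(DOCS)
print(search(index, artist="u2", track="gloria"))
```

Because every field is atomic, a simple whitespace-and-lowercase split is enough here; heavier analysis (stemming, stop-word filtering) would matter more for large free-text fields.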
      • ]Thread[
        yeah. It's pretty cool.
      • canidae joined the channel
      • I'd Really like to mix up Lucene with a Memex or Soundex type algorithm
      • ruaok
        rj_ wants to do exactly that -- metaphone2, I believe.
      • ]Thread[
        using python with Lucene eh.
      • ruaok
        yep.
      • I was mainly just testing.
      • ]Thread[
        some guys are reimplementing the google desktop in lucene
      • so far they are saying it's faster.
      • ruaok
        I'll be using pylucene on the client side for track identification and standard lucene in servlet inside tomcat on the server.
      • hehehehhe
      • ]Thread[
        but who cares if it's faster. BetaMax VHS :)
      • may be easier to implement in lucene than I thought.
      • Soundex, Metaphone, Refined Soundex and Double Metaphone
      • ruaok
        bitchen!
      • I'll probably make use of that!
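
For readers unfamiliar with the phonetic algorithms listed above, here is a minimal sketch of classic American Soundex in Python. It is a standalone illustration, not the encoder either speaker was using, and the function name `soundex` is chosen for the example. Metaphone and Double Metaphone are considerably more involved and are not shown.

```python
# Consonant groups and their Soundex digits.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character American Soundex code for a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return ("".join(result) + "000")[:4]


# Names that sound alike map to the same code.
print(soundex("Robert"), soundex("Rupert"))
```

Indexing such a code as an extra field next to the artist or track name would let a search match similar-sounding spellings, which is the kind of mix with Lucene being discussed.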
      • ]Thread[
        encode it in a java routine and look up in perl?
      • ruaok
        naw.
      • remember the lucene web service I was just talking about?
      • once it's done it's going to be open sourced.
      • ]Thread[
        yeah
      • cool
      • ruaok
        so I will use it and have perl make calls to it to do searches.
      • that way i can deploy it on its own standalone box.
      • I'll be looking for a home for the lucene web service and I don't want to go to SF.
      • any suggestions?
      • ]Thread[
        There has been talk of a Lucene server
      • ruaok
        this is just a servlet, but should be of some use to someone.
      • i've got it mostly working now.
      • ]Thread[
        SF sucks?
      • ruaok
        I gotta polish off some sharp edges today and then I'll let people poke at it tomorrow.
      • SF sucks.
      • ]Thread[
        talking about SF. Have you seen gforge?
      • ruaok
        I haven't looked in ages.
      • do you think that gforge sucks less than SF?
      • ]Thread[
        latest is pretty good. I run it at my job to manage the developers, since I can't afford SF's or CollabNet's service fees.
      • well. It's got its quirks. But at least you're not sharing the server with 20 thousand other orgs and companies
      • ruaok
        does gforge host projects too?
      • ]Thread[
        well gforge is sourceforge open source
      • you just grab the php files
      • and setup the db
      • on postgres
      • and host it yourself
      • ruaok
        right.
      • ]Thread[
        yeah
      • ruaok
        I certainly don't want to host it myself.
      • I just want a place to put the lucene-ws.
      • ]Thread[
        they don't host projects.
      • ruaok
        I guess SF it is.
      • ]Thread[
        :(
      • wheels joined the channel
      • MacIntire joined the channel
      • Nacho_ has quit
      • ruaok
        wow. the TRM server must be dragging butt today.
      • We got $90 in donations today.
      • And thanks to _thom_ for making up $50 of that!
      • ]Thread[ has quit
      • _thom_ waves
      • _thom_
        It started because I was asking Jay Tuley to link against libmad instead of going through iTunes, so he could write the MBID directly into the file
      • then I figured if I was going to do that, I ought to give him some beer money or something. Then I decided that since the whole thing depends on MB, I ought to make a matching donation :D
      • ruaok
        cool!
      • it's really appreciated!
      • _thom_
        sure, np
      • So how is Picard coming, in the meantime ;)
      • ruaok
        on hold for this week.
      • I'm earning money to pay the bills. :-(
      • _thom_
        Did that one kid end up doing anything to make it work on OS X?
      • I hear that
      • ruaok
        nothing came from that yet.
      • _thom_
        has he been around since, what, a week or so ago? That's when I talked to him last
      • ruaok
        I'm hoping to work on picard more next week.
      • _thom_
        Did you check out that thing I mentioned re: unicode support, though?
      • ruaok
        I remember the conversation, but I don't remember the nick.
      • _thom_ either
      • remind me on the unicode thing.
      • _thom_
        Well, I went to the SF page for wxWidget and basically made a feature request (that unicode support work for python bindings on OS X)
      • cikkolata perks up at the mention of unicode
      • and someone replied with a patch, but I compared it and it looked like it was already in the latest in CVS
      • which I'd downloaded and built with all the required libraries, but it still failed the check for unicode in picard
      • ruaok
        oh yes.
      • _thom_
        for the heck of it I tried commenting it out. the app launched and I could pick files, etc, but it wouldn't identify tracks. it would, however, see the MB TRM info in files I'd done using tp_tagger, though
      • ruaok
        Hmm.
      • weird.
      • not sure what to tell you...
      • _thom_
        I guess I don't really care *what* interface I use, as long as it runs on a mac. The tp_tagger is just a little too clunky (gee, given that it's a demo, not surprising, right? :) to use to tag ALL of my music files
      • that's why I like iEatBrainz, it has a decent enough UI (and works already) -- it just can't write to files directly, yet. Therefore, why I hope he'll link to libmad
      • At that point I hope to write some wonky PHP + MySQL thing people could use to help catalog their music (PHP from command line), find dupes, tiebreaker based on encoding method or bit rate or etc etc
      • ruaok
        cool.
      • and picard will mature in time. it's dicey now, since it depends on so much unstable stuff.
      • _thom_
        We'll see what he says. If he's too busy to work on iEB, maybe I'll be back looking for some way to make tp_tagger a little more robust
      • ruaok
        I can't believe people are using tp_tagger. :-)
      • _thom_
        but for dealing with, you know, thousands of files -- most of which already have pretty decent track info -- ehh
      • ruaok
        it's a test harness for crying out loud!
      • _thom_
        exactly, heh
      • ruaok waves at canidae
      • I had this idea I wanted to ask you about
      • ruaok
        See, picard excels at tagging files that have mostly clean data!
      • _thom_
        from the *client* side, it's more cycles to take an audio fingerprint, right?
      • ruaok
        yup.
      • than not taking one, yes.
      • _thom_
        but from the *server* side, it's more cycles to do a lookup based on artist name, album name, or track name, right?
      • heh
      • you see the tradeoff I'm trying to make, this comparison
      • ruaok
        i do see.
      • the answer is not as easy.
      • _thom_
        Or is it equally difficult (or more so?) to search for a fingerprint ID
      • on the server side
      • ruaok
        One single TRM lookup on the server side is not expensive.
      • But, we have a machine with 5GB of RAM to do that.
      • _thom_
        heh
      • ruaok
        And we can't replicate the TRM lookup onto new machines.
      • since it's closed source and the damn thing eats ram for lunch.
      • so, a metadata only lookup is preferred, since it uses *our* open source tools that can be replicated.
      • _thom_
        so your license restricts you to only using the actual 'black box' code on one machine?
      • ruaok
        yup. :-(
      • the answer to this is lucene.
      • _thom_
        Oh! Got ya. So in other words, my lookup idea would actually be *really good* by comparison
      • ruaok
        I've got a picard enabled tagger that uses lucene -- it kicks ASS!
      • _thom_
        Yeah, I've overheard you guys talking about it
      • but in the meantime, until it's fit for public consumption
      • ruaok
        problem is that it requires a 650MB search index.
      • _thom_
        heh
      • Do you mean, on the *client* side?
      • ruaok
        yup. :-)
      • _thom_
        BitTorrent, baby
      • ruaok
      • _thom_
        So here is my crazy idea. Many free players can use CDDB or something like it, right?
      • ruaok
        yup.
      • _thom_
        Well, what if my php id3 reading program could look for those -- and if it found some evidence that this file had been ripped from CD, and looked up via a hash of a CD's index track...
      • that would be much more conclusive proof that this was probably a track worth looking up via metadata instead of fingerprinting
      • Besides, have you ever heard The Gourds' version of Gin & Juice?
      • or King Missile's 'Detachable Penis'?
      • Two songs which come readily to mind, which people mis-attribute ALL THE FRICKING TIME to the wrong bands
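
The CD-rip heuristic _thom_ outlines above could look roughly like the Python sketch below. It assumes the tags have already been read into a plain dict by some other tool. `MCDI` is the ID3v2 "Music CD identifier" frame; the `cddb_disc_id` key, the lowercase field names, and the `choose_lookup` function are hypothetical and exist only for this example.

```python
# Fields that must be present before a metadata-only lookup makes sense.
REQUIRED = ("artist", "album", "title")


def choose_lookup(tags):
    """Return "metadata" or "fingerprint" for one file's tag dict."""
    has_metadata = all(str(tags.get(key, "")).strip() for key in REQUIRED)
    # Evidence that the file was ripped from a CD and looked up at rip time.
    ripped_from_cd = bool(tags.get("MCDI")) or bool(tags.get("cddb_disc_id"))
    if has_metadata and ripped_from_cd:
        return "metadata"
    return "fingerprint"


print(choose_lookup({"artist": "U2", "album": "October",
                     "title": "Gloria", "MCDI": b"\x01"}))
```

Anything that fails the check falls back to fingerprinting, so a file with missing or untrusted tags is never resolved on metadata alone.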
      • ruaok
        :-)
      • I've heard an acoustic version of G&J