it all depends on what you're searching or how your document is divided up. I've found that if you have a huge blob of text as one of the elements in your document, Analyzers are good. If your document is well defined and has atomic fields (could almost go in a DB), then they could be bad.
2004-10-27 30135, 2004
canidae has quit
2004-10-27 30143, 2004
ruaok
heh.
2004-10-27 30156, 2004
ruaok
well, I'm indexing MB's data that comes out of a DB and I'm still impressed.
may be easier to implement in lucene than I thought.
2004-10-27 30148, 2004
]Thread[
Soundex, Metaphone, Refined Soundex, and Double Metaphone
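These encoders reduce a word to a code for how it sounds, so differently spelled names can still match. A minimal Python sketch of classic American Soundex, written only for illustration; it is not the Java implementation being discussed here, and Metaphone and Double Metaphone use much richer pronunciation rules.

```python
# Digit for each consonant group in classic American Soundex.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character American Soundex code for `name` ('' if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
        if len(result) == 4:
            break
    return result.ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```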
2004-10-27 30123, 2004
ruaok
bitchen!
2004-10-27 30132, 2004
ruaok
I'll probably make use of that!
2004-10-27 30135, 2004
]Thread[
encode it in a java routine and look up in perl?
2004-10-27 30151, 2004
ruaok
naw.
2004-10-27 30102, 2004
ruaok
remember the lucene web service I was just talking about?
2004-10-27 30109, 2004
ruaok
once it's done it's going to be open sourced.
2004-10-27 30110, 2004
]Thread[
yeah
2004-10-27 30117, 2004
]Thread[
cool
2004-10-27 30127, 2004
ruaok
so I will use it and have perl make calls to it to do searches.
2004-10-27 30138, 2004
ruaok
that way i can deploy it on its own standalone box.
2004-10-27 30102, 2004
ruaok
I'll be looking for a home for the lucene web service and I don't want to go to SF.
2004-10-27 30112, 2004
ruaok
any suggestions?
2004-10-27 30112, 2004
]Thread[
There has been talk of a Lucene server
2004-10-27 30130, 2004
ruaok
this is just a servlet, but should be of some use to someone.
2004-10-27 30141, 2004
ruaok
i've got it mostly working now.
2004-10-27 30152, 2004
]Thread[
SF sucks?
2004-10-27 30157, 2004
ruaok
I gotta polish off some sharp edges today and then I'll let people poke at it tomorrow.
2004-10-27 30105, 2004
ruaok
SF sucks.
2004-10-27 30132, 2004
]Thread[
talking about SF. Have you seen gforge?
2004-10-27 30157, 2004
ruaok
I haven't looked in ages.
2004-10-27 30128, 2004
ruaok
do you think that gforge sucks less than SF?
2004-10-27 30141, 2004
]Thread[
The latest is pretty good. I run it at my job to manage the developers, since I can't afford SF's or CollabNet's service fees.
2004-10-27 30128, 2004
]Thread[
well, it's got its quirks. But at least you're not sharing the server with 20 thousand other orgs and companies
2004-10-27 30107, 2004
ruaok
does gforge host projects too?
2004-10-27 30118, 2004
]Thread[
well, gforge is basically the open source SourceForge software
2004-10-27 30123, 2004
]Thread[
you just grab the php files
2004-10-27 30127, 2004
]Thread[
and setup the db
2004-10-27 30129, 2004
]Thread[
on postgres
2004-10-27 30131, 2004
]Thread[
and host it yourself
2004-10-27 30132, 2004
ruaok
right.
2004-10-27 30136, 2004
]Thread[
yeah
2004-10-27 30146, 2004
ruaok
I certainly don't want to host it myself.
2004-10-27 30153, 2004
ruaok
I just want a place to put the lucene-ws.
2004-10-27 30155, 2004
]Thread[
they don't host projects.
2004-10-27 30101, 2004
ruaok
I guess SF it is.
2004-10-27 30106, 2004
]Thread[
:(
2004-10-27 30116, 2004
wheels joined the channel
2004-10-27 30142, 2004
MacIntire joined the channel
2004-10-27 30106, 2004
Nacho_ has quit
2004-10-27 30145, 2004
ruaok
wow. the TRM server must be dragging butt today.
2004-10-27 30153, 2004
ruaok
We got $90 in donations today.
2004-10-27 30104, 2004
ruaok
And thanks to _thom_ for making up $50 of that!
2004-10-27 30106, 2004
]Thread[ has quit
2004-10-27 30112, 2004
_thom_ waves
2004-10-27 30154, 2004
_thom_
It started because I was asking Jay Tuley to link against libmad instead of going through iTunes, so he could write the MBID directly into the file
2004-10-27 30129, 2004
_thom_
then I figured if I was going to do that, I ought to give him some beer money or something. Then I decided that since the whole thing depends on MB, I ought to make a matching donation :D
2004-10-27 30142, 2004
ruaok
cool!
2004-10-27 30150, 2004
ruaok
it's really appreciated!
2004-10-27 30154, 2004
_thom_
sure, np
2004-10-27 30101, 2004
_thom_
So how is Picard coming along in the meantime? ;)
2004-10-27 30111, 2004
ruaok
on hold for this week.
2004-10-27 30119, 2004
ruaok
I'm earning money to pay the bills. :-(
2004-10-27 30122, 2004
_thom_
Did that one kid end up doing anything to make it work on OS X?
2004-10-27 30127, 2004
_thom_
I hear that
2004-10-27 30135, 2004
ruaok
nothing came from that yet.
2004-10-27 30151, 2004
_thom_
has he been around since, what, a week or so ago? That's when I talked to him last
2004-10-27 30154, 2004
ruaok
I'm hoping to work on picard more next week.
2004-10-27 30121, 2004
_thom_
Did you check out that thing I mentioned re: unicode support, though?
2004-10-27 30122, 2004
ruaok
I remember the conversation, but I don't remember the nick.
2004-10-27 30126, 2004
_thom_
either
2004-10-27 30139, 2004
ruaok
remind me on the unicode thing.
2004-10-27 30104, 2004
_thom_
Well, I went to the SF page for wxWidgets and basically made a feature request (that unicode support work for the Python bindings on OS X)
2004-10-27 30116, 2004
cikkolata perks up at the mention of unicode
2004-10-27 30122, 2004
_thom_
and someone replied with a patch, but I compared it and it looked like it was already in the latest in CVS
2004-10-27 30147, 2004
_thom_
which I'd downloaded and built with all the required libraries, but it still failed the check for unicode in picard
2004-10-27 30120, 2004
ruaok
oh yes.
2004-10-27 30130, 2004
_thom_
for the heck of it I tried commenting it out. The app launched and I could pick files, etc., but it wouldn't identify tracks. It would, however, see the MB TRM info in files I'd tagged using tp_tagger
2004-10-27 30141, 2004
ruaok
Hmm.
2004-10-27 30143, 2004
ruaok
weird.
2004-10-27 30154, 2004
ruaok
not sure what to tell you...
2004-10-27 30112, 2004
_thom_
I guess I don't really care *what* interface I use, as long as it runs on a mac. The tp_tagger is just a little too clunky (gee, given that it's a demo, not surprising, right? :) to use to tag ALL of my music files
2004-10-27 30154, 2004
_thom_
that's why I like iEatBrainz, it has a decent enough UI (and works already) -- it just can't write to files directly, yet. That's why I hope he'll link to libmad
2004-10-27 30142, 2004
_thom_
At that point I hope to write some wonky PHP + MySQL thing people could use to help catalog their music (PHP from the command line), find dupes, and break ties based on encoding method, bit rate, etc.
2004-10-27 30145, 2004
ruaok
cool.
2004-10-27 30109, 2004
ruaok
and picard will mature in time. it's dicey now, since it depends on so much unstable stuff.
2004-10-27 30149, 2004
_thom_
We'll see what he says. If he's too busy to work on iEB, maybe I'll be back looking for some way to make tp_tagger a little more robust
2004-10-27 30117, 2004
ruaok
I can't believe people are using tp_tagger. :-)
2004-10-27 30118, 2004
_thom_
but for dealing with, you know, thousands of files -- most of which already have pretty decent track info -- ehh
2004-10-27 30124, 2004
ruaok
it's a test harness for crying out loud!
2004-10-27 30127, 2004
_thom_
exactly, heh
2004-10-27 30131, 2004
ruaok waves at canidae
2004-10-27 30144, 2004
_thom_
I had this idea I wanted to ask you about
2004-10-27 30155, 2004
ruaok
See, picard excels at tagging files that have mostly clean data!
2004-10-27 30157, 2004
_thom_
from the *client* side, it's more cycles to take an audio fingerprint, right?
2004-10-27 30104, 2004
ruaok
yup.
2004-10-27 30114, 2004
ruaok
than not taking one, yes.
2004-10-27 30115, 2004
_thom_
but from the *server* side, it's more cycles to do a lookup based on artist name, album name, or track name, right?
2004-10-27 30119, 2004
_thom_
heh
2004-10-27 30130, 2004
_thom_
you see the tradeoff I'm trying to weigh with this comparison
2004-10-27 30136, 2004
ruaok
i do see.
2004-10-27 30140, 2004
ruaok
the answer is not as easy.
2004-10-27 30143, 2004
_thom_
Or is it equally difficult (or more so?) to search for a fingerprint ID
2004-10-27 30147, 2004
_thom_
on the server side
2004-10-27 30154, 2004
ruaok
One single TRM lookup on the server side is not expensive.
2004-10-27 30106, 2004
ruaok
But, we have a machine with 5GB of RAM to do that.
2004-10-27 30110, 2004
_thom_
heh
2004-10-27 30116, 2004
ruaok
And we can't replicate the TRM lookup onto new machines.
2004-10-27 30128, 2004
ruaok
since it's closed source and the damn thing eats RAM for lunch.
2004-10-27 30151, 2004
ruaok
so, a metadata only lookup is preferred, since it uses *our* open source tools that can be replicated.
2004-10-27 30152, 2004
_thom_
so your license restricts you to only using the actual 'black box' code on one machine?
2004-10-27 30100, 2004
ruaok
yup. :-(
2004-10-27 30107, 2004
ruaok
the answer to this is lucene.
2004-10-27 30112, 2004
_thom_
Oh! Got ya. So in other words, my lookup idea would actually be *really good* by comparison
2004-10-27 30120, 2004
ruaok
I've got a picard enabled tagger that uses lucene -- it kicks ASS!
2004-10-27 30123, 2004
_thom_
Yeah, I've overheard you guys talking about it
2004-10-27 30132, 2004
_thom_
but in the meantime, until it's fit for public consumption
So here is my crazy idea. Many free players can use CDDB or something like it, right?
2004-10-27 30143, 2004
ruaok
yup.
2004-10-27 30114, 2004
_thom_
Well, what if my php id3 reading program could look for those -- and if it found some evidence that this file had been ripped from CD, and looked up via a hash of a CD's index track...
2004-10-27 30131, 2004
_thom_
that would be much more conclusive proof that this was probably a track worth looking up via metadata instead of fingerprinting
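For context, CDDB/freedb-style lookups key off a disc ID computed from the CD's table of contents (track start offsets and the lead-out), not from the audio. A minimal Python sketch of the classic freedb disc ID calculation, assuming offsets are given in CD frames (75 per second) including the usual 150-frame pregap; the function names and the sample TOC are hypothetical.

```python
FRAMES_PER_SECOND = 75


def digit_sum(n):
    """Sum of the decimal digits of n (the classic cddb_sum helper)."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def freedb_disc_id(track_offsets, leadout_offset):
    """Compute an 8-hex-digit freedb disc ID from a CD table of contents.

    track_offsets: start of each track in frames, including the 150-frame pregap.
    leadout_offset: start of the lead-out in frames.
    """
    checksum = sum(digit_sum(off // FRAMES_PER_SECOND) for off in track_offsets)
    # Playing time in whole seconds, from the first track to the lead-out.
    length = leadout_offset // FRAMES_PER_SECOND - track_offsets[0] // FRAMES_PER_SECOND
    disc_id = ((checksum % 0xFF) << 24) | (length << 8) | len(track_offsets)
    return f"{disc_id:08x}"


# Hypothetical one-track disc: track 1 at frame 150, lead-out 100 seconds later.
print(freedb_disc_id([150], 150 + 100 * FRAMES_PER_SECOND))  # 02006401
```

Because the ID depends only on track layout, unrelated discs can collide, so it works better as supporting evidence for a metadata lookup than as proof of identity.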
2004-10-27 30146, 2004
_thom_
Besides, have you ever heard The Gourds' version of Gin & Juice?
2004-10-27 30154, 2004
_thom_
or King Missile's 'Detachable Penis'?
2004-10-27 30113, 2004
_thom_
Two songs which come readily to mind, which people mis-attribute ALL THE FRICKING TIME to the wrong bands