all depends on what you're searching or how your document is divided up. I've found that if one of the fields in your document is a huge blob of text, analyzers are good. If your document is well defined and has atomic fields (could almost go in a db), then they can be bad.
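To make the "blob of text vs. atomic fields" distinction concrete, here is a toy Python sketch. It is not Lucene's API: `analyze`, `build_index`, `search`, and the sample documents are all hypothetical. The toy "analyzer" lowercases and splits on punctuation, which helps free-text fields like names, while atomic fields such as IDs are indexed as single exact terms.

```python
import re

def analyze(text):
    """Toy analyzer: lowercase, then split on anything that isn't a-z or 0-9."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def build_index(docs, analyzed_fields):
    """Map (field, term) -> set of doc ids. Analyzed fields are tokenized;
    every other field is stored as one exact, untouched term."""
    index = {}
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            terms = analyze(value) if field in analyzed_fields else [value]
            for term in terms:
                index.setdefault((field, term), set()).add(doc_id)
    return index

def search(index, field, query, analyzed_fields):
    """AND together all query terms, analyzing the query the same way as the field."""
    terms = analyze(query) if field in analyzed_fields else [query]
    hits = [index.get((field, t), set()) for t in terms]
    return set.intersection(*hits) if hits else set()

# Hypothetical sample data: a free-text name plus an atomic ID.
DOCS = {
    1: {"name": "The Beatles", "mbid": "id-0001"},
    2: {"name": "Beatles Tribute Band", "mbid": "id-0002"},
}
```

With `build_index(DOCS, {"name"})`, a search for `beatles` on `name` finds both documents while `id-0001` on `mbid` finds exactly one. If the ID field were analyzed too, `id-0001` would split into `id` and `0001`, and a search for `id` would match every document; that is the "can be bad" case.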
canidae has quit
ruaok
heh.
well, I'm indexing MB's data that comes out of a DB and I'm still impressed.
may be easier to implement in lucene than I thought.
Soundex, Metaphone, Refined Soundex and Double Metaphone
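These are phonetic encodings: they map similar-sounding names to the same short key, so misspelled artist names can still match. Below is a minimal Python sketch of classic American Soundex, written from scratch for illustration. It is not the code of whatever Java library was being discussed, and Metaphone, Refined Soundex and Double Metaphone use different, more elaborate rules.

```python
# Letter -> digit groups for American Soundex; vowels, y, h and w have no code.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit

def soundex(name):
    """Return the 4-character American Soundex code for `name` ("" if no ASCII letters)."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()           # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")     # its code suppresses an immediate repeat
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:         # skip uncoded letters and adjacent duplicates
            result += code
            if len(result) == 4:
                break
        if ch not in "hw":                # h/w don't separate equal codes; vowels do
            prev = code
    return result.ljust(4, "0")           # pad short codes with zeros
```

For example, `soundex("Smith")` and `soundex("Smyth")` both give `S530`, so a search keyed on the Soundex code would treat them as the same name.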
ruaok
bitchen!
I'll probably make use of that!
]Thread[
encode it in a java routine and look up in perl?
ruaok
naw.
remember the lucene web service I was just talking about?
once its done its going to be open sourced.
]Thread[
yeah
cool
ruaok
so I will use it and have perl make calls to it to do searches.
that way I can deploy it on its own standalone box.
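A sketch of what the client side of that split could look like, in Python rather than Perl to keep the examples in one language. The host, the `/ws/search` path, the `type`/`query`/`max` parameter names and the JSON response format are all hypothetical; the real lucene-ws interface isn't described in this conversation.

```python
import json
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen

def search_url(host, index, query, limit=25):
    """Build a GET URL for a hypothetical standalone search service."""
    # Parameter names are assumptions; urlencode escapes spaces, '&', etc.
    params = urlencode({"type": index, "query": query, "max": limit})
    return urlunsplit(("http", host, "/ws/search", params, ""))

def remote_search(host, index, query, limit=25, timeout=5.0):
    """Fetch results over HTTP (assumes a JSON reply). The web front end only
    needs HTTP access, so the search service can live on its own box."""
    with urlopen(search_url(host, index, query, limit), timeout=timeout) as resp:
        return json.load(resp)
```

The design point is that the front end and the search index only share an HTTP contract, so the Java side can be moved or replicated without touching the Perl code.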
I'll be looking for a home for the lucene web service and I don't want to go to SF.
any suggestions?
]Thread[
There has been talk of a Lucene server
ruaok
this is just a servlet, but it should be of some use to someone.
i've got it mostly working now.
]Thread[
SF sucks?
ruaok
I gotta polish off some sharp edges today and then I'll let people poke at it tomorrow.
SF sucks.
]Thread[
speaking of SF, have you seen gforge?
ruaok
I haven't looked in ages.
do you think that gforge sucks less than SF?
]Thread[
the latest is pretty good. I run it at my job to manage the developers, since I can't afford SF's or CollabNet's service fees.
well, it's got its quirks. But at least you're not sharing the server with 20 thousand other orgs and companies
ruaok
does gforge host projects too?
]Thread[
well, gforge is the open source SourceForge code
you just grab the php files
and setup the db
on postgres
and host it yourself
ruaok
right.
]Thread[
yeah
ruaok
I certainly don't want to host it myself.
I just want a place to put the lucene-ws.
]Thread[
they don't host projects.
ruaok
I guess SF it is.
]Thread[
:(
wheels joined the channel
MacIntire joined the channel
Nacho_ has quit
ruaok
wow. the TRM server must be dragging butt today.
We got $90 in donations today.
And thanks to _thom_ for making up $50 of that!
]Thread[ has quit
_thom_ waves
_thom_
It started because I was asking Jay Tuley to link against libmad instead of going through iTunes, so he could write the MBID directly into the file
then I figured if I was going to do that, I ought to give him some beer money or something. Then I decided that since the whole thing depends on MB, I ought to make a matching donation :D
ruaok
cool!
it's really appreciated!
_thom_
sure, np
So how is Picard coming, in the meantime ;)
ruaok
on hold for this week.
I'm earning money to pay the bills. :-(
_thom_
Did that one kid end up doing anything to make it work on OS X?
I hear that
ruaok
nothing has come of that yet.
_thom_
has he been around since, what, a week or so ago? That's when I talked to him last
ruaok
I'm hoping to work on picard more next week.
_thom_
Did you check out that thing I mentioned re: unicode support, though?
ruaok
I remember the conversation, but I don't remember the nick.
_thom_ either
remind me on the unicode thing.
_thom_
Well, I went to the SF page for wxWidgets and basically made a feature request (that unicode support work for the Python bindings on OS X)
cikkolata perks up at the mention of unicode
and someone replied with a patch, but I compared it and it looked like it was already in the latest in CVS
which I'd downloaded and built with all the required libraries, but it still failed the check for unicode in picard
ruaok
oh yes.
_thom_
for the heck of it I tried commenting it out. the app launched and I could pick files, etc., but it wouldn't identify tracks. it would, however, see the MB TRM info in files I'd done using tp_tagger
ruaok
Hmm.
weird.
not sure what to tell you...
_thom_
I guess I don't really care *what* interface I use, as long as it runs on a mac. The tp_tagger is just a little too clunky (gee, given that it's a demo, not surprising, right? :) to use to tag ALL of my music files
that's why I like iEatBrainz: it has a decent enough UI (and works already); it just can't write to files directly yet. That's why I hope he'll link to libmad
At that point I hope to write some wonky PHP + MySQL thing people could use to help catalog their music (PHP from the command line), find dupes, and break ties based on encoding method, bit rate, etc.
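The dupe-finding and tie-breaking part of that plan might look roughly like this. It's Python rather than PHP + MySQL, the `TrackFile` fields are hypothetical stand-ins for whatever the ID3 reader extracts, and "prefer lossless, then higher bit rate" is just one plausible tiebreak rule.

```python
from dataclasses import dataclass

# Hypothetical set of codecs treated as lossless when breaking ties.
LOSSLESS = {"flac", "wav", "ape", "alac"}

@dataclass
class TrackFile:
    path: str
    artist: str
    title: str
    codec: str      # e.g. "mp3", "flac"
    bitrate: int    # kbps

def quality_key(f):
    # Lossless beats lossy; within each group, higher bit rate wins.
    return (f.codec.lower() in LOSSLESS, f.bitrate)

def find_dupes(files):
    """Group files by normalized (artist, title). For each group with more
    than one file, return (best_file, [remaining files, best first])."""
    groups = {}
    for f in files:
        key = (f.artist.strip().lower(), f.title.strip().lower())
        groups.setdefault(key, []).append(f)
    dupes = {}
    for key, group in groups.items():
        if len(group) > 1:
            ranked = sorted(group, key=quality_key, reverse=True)
            dupes[key] = (ranked[0], ranked[1:])
    return dupes
```

Note that grouping on tags only catches duplicates whose tags already agree; a track mis-tagged with the wrong artist lands in its own group.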
ruaok
cool.
and picard will mature in time. it's dicey now, since it depends on so much unstable stuff.
_thom_
We'll see what he says. If he's too busy to work on iEB, maybe I'll be back looking for some way to make tp_tagger a little more robust
ruaok
I can't believe people are using tp_tagger. :-)
_thom_
but for dealing with, you know, thousands of files -- most of which already have pretty decent track info -- ehh
ruaok
it's a test harness for crying out loud!
_thom_
exactly, heh
ruaok waves at canidae
I had this idea I wanted to ask you about
ruaok
See, picard excels at tagging files that have mostly clean data!
_thom_
from the *client* side, it's more cycles to take an audio fingerprint, right?
ruaok
yup.
than not taking one, yes.
_thom_
but from the *server* side, it's more cycles to do a lookup based on artist name, album name, or track name, right?
heh
you see the tradeoff I'm trying to weigh with this comparison
ruaok
i do see.
the answer is not as easy.
_thom_
Or is it equally difficult (or more so?) to search for a fingerprint ID
on the server side
ruaok
One single TRM lookup on the server side is not expensive.
But we have a machine with 5GB of RAM to do that.
_thom_
heh
ruaok
And we can't replicate the TRM lookup onto new machines.
since it's closed source and the damn thing eats RAM for lunch.
so, a metadata only lookup is preferred, since it uses *our* open source tools that can be replicated.
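The policy described here (do the cheap, replicable metadata lookup first and save the single TRM server for when it's actually needed) could be sketched as below. `metadata_lookup`, `fingerprint_lookup` and the 0.9 score threshold are hypothetical placeholders, not MusicBrainz APIs.

```python
def identify(tags, metadata_lookup, fingerprint_lookup, min_score=0.9):
    """Try a text search on the existing tags first; fall back to the
    expensive fingerprint lookup only when the tags are missing or the
    metadata match isn't confident enough.

    metadata_lookup(tags) -> (match or None, score between 0 and 1)
    fingerprint_lookup()  -> match or None
    """
    if all(tags.get(k, "").strip() for k in ("artist", "title")):
        match, score = metadata_lookup(tags)
        if match is not None and score >= min_score:
            return match, "metadata"
    # Only reached when metadata can't settle it, so the fingerprint
    # server sees a fraction of the requests.
    return fingerprint_lookup(), "fingerprint"
```

Passing the lookups in as callables keeps the decision logic separate from whichever search backend (Lucene or otherwise) ends up answering the metadata query.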
_thom_
so your license restricts you to only using the actual 'black box' code on one machine?
ruaok
yup. :-(
the answer to this is lucene.
_thom_
Oh! Got ya. So in other words, my lookup idea would actually be *really good* by comparison
ruaok
I've got a picard enabled tagger that uses lucene -- it kicks ASS!
_thom_
Yeah, I've overheard you guys talking about it
but in the meantime, until it's fit for public consumption
So here is my crazy idea. Many free players can use CDDB or something like it, right?
ruaok
yup.
_thom_
Well, what if my php id3 reading program could look for those -- and if it found some evidence that this file had been ripped from CD, and looked up via a hash of a CD's index track...
that would be much more conclusive proof that this was probably a track worth looking up via metadata instead of fingerprinting
Besides, have you ever heard The Gourds' version of Gin & Juice?
or King Missile's 'Detachable Penis'?
Two songs which come readily to mind, which people mis-attribute ALL THE FRICKING TIME to the wrong bands