#musicbrainz

      • Tykling has left the channel
      • pbryan has left the channel
      • canidae
        bah, too much python lately... keep forgetting to add ";" in other languages
      • s
      • aCiD2
        alright, bed time
      • natta!
      • aCiD2 has quit
      • Mirrakor has quit
      • canidae
        right... so "/etc/init.d/apache-perl restart" won't do the trick, i actually have to explicitly stop and start for it to reload changes
      • FauxFaux
        canidae: Actually, it's worse than that, stop && start won't necessarily do it, I find a sleep in the middle helps. Make sure it prints the Preloading 230 components line. :p
      • canidae
        yes, i noticed it didn't do that on restart, it does on stop & start, though
      • anyways, i think i can nail this now as i actually can debug stuff now
      • i'm using an old computer for this, duron 1200, no need for sleep :p
      • although, raid5 kicks butt
      • it's disturbingly fast on i/o, compared to chewing numbers
      • yeapz, READWRITE probably shouldn't be undef when you set it up as RT_SLAVE
      • *READONLY
      • FauxFaux wonders how one would even attempt to architect a race condition like that. :/
      • since i just copied from READWRITE to READONLY it does have write access, but i'm the sole user, and i didn't set it up for changing data anyways
      • seemingly the documentation doesn't tell you what to do if you want to set up a slave, but i might've just missed it too
      • outsidecontext has quit
      • outsidecontext joined the channel
      • outsidecontext has quit
      • outsidecontext joined the channel
      • baijiutong has quit
      • outsidecontext has quit
      • outsidecontext joined the channel
      • aww, come on.... ~125 rows/second? creating this index is gonna take a week
      • Kerensky97 has left the channel
      • ok, fine, only ~24 hours
      • ruaok
        that's a little slow. :-(
      • canidae
        1200 duron
      • ruaok
        how much ram?
      • canidae
        384mb, old box
      • don't have newer hardware to spare, really
      • ok, it picked up a little, ~190 rows/sec now
      • ruaok
        what kind of ram does it take?
      • canidae
        i've no idea... probably whatever was cool before ddr hit the market
      • ruaok
        pc-100 or pc-133
      • I still have some of that laying around...
      • canidae
        well, no stress... this isn't gonna be a permanent setup, it's just for testing. also, i mostly just have this week of vacation where i've got time/strength to play, so by the time it would come it'd probably be too late
      • i'm not so familiar with pylucene, although perhaps it could be an idea to look at clucene and make some bindings to python? iirc pylucene is more or less a hack, gcj compiled java code or some crack like that :p
      • meh, this was tedious to watch. maybe it'll be more exciting after some hours sleep. *poof*
      • dsp
        gcj compiled java lucene, C python bindings into that
      • last time i looked at clucene the featureset was weak
      • ruaok nods
      • xapian looked like it had improved a lot
      • one of these days i'll get around to trying it again
      • ruaok
        know of anyone who has compared it to lucene?
      • dsp
        not recently, but i haven't spent much time looking
      • i was just poking through their site a couple months ago and noticed that things had progressed a lot since i had last evaluated it
      • and while lucene is nice, java is not
      • ruaok
        ding!
      • dsp
        xapian would be a lot easier to deploy
      • ruaok
        which is the only reason why I'm even looking at it.
      • one should not core infrastructure pieces like search engines in java.
      • *write
      • dsp
        i don't know that c++ is much righter though
      • nikki
        is the index creating thing ram-dependent then?
      • ruaok
        dsp: it's better than java.
      • index creation, no. searching is both heavy on cpu and on ram
      • fortunately someone is donating 2 servers with 8 gigs of ram and dual quad core procs
      • nikki
        oh, lucene indexes...
      • dsp
        nice
      • i've still been buying amd for search b/c of the extra memory bw
      • but it looks like intel is solving that shortly
      • nikki
        I should get around to buying more ram...
      • dsp
        at work tomorrow i have a machine showing up with 32 GB, yay
      • ruaok
        nice
      • nikki can't imagine that much
      • dsp
        ~4 yrs ago at a diff company i had itanics with 24 GB of mem :)
      • i can't believe the itanic still exists
      • nikki
        my macbook has a gig and that's the most I've ever had in one computer :P
      • dsp
        that *has* to be a political thing
      • for a laptop that is fine
      • for search you need a lot of mem if you have big indexes and want a lot of q/s
      • nikki nods
      • is it just me or is svn.musicbrainz.org s l o w?
      • ruaok
        it's copying out the weekly data dump right now.
      • so you're only getting table scraps of bandwidth right now.
      • dsp
        ah
      • pbryan joined the channel
      • outsidecontext has quit
      • zoke
        is there a time delay on the freedb gateway ?
      • zoke joined the channel
      • bmxgamer has quit
      • bmxgamer joined the channel
      • yllona joined the channel
      • baijiutong joined the channel
      • ruaok
        up to 70 mins
      • Muzzz
        e
      • zoke
        does barcode and ASIN do anything really ?
      • ruaok
        I think they both play jazz instruments in some band, IIRC
      • zoke
        I actually laughed
      • ruaok
        lon vs lol ?
      • sorry, loi
      • ruaok hasn't used loi much
      • zoke
        it was a little chuckle so I guess lol
      • I've added quite a bit of data today, hopefully someday in the future it will become useful
      • ruaok
        I'm sure it will. :-)
      • dsp
        just spent some time playing around with xapian again
      • the indexes are still huge compared to lucene
      • ruaok
        I'm not so worried about that.
      • dsp
        the indexes end up 2x the source data
      • whereas with lucene they're more like 30% of source
      • ruaok
        ok, I can deal.
      • how is the index speed?
      • and the search speed?
      • dsp
        25k emails (127 MB) in ~ 3.5 minutes
      • 2 GHz athlon 64
      • ruaok
        ouch.
      • not good.
      • zoke
        is it slow because it is written in java ? or simply because it cannot handle those types of searches ?
      • dsp
        xapian is c++
      • i'm using it via the python bindings
      • java is actually pretty fast these days
      • assuming it is used well
      • ruaok
        java lucene is still quite a bit slower than pylucene with gcj
      • dsp
        have you actually measured that?
      • i saw ppl say that a lot, but it was never my experience
      • ruaok
        not quantitatively.
      • dsp
        xapian supposedly supports index updates better than lucene, fwiw
      • ruaok
        or at least not in a scientific fashion.
      • luks was never able to get more than a fraction of the build speed that we get with gcj
      • dsp
        was he writing a competing java soln?
      • i never paid too much attn to index speed
      • ruaok
        solr?
      • dsp
        s/soln/solution/
      • sorry
      • ruaok
        ah.
      • dsp
        a lot of times i was using indexes that were built by java stuff
      • to prototype newer search features
      • ruaok
        no just using lucene/solr in java vs our own pylucene indexing stuff
      • dsp
        the search side of pylucene was never as fast as java
      • but i didn't spend much time looking into it
      • so it could have been a lot of things
      • ah
      • at pycon last weekend i learned of something called grassyknoll (http://rds.yahoo.com/_ylt=A0geu8o1hOBHW9MAsg9XN...)
      • grr yahoo
      • anyway, it is some wrapper for search crack for python
      • i haven't had a chance to look into it yet
      • ruaok thinks that search engines ought be written in proper compiled languages
      • yllona seconds that emotion
      • yllona
        interpreted languages aren't the best choice for search engines
      • dsp
        depends on what you're trying to do
      • i don't think there are any hard and fast rules
      • wth, pylucene now uses jcc?
      • i don't even know what that is
      • (as opposed to gcj)
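
Editor's note: the xapian figures dsp quoted earlier in the log (25k emails, 127 MB, ~3.5 minutes on a 2 GHz athlon 64; indexes about 2x the source versus about 30% for lucene) can be turned into rough rates with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the inputs are the approximate numbers from the chat, not a measured benchmark.

```python
# Rough arithmetic on the xapian figures quoted in the log.
# All inputs are approximate ("~3.5 minutes", "more like 30%"),
# so treat the results as ballpark values only.

def indexing_rate(docs, size_mb, minutes):
    """Return (documents per second, megabytes per second)."""
    seconds = minutes * 60
    return docs / seconds, size_mb / seconds

def index_size_mb(source_mb, ratio):
    """Estimate index size as a multiple (or fraction) of the source size."""
    return source_mb * ratio

docs_per_sec, mb_per_sec = indexing_rate(docs=25_000, size_mb=127, minutes=3.5)
xapian_mb = index_size_mb(127, 2.0)   # "2x the source data"
lucene_mb = index_size_mb(127, 0.30)  # "more like 30% of source"

print(f"{docs_per_sec:.0f} emails/s, {mb_per_sec:.2f} MB/s")
print(f"xapian index ~{xapian_mb:.0f} MB, lucene index ~{lucene_mb:.0f} MB")
```

The same two helpers work for any other corpus size or timing, which makes it easy to compare a later run against these numbers.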