#musicbrainz

      • Tykling has left the channel
      • 2008-03-19 07943, 2008

      • pbryan has left the channel
      • 2008-03-19 07913, 2008

      • canidae
        bah, too much python lately... keep forgetting adding ";" in other language
      • 2008-03-19 07914, 2008

      • canidae
        s
      • 2008-03-19 07928, 2008

      • aCiD2
        alright, bed time
      • 2008-03-19 07929, 2008

      • aCiD2
        natta!
      • 2008-03-19 07932, 2008

      • aCiD2 has quit
      • 2008-03-19 07958, 2008

      • Mirrakor has quit
      • 2008-03-19 07923, 2008

      • canidae
        right... so "/etc/init.d/apache-perl restart" won't do the trick, i actually have to explicitly stop and start for it to reload changes
      • 2008-03-19 07926, 2008

      • FauxFaux
        canidae: Actually, it's worse than that, stop && start won't necessarily do it, I find a sleep in the middle helps. Make sure it prints the Preloading 230 components line. :p
      • 2008-03-19 07944, 2008

      • canidae
        yes, i noticed it didn't do that on restart, it does on stop & start, though
      • 2008-03-19 07906, 2008

      • canidae
        anyways, i think i can nail this now as i actually can debug stuff now
      • 2008-03-19 07952, 2008

      • canidae
        i'm using an old computer for this, duron 1200, no need for sleep :p
      • 2008-03-19 07957, 2008

      • canidae
        although, raid5 kicks butt
      • 2008-03-19 07926, 2008

      • canidae
        it's disturbingly fast on i/o, compared to chewing numbers
      • 2008-03-19 07956, 2008

      • canidae
        yeapz, READWRITE probably shouldn't be undef when you set it up as RT_SLAVE
      • 2008-03-19 07928, 2008

      • canidae
        *READONLY
      • 2008-03-19 07923, 2008

      • FauxFaux wonders how one would even attempt to architect a race condition like that. :/
      • 2008-03-19 07952, 2008

      • canidae
        since i just copied from READWRITE to READONLY it does have write access, but i'm the sole user, and i didn't set it up for changing data anyways
      • 2008-03-19 07930, 2008

      • canidae
        seemingly the docs don't tell you what to do if you want to set up a slave, but i might just've missed it too
      • 2008-03-19 07940, 2008

      • outsidecontext has quit
      • 2008-03-19 07909, 2008

      • outsidecontext joined the channel
      • 2008-03-19 07949, 2008

      • outsidecontext has quit
      • 2008-03-19 07912, 2008

      • outsidecontext joined the channel
      • 2008-03-19 07924, 2008

      • baijiutong has quit
      • 2008-03-19 07954, 2008

      • outsidecontext has quit
      • 2008-03-19 07921, 2008

      • outsidecontext joined the channel
      • 2008-03-19 07919, 2008

      • canidae
        aww, come on.... ~125 rows/second? creating this index is gonna take a week
      • 2008-03-19 07951, 2008

      • Kerensky97 has left the channel
      • 2008-03-19 07909, 2008

      • canidae
        ok, fine, only ~24 hours
      • 2008-03-19 07924, 2008

      • ruaok
        that's a little slow. :-(
      • 2008-03-19 07932, 2008

      • canidae
        1200 duron
      • 2008-03-19 07936, 2008

      • ruaok
        how much ram?
      • 2008-03-19 07958, 2008

      • canidae
        384mb, old box
      • 2008-03-19 07905, 2008

      • canidae
        don't have newer hardware to spare, really
      • 2008-03-19 07903, 2008

      • canidae
        ok, it picked up a little, ~190 rows/sec now
      • 2008-03-19 07906, 2008
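
A rough back-of-the-envelope check of the build-time estimates above, as a sketch only. The log never states the table size, so the row count below is inferred by assuming the "~24 hours" figure was made at the ~125 rows/sec rate; the helper names are hypothetical.

```python
# Back-of-the-envelope index build time. All inputs come from the chat or are
# inferred from it; nothing here is a measurement.

SECONDS_PER_HOUR = 3600


def estimated_rows(rate_rows_per_sec, hours):
    """Rows implied by a build that runs for `hours` at a constant rate."""
    return rate_rows_per_sec * hours * SECONDS_PER_HOUR


def estimated_hours(total_rows, rate_rows_per_sec):
    """Hours needed to process `total_rows` at a constant rate."""
    return total_rows / rate_rows_per_sec / SECONDS_PER_HOUR


# Assumption: the "~24 hours" estimate was made at ~125 rows/sec.
rows = estimated_rows(125, 24)
hours_at_190 = estimated_hours(rows, 190)

print(f"implied rows: {rows:,.0f}")
print(f"hours at 190 rows/sec: {hours_at_190:.1f}")
```

Under that assumption the table holds on the order of ten million rows, and the later ~190 rows/sec rate would cut the build to roughly 16 hours if it held steady.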

      • ruaok
        what kind of ram does it take?
      • 2008-03-19 07946, 2008

      • canidae
        i've no idea... probably whatever was cool before ddr hit the market
      • 2008-03-19 07902, 2008

      • ruaok
        pc-100 or pc-133
      • 2008-03-19 07909, 2008

      • ruaok
        I still have some of that laying around...
      • 2008-03-19 07957, 2008

      • canidae
        well, no stress... this isn't gonna be a permanent setup, it's just for testing. also, i mostly just have this week of vacation where i've got time/strength to play, so by the time it would come it'd probably be too late
      • 2008-03-19 07922, 2008

      • canidae
        i'm not so familiar with pylucene, although perhaps it could be an idea to look at clucene and make some bindings to python? iirc pylucene is more or less a hack, gcj compiled java code or some crack like that :p
      • 2008-03-19 07926, 2008

      • canidae
        meh, this was tedious to watch. maybe it'll be more exciting after some hours sleep. *poof*
      • 2008-03-19 07914, 2008

      • dsp
        gcj compiled java lucene, C python bindings into that
      • 2008-03-19 07934, 2008

      • dsp
        last time i looked at clucene the featureset was weak
      • 2008-03-19 07909, 2008

      • ruaok nods
      • 2008-03-19 07924, 2008

      • dsp
        xapian looked like it had improved a lot
      • 2008-03-19 07930, 2008

      • dsp
        one of these days i'll get around to trying it again
      • 2008-03-19 07932, 2008

      • ruaok
        know of anyone who has compared it to lucene?
      • 2008-03-19 07913, 2008

      • dsp
        not recently, but i haven't spent much time looking
      • 2008-03-19 07935, 2008

      • dsp
        i was just poking through their site a couple months ago and noticed that things had progressed a lot since i had last evaluated it
      • 2008-03-19 07946, 2008

      • dsp
        and while lucene is nice, java is not
      • 2008-03-19 07953, 2008

      • ruaok
        ding!
      • 2008-03-19 07954, 2008

      • dsp
        xapian would be a lot easier to deploy
      • 2008-03-19 07908, 2008

      • ruaok
        which is the only reason why I'm even looking at it.
      • 2008-03-19 07936, 2008

      • ruaok
        one should not core infrastructure pieces like search engines in java.
      • 2008-03-19 07944, 2008

      • ruaok
        *write
      • 2008-03-19 07951, 2008

      • dsp
        i don't know that c++ is much righter though
      • 2008-03-19 07955, 2008

      • nikki
        is the index creating thing ram-dependent then?
      • 2008-03-19 07909, 2008

      • ruaok
        dsp: it's better than java.
      • 2008-03-19 07928, 2008

      • ruaok
        index creation, no. searching is both heavy on cpu and on ram
      • 2008-03-19 07949, 2008

      • ruaok
        fortunately someone is donating 2 servers with 8 gigs of ram and dual quad core procs
      • 2008-03-19 07936, 2008

      • nikki
        oh, lucene indexes...
      • 2008-03-19 07952, 2008

      • dsp
        nice
      • 2008-03-19 07926, 2008

      • dsp
        i've still been buying amd for search b/c of the extra memory bw
      • 2008-03-19 07937, 2008

      • dsp
        but it looks like intel is solving that shortly
      • 2008-03-19 07952, 2008

      • nikki
        I should get around to buying more ram...
      • 2008-03-19 07947, 2008

      • dsp
        at work tomorrow i have a machine showing up with 32 GB, yay
      • 2008-03-19 07912, 2008

      • ruaok
        nice
      • 2008-03-19 07921, 2008

      • nikki can't imagine that much
      • 2008-03-19 07926, 2008

      • dsp
        ~4 yrs ago at a diff company i had itanics with 24 GB of mem :)
      • 2008-03-19 07942, 2008

      • dsp
        i can't believe the itanic still exists
      • 2008-03-19 07949, 2008

      • nikki
        my macbook has a gig and that's the most I've ever had in one computer :P
      • 2008-03-19 07954, 2008

      • dsp
        that *has* to be a political thing
      • 2008-03-19 07938, 2008

      • dsp
        for a laptop that is fine
      • 2008-03-19 07952, 2008

      • dsp
        for search you need a lot of mem if you have big indexes and want a lot of q/s
      • 2008-03-19 07905, 2008

      • nikki nods
      • 2008-03-19 07916, 2008

      • dsp
        is it just me or is svn.musicbrainz.org s l o w?
      • 2008-03-19 07938, 2008

      • ruaok
        it's copying out the weekly data dump right now.
      • 2008-03-19 07955, 2008

      • ruaok
        so you're only getting table scraps of bandwidth right now.
      • 2008-03-19 07955, 2008

      • dsp
        ah
      • 2008-03-19 07925, 2008

      • pbryan joined the channel
      • 2008-03-19 07903, 2008

      • outsidecontext has quit
      • 2008-03-19 07911, 2008

      • zoke
        is there a time delay on the freedb gateway ?
      • 2008-03-19 07912, 2008

      • zoke joined the channel
      • 2008-03-19 07947, 2008

      • bmxgamer has quit
      • 2008-03-19 07958, 2008

      • bmxgamer joined the channel
      • 2008-03-19 07935, 2008

      • yllona joined the channel
      • 2008-03-19 07945, 2008

      • baijiutong joined the channel
      • 2008-03-19 07915, 2008

      • ruaok
        up to 70 mins
      • 2008-03-19 07906, 2008

      • Muzzz
        e
      • 2008-03-19 07919, 2008

      • zoke
        does barcode and ASIN do anything really ?
      • 2008-03-19 07934, 2008

      • ruaok
        I think they both play jazz instruments in some band, IIRC
      • 2008-03-19 07924, 2008

      • zoke
        I actually laughed
      • 2008-03-19 07952, 2008

      • ruaok
        lon vs lol ?
      • 2008-03-19 07901, 2008

      • ruaok
        sorry, loi
      • 2008-03-19 07906, 2008

      • ruaok hasn't used loi much
      • 2008-03-19 07903, 2008

      • zoke
        it was a little chuckle so I guess lol
      • 2008-03-19 07936, 2008

      • zoke
        I've added quite a bit of data today, hopefully someday in the future it will become useful
      • 2008-03-19 07938, 2008

      • ruaok
        I'm sure it will. :-)
      • 2008-03-19 07941, 2008

      • dsp
        just spent some time playing around with xapian again
      • 2008-03-19 07946, 2008

      • dsp
        the indexes are still huge compared to lucene
      • 2008-03-19 07904, 2008

      • ruaok
        I'm not so worried about that.
      • 2008-03-19 07904, 2008

      • dsp
        the indexes end up 2x the source data
      • 2008-03-19 07913, 2008

      • dsp
        whereas with lucene they're more like 30% of source
      • 2008-03-19 07923, 2008

      • ruaok
        ok, I can deal.
      • 2008-03-19 07927, 2008

      • ruaok
        how is the index speed?
      • 2008-03-19 07931, 2008

      • ruaok
        and the search speed?
      • 2008-03-19 07959, 2008

      • dsp
        25k emails (127 MB) in ~ 3.5 minutes
      • 2008-03-19 07905, 2008

      • dsp
        2 GHz athlon 64
      • 2008-03-19 07911, 2008
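
To put dsp's Xapian numbers in comparable units, here is a small sketch of the arithmetic. The inputs (25k emails, 127 MB, ~3.5 minutes, indexes at ~2x source for Xapian and ~30% for Lucene) are the figures quoted in the chat; applying the two size ratios to this particular 127 MB corpus is an assumption, since dsp gave them as general observations. The function name is hypothetical.

```python
# Indexing throughput and rough index sizes from the figures quoted in the chat.
# The size ratios are dsp's ballpark numbers, applied here as an assumption.


def throughput(docs, megabytes, minutes):
    """Return (documents per second, megabytes per second)."""
    seconds = minutes * 60
    return docs / seconds, megabytes / seconds


docs_per_sec, mb_per_sec = throughput(docs=25_000, megabytes=127, minutes=3.5)

SOURCE_MB = 127
xapian_index_mb = SOURCE_MB * 2.0   # "the indexes end up 2x the source data"
lucene_index_mb = SOURCE_MB * 0.3   # "more like 30% of source"

print(f"{docs_per_sec:.0f} docs/sec, {mb_per_sec:.2f} MB/sec")
print(f"xapian index ~{xapian_index_mb:.0f} MB, lucene index ~{lucene_index_mb:.0f} MB")
```

That works out to roughly 119 emails per second, or about 0.6 MB of source text per second, on the 2 GHz Athlon 64 via the Python bindings.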

      • ruaok
        ouch.
      • 2008-03-19 07913, 2008

      • ruaok
        not good.
      • 2008-03-19 07900, 2008

      • zoke
        is it slow because it is written in java ? or simply because it cannot handle those types of searches ?
      • 2008-03-19 07907, 2008

      • dsp
        xapian is c++
      • 2008-03-19 07912, 2008

      • dsp
        i'm using it via the python bindings
      • 2008-03-19 07930, 2008

      • dsp
        java is actually pretty fast these days
      • 2008-03-19 07940, 2008

      • dsp
        assuming it is used well
      • 2008-03-19 07954, 2008

      • ruaok
        java lucene is still quite a bit slower than pylucene with gcj
      • 2008-03-19 07920, 2008

      • dsp
        have you actually measured that?
      • 2008-03-19 07929, 2008

      • dsp
        i saw ppl say that a lot, but it was never my experience
      • 2008-03-19 07933, 2008

      • ruaok
        not qualitatively.
      • 2008-03-19 07946, 2008

      • dsp
        xapian supposedly supports index updates better than lucene, fwiw
      • 2008-03-19 07956, 2008

      • ruaok
        or at least not in a scientific fashion.
      • 2008-03-19 07915, 2008

      • ruaok
        luks was never able to get more than a fraction of the build speed that we get with gcj
      • 2008-03-19 07943, 2008

      • dsp
        was he writing a competing java soln?
      • 2008-03-19 07912, 2008

      • dsp
        i never paid too much attn to index speed
      • 2008-03-19 07917, 2008

      • ruaok
        solr?
      • 2008-03-19 07922, 2008

      • dsp
        s/soln/solution/
      • 2008-03-19 07923, 2008

      • dsp
        sorry
      • 2008-03-19 07927, 2008

      • ruaok
        ah.
      • 2008-03-19 07937, 2008

      • dsp
        a lot of times i was using indexes that were built by java stuff
      • 2008-03-19 07943, 2008

      • dsp
        to prototype newer search features
      • 2008-03-19 07947, 2008

      • ruaok
        no, just using lucene/solr in java vs our own pylucene indexing stuff
      • 2008-03-19 07954, 2008

      • dsp
        the search side of pylucene was never as fast as java
      • 2008-03-19 07901, 2008

      • dsp
        but i didn't spend much time looking into it
      • 2008-03-19 07905, 2008

      • dsp
        so it could have been a lot of things
      • 2008-03-19 07918, 2008

      • dsp
        ah
      • 2008-03-19 07908, 2008

      • dsp
        at pycon last weekend i learned of something called grassyknoll (http://rds.yahoo.com/_ylt=A0geu8o1hOBHW9MAsg9XNyo…)
      • 2008-03-19 07916, 2008

      • dsp
        grr yahoo
      • 2008-03-19 07928, 2008

      • dsp
        anyway, it is some wrapper for search crack for python
      • 2008-03-19 07939, 2008

      • dsp
        i haven't had a chance to look into it yet
      • 2008-03-19 07919, 2008

      • ruaok thinks that search engines ought to be written in proper compiled languages
      • 2008-03-19 07952, 2008

      • yllona seconds that emotion
      • 2008-03-19 07930, 2008

      • yllona
        interpreted languages aren't the best choice for search engines
      • 2008-03-19 07956, 2008

      • dsp
        depends on what you're trying to do
      • 2008-03-19 07905, 2008

      • dsp
        i don't think there are any hard and fast rules
      • 2008-03-19 07916, 2008

      • dsp
        wth, pylucene now uses jcc?
      • 2008-03-19 07920, 2008

      • dsp
        i don't even know what that is
      • 2008-03-19 07927, 2008

      • dsp
        (as opposed to gcj)