#metabrainz

      • FishQuest
      • 2017-11-07 31133, 2017

      • FishQuest sees an error on one description
      • 2017-11-07 31131, 2017

      • samj1912
        FishQuest: hmm, seems like dismax isn't one
      • 2017-11-07 31132, 2017

      • samj1912
        *on
      • 2017-11-07 31134, 2017

      • samj1912
        weird
      • 2017-11-07 31139, 2017

      • samj1912
      • 2017-11-07 31142, 2017

      • samj1912
        sorry
      • 2017-11-07 31144, 2017

      • culinko has quit
      • 2017-11-07 31103, 2017

      • FishQuest
        eehhh
      • 2017-11-07 31104, 2017

      • FishQuest
        ?
      • 2017-11-07 31153, 2017

      • FishQuest
        dismax=true does nothing on test.mb
      • 2017-11-07 31105, 2017

      • FishQuest
        or is it url hack?
      • 2017-11-07 31151, 2017

      • samj1912
      • 2017-11-07 31125, 2017

      • FishQuest
      • 2017-11-07 31132, 2017

      • FishQuest
        ah the ~ 
      • 2017-11-07 31136, 2017

      • FishQuest
        uh
      • 2017-11-07 31151, 2017

      • FishQuest
        can't you give me an actual search link?
      • 2017-11-07 31123, 2017

      • FishQuest
      • 2017-11-07 31147, 2017

      • samj1912
        yup
      • 2017-11-07 31116, 2017

      • FishQuest
        hmm, this was old functionality, when the indexed search first got turned on. it required things like ~ to be able to get similar results
      • 2017-11-07 31123, 2017
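
Editor's note: the `~` mentioned above is Lucene's fuzzy-match operator, which is appended to a term to match similarly spelled words. A minimal sketch of adding it to every plain term of a query, assuming whitespace-separated terms; the helper name `fuzzify` is hypothetical and not part of the MusicBrainz search code:

```python
def fuzzify(query: str) -> str:
    """Append Lucene's fuzzy operator (~) to each plain term in a query."""
    keep = {"AND", "OR", "NOT"}  # boolean keywords are left untouched
    terms = query.split()
    return " ".join(
        t if t in keep or t.endswith("~") else t + "~" for t in terms
    )


print(fuzzify("beatels abbey"))
```

This only illustrates the query syntax; whether the new search applies fuzzy matching by default is not settled in the conversation.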

      • FishQuest
        i don't remember when it just became default
      • 2017-11-07 31127, 2017

      • FishQuest
        ugh
      • 2017-11-07 31135, 2017

      • samj1912
        FishQuest: still testing things
      • 2017-11-07 31141, 2017

      • FishQuest
        i don't remember when it just became default
      • 2017-11-07 31146, 2017

      • FishQuest
        sam <3 :)
      • 2017-11-07 31149, 2017

      • samj1912
        search server isn't fully integrated with the ui yet
      • 2017-11-07 31100, 2017

      • FishQuest
        yep
      • 2017-11-07 31103, 2017

      • samj1912
      • 2017-11-07 31107, 2017

      • FishQuest
        eh
      • 2017-11-07 31108, 2017

      • samj1912
        for more cool stuff you can do
      • 2017-11-07 31124, 2017

      • FishQuest
        i thought this was the solr search?
      • 2017-11-07 31154, 2017

      • samj1912
        yes, solr is built on lucene
      • 2017-11-07 31155, 2017

      • FishQuest
        that stuff.. I know that stuff, been using it since we got lucene search :)
      • 2017-11-07 31158, 2017

      • FishQuest
        oh ho
      • 2017-11-07 31109, 2017

      • FishQuest
        this i didn't know, i thought it was a completely different thing
      • 2017-11-07 31141, 2017

      • samj1912
        if lucene were bricks, solr is like a pre built house you can put your furniture in
      • 2017-11-07 31149, 2017

      • samj1912
        the current search server we built from scratch
      • 2017-11-07 31155, 2017

      • FishQuest
        hmm
      • 2017-11-07 31119, 2017

      • FishQuest
        are you sure about that?
      • 2017-11-07 31135, 2017

      • samj1912
        that we built it from scratch?
      • 2017-11-07 31144, 2017

      • FishQuest
        the way I remember is, that lucene was added and tinkered with tremendously, but it also came from something already built
      • 2017-11-07 31154, 2017

      • FishQuest
        this was oh.. wtf 7 20 years ago?
      • 2017-11-07 31100, 2017

      • FishQuest
        erh 10 not 20
      • 2017-11-07 31111, 2017

      • samj1912
        well, you get the point :P
      • 2017-11-07 31153, 2017

      • FishQuest
        anyway I'm going to the library <3, ping me when the test server can be logged into. (no rush or anything)
      • 2017-11-07 31127, 2017

      • naught101_ joined the channel
      • 2017-11-07 31145, 2017

      • D4RK-PH0ENiX has quit
      • 2017-11-07 31134, 2017

      • Ant1SG has quit
      • 2017-11-07 31155, 2017

      • D4RK-PH0ENiX joined the channel
      • 2017-11-07 31118, 2017

      • yokel has quit
      • 2017-11-07 31132, 2017

      • yokel joined the channel
      • 2017-11-07 31117, 2017

      • Ant1SG joined the channel
      • 2017-11-07 31105, 2017

      • Ant1SG has quit
      • 2017-11-07 31159, 2017

      • naught101_ has quit
      • 2017-11-07 31142, 2017

      • Ant1SG joined the channel
      • 2017-11-07 31148, 2017

      • jesus2099 joined the channel
      • 2017-11-07 31150, 2017

      • UmkaDK_ joined the channel
      • 2017-11-07 31112, 2017

      • UmkaDK has quit
      • 2017-11-07 31102, 2017

      • zas
        bitmap: ping me when you're caffeined enough
      • 2017-11-07 31133, 2017

      • Ant1SG has quit
      • 2017-11-07 31106, 2017

      • MajorLurker has quit
      • 2017-11-07 31132, 2017

      • gcilou joined the channel
      • 2017-11-07 31146, 2017

      • ruaok
        alastairp: the sharepoint download of all the files downloaded 20GB "successfully", but produces a corrupt zip file.
      • 2017-11-07 31118, 2017

      • ruaok
        > 16455114579 extra bytes at beginning or within zipfile. zipfile corrupt.
      • 2017-11-07 31132, 2017

      • samj1912
        ruaok: took from 10:48 to 13:13 to index all recordings
      • 2017-11-07 31151, 2017

      • ruaok
        oh wow. that is great.
      • 2017-11-07 31113, 2017

      • samj1912
        around 2:25 hrs
      • 2017-11-07 31138, 2017

      • samj1912
        oh wait, there's more, it ended on 13:49 sorry so about 3 hrs
      • 2017-11-07 31154, 2017

      • ruaok
        anything less than 6 hours is great. :)
      • 2017-11-07 31128, 2017
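
Editor's note: the timings quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes both times fall on the same day and reads the first end time as 13:13, which matches the 2:25 figure quoted:

```python
from datetime import datetime


def elapsed(start: str, end: str) -> str:
    """Return the H:MM duration between two same-day HH:MM clock times."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}:{minutes % 60:02d}"


print(elapsed("10:48", "13:13"))  # first estimate
print(elapsed("10:48", "13:49"))  # corrected end time
```

The corrected end time gives 3:01, in line with the "about 3 hrs" stated in the log.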

      • alastairp
        ruaok: I have URLs to download with curl, but internet here is rate limited during the day
      • 2017-11-07 31144, 2017

      • samj1912
        and zas pointed out that doing it over tcp has about 50-175% overhead depending on whether it's ssl or not
      • 2017-11-07 31144, 2017

      • ruaok
        hit me. I got 300mbit ready to go!
      • 2017-11-07 31100, 2017

      • samj1912
        we figured we will move the slave to the same container and use sockets
      • 2017-11-07 31113, 2017

      • samj1912
        zas is waiting for bitmap to figure out how to do it
      • 2017-11-07 31159, 2017

      • samj1912
        and I don't think we have tuned the parameters enough yet
      • 2017-11-07 31119, 2017

      • samj1912
        hopefully we should be able to get recording index down to 1 hr or 1.5 hrs
      • 2017-11-07 31125, 2017

      • samj1912
        maybe less
      • 2017-11-07 31100, 2017

      • samj1912
        me and zas were also discussing a ram only index if we want it really really quick in terms of indexing and retrieval, but it might be overkill :P since we have raid ssds
      • 2017-11-07 31135, 2017

      • Sophist-UK has quit
      • 2017-11-07 31123, 2017

      • jesus2099 has quit
      • 2017-11-07 31136, 2017

      • UmkaDK_ has quit
      • 2017-11-07 31159, 2017

      • UmkaDK joined the channel
      • 2017-11-07 31152, 2017

      • Sophist-UK joined the channel
      • 2017-11-07 31110, 2017

      • ruaok
        samj1912: don't worry about tuning too much.
      • 2017-11-07 31117, 2017

      • ruaok
        ideally we will do this only once.
      • 2017-11-07 31123, 2017

      • samj1912
        okay
      • 2017-11-07 31145, 2017

      • ruaok
        alastairp: thanks. Now downloading MLHD at ~30MB/s. :)
      • 2017-11-07 31153, 2017

      • alastairp
        incredible
      • 2017-11-07 31104, 2017

      • ruaok
        datacenter to datacenter FTW
      • 2017-11-07 31115, 2017

      • ruaok
        then I'll shove this into BigQuery.
      • 2017-11-07 31138, 2017

      • alastairp
        we use google drive for the same reason to share stuff... enterprise file storage is way faster than the local internet connection
      • 2017-11-07 31142, 2017

      • ruaok
        5 files done already.
      • 2017-11-07 31117, 2017

      • alastairp
        I think it actually was a smart decision to put it on MS cloud, I guess McGill has an enterprise/academic account
      • 2017-11-07 31121, 2017

      • UmkaDK has quit
      • 2017-11-07 31139, 2017

      • samj1912
        ruaok: entire indexing done except editors and cdstubs
      • 2017-11-07 31149, 2017

      • samj1912
        took exactly 4 hours for everything
      • 2017-11-07 31127, 2017

      • zas
        but the whole point is to not reindex everything right ? how does it perform after one day of changes ?
      • 2017-11-07 31137, 2017

      • alastairp
        ruaok: just looking at the contents of the tar archives... no subdirectories, individual files are gzip compressed
      • 2017-11-07 31101, 2017

      • samj1912
        zas: not sure, haven't tested it yet
      • 2017-11-07 31104, 2017

      • alastairp
        might be worth writing a quick script to uncompress the archives and put them on disk in a nice structure
      • 2017-11-07 31113, 2017

      • alastairp
        (or upload to BQ directly from the tar??)
      • 2017-11-07 31130, 2017
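
Editor's note: a quick script along the lines alastairp describes might look like the sketch below. It assumes a flat tar archive whose members are individually gzip-compressed, as stated above. The function name and the output layout (sharding by the first two characters of the file name) are hypothetical choices made for illustration, not something agreed in the channel:

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def unpack_gzipped_members(tar_path: str, out_dir: str) -> list:
    """Extract each member of a flat tar, gunzipping *.gz members on the way."""
    out = Path(out_dir)
    written = []
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            name = Path(member.name).name  # archive is flat; drop any path parts
            src = tar.extractfile(member)
            if name.endswith(".gz"):
                src = gzip.GzipFile(fileobj=src)  # decompress while streaming
                name = name[:-3]
            # Hypothetical layout: shard files by the first two characters.
            target_dir = out / name[:2]
            target_dir.mkdir(parents=True, exist_ok=True)
            target = target_dir / name
            with open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Streaming with `shutil.copyfileobj` avoids loading a whole member into memory, which matters if individual files are large.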

      • samj1912
        I need bitmap's help in adding the triggers
      • 2017-11-07 31138, 2017

      • samj1912
        and setting up rabbitmq
      • 2017-11-07 31117, 2017

      • alastairp
        ruaok: btw, Felipe suggested https://airflow.apache.org/ as a tool for managing data from a local datastore -> BQ
      • 2017-11-07 31143, 2017

      • alastairp
        might be something that we could look at if we're planning on sending data from lots of places
      • 2017-11-07 31102, 2017

      • alastairp
        I've not looked at it yet, but I'm going to have a look at how it works
      • 2017-11-07 31107, 2017

      • UmkaDK joined the channel
      • 2017-11-07 31102, 2017

      • djwhitey joined the channel
      • 2017-11-07 31109, 2017

      • djwhitey has quit
      • 2017-11-07 31123, 2017

      • UmkaDK has quit
      • 2017-11-07 31132, 2017

      • UmkaDK joined the channel
      • 2017-11-07 31102, 2017

      • UmkaDK has quit
      • 2017-11-07 31108, 2017

      • UmkaDK_ joined the channel
      • 2017-11-07 31141, 2017

      • Gazooo joined the channel
      • 2017-11-07 31137, 2017

      • Sophist-UK has quit
      • 2017-11-07 31118, 2017

      • bitmap
        zas: pong
      • 2017-11-07 31137, 2017

      • Sophist-UK joined the channel
      • 2017-11-07 31150, 2017

      • zas
        hey
      • 2017-11-07 31126, 2017

      • zas
        samj1912 made a test, using paco for sir/solr and williams as db
      • 2017-11-07 31115, 2017

      • zas
      • 2017-11-07 31108, 2017

      • zas
      • 2017-11-07 31143, 2017

      • zas
        i think we could get a huge speed up using a db slave on the same machine as sir/solr
      • 2017-11-07 31102, 2017

      • zas
        and even further using unix socket instead of tcp to query the db
      • 2017-11-07 31136, 2017

      • gcilou has quit
      • 2017-11-07 31138, 2017

      • zas
        the whole indexing took something like 4 hours, but with significant network activity, and that's prolly the bottleneck, any thoughts ?
      • 2017-11-07 31134, 2017

      • zas
        bitmap: ^^
      • 2017-11-07 31158, 2017

      • bitmap
        hmm
      • 2017-11-07 31120, 2017

      • bitmap
        I don't know how the indexing works tbh, I would expect it to query the data and build indexes concurrently
      • 2017-11-07 31136, 2017

      • bitmap
        building the initial indexes that is
      • 2017-11-07 31120, 2017

      • bitmap
        we could try it and see what effect it has
      • 2017-11-07 31130, 2017

      • zas
        yes :)
      • 2017-11-07 31155, 2017

      • zas
        samj1912: does sir need write access to the db ?
      • 2017-11-07 31104, 2017

      • samj1912
        nope
      • 2017-11-07 31127, 2017

      • bitmap
        for sir, it could query a slave, but the amqp triggers have to be on the master db
      • 2017-11-07 31149, 2017

      • zas
        is this an issue ?
      • 2017-11-07 31101, 2017

      • zas
        slave & master should be more or less in sync
      • 2017-11-07 31115, 2017

      • bitmap
        I don't think so, there should be little to no lag
      • 2017-11-07 31134, 2017

      • zas
        let's give it a try then, a dedicated slave db running on the same host, with unix socket access (through containers mounts)
      • 2017-11-07 31148, 2017
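
Editor's note: with PostgreSQL's libpq, switching from TCP to a unix socket is mostly a matter of the `host` value, since a host that starts with `/` is treated as the directory holding the socket file. A minimal sketch of the two connection strings; the database name, user name and socket directory are assumptions for illustration, and `build_dsn` is a hypothetical helper rather than anything in sir:

```python
def build_dsn(dbname: str, user: str, host: str, port: int = 5432) -> str:
    """Build a libpq-style keyword/value connection string."""
    return f"dbname={dbname} user={user} host={host} port={port}"


# TCP: host is a machine name, so queries cross the network stack.
tcp_dsn = build_dsn("musicbrainz_db", "musicbrainz", "williams")

# Unix socket: host is the socket directory (assumed to be mounted into
# the container); the port still selects the socket file name.
socket_dsn = build_dsn("musicbrainz_db", "musicbrainz", "/var/run/postgresql")

print(tcp_dsn)
print(socket_dsn)
```

Either string could then be passed to a PostgreSQL client library; the sketch itself stays dependency-free so that it only shows the configuration difference.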

      • bitmap
        and there will probably always be things in the queue, so the lag wouldn't actually matter
      • 2017-11-07 31107, 2017

      • jsturgis joined the channel
      • 2017-11-07 31134, 2017

      • zas
        samj1912: step 1, move sir from paco to williams for tests
      • 2017-11-07 31148, 2017

      • bitmap
        can we just try with queen?
      • 2017-11-07 31126, 2017

      • zas
        if you feel it is safe, i'm ok with it
      • 2017-11-07 31153, 2017

      • zas
        in fact, that's my idea, for production
      • 2017-11-07 31121, 2017

      • zas
        queen is fast, has the db slave, and ssd everywhere
      • 2017-11-07 31122, 2017

      • bitmap
        we were testing sir on the production db before, without issues, so I think it should be fine
      • 2017-11-07 31131, 2017

      • zas
        another thing: we're starting to run out of space on serge, most disk space is used by ftp, i think williams would be a much better host for it, no rush though
      • 2017-11-07 31110, 2017

      • bitmap
        makes sense
      • 2017-11-07 31137, 2017

      • jsturgis has quit
      • 2017-11-07 31121, 2017

      • samj1912
        bitmap: how can I test the live indexing part