that stuff... I know that stuff, been using it since we got lucene search :)
oh ho
this I didn't know, I thought it was a completely different thing
samj1912
if lucene were bricks, solr is like a pre built house you can put your furniture in
the current search server we built from scratch
FishQuest
hmm
are you sure about that?
samj1912
that we built it from scratch?
FishQuest
the way I remember it, lucene was added and tinkered with tremendously, but it also came from something already built
this was oh.. wtf 7 20 years ago?
erh 10 not 20
samj1912
well, you get the point :P
FishQuest
anyway I'm going to the library <3, ping me when the test server can be logged into . (no rush or anything)
naught101_ joined the channel
D4RK-PH0ENiX has quit
Ant1SG has quit
D4RK-PH0ENiX joined the channel
yokel has quit
yokel joined the channel
Ant1SG joined the channel
Ant1SG has quit
naught101_ has quit
Ant1SG joined the channel
jesus2099 joined the channel
UmkaDK_ joined the channel
UmkaDK has quit
zas
bitmap: ping me when you're caffeinated enough
Ant1SG has quit
MajorLurker has quit
gcilou joined the channel
ruaok
alastairp: the sharepoint download of all the files downloaded 20GB "successfully", but produces a corrupt zip file.
> 16455114579 extra bytes at beginning or within zipfile. zipfile corrupt.
samj1912
ruaok: took from 10:48 to 13:13 to index all recordings
ruaok
oh wow. that is great.
samj1912
around 2:25 hrs
oh wait, there's more, it ended at 13:49, sorry, so about 3 hrs
ruaok
anything less than 6 hours is great. :)
alastairp
ruaok: I have URLs to download with curl, but internet here is rate limited during the day
samj1912
and zas pointed out that doing it over tcp has about 50-175% overhead depending on whether it's ssl or not
ruaok
hit me. I got 300mbit ready to go!
samj1912
we figured we will move the slave to the same container and use sockets
zas is waiting for bitmap to figure out how to do it
and I don't think we have tuned the parameters enough yet
hopefully we should be able to get recording index down to 1 hr or 1.5 hrs
maybe less
me and zas were also discussing a ram only index if we want it really really quick in terms of indexing and retrieval, but it might be overkill :P since we have raid ssds
Sophist-UK has quit
jesus2099 has quit
UmkaDK_ has quit
UmkaDK joined the channel
Sophist-UK joined the channel
ruaok
samj1912: don't worry about tuning too much.
ideally we will do this only once.
samj1912
okay
ruaok
alastairp: thanks. Now downloading MLHD at ~30MB/s. :)
alastairp
incredible
ruaok
datacenter to datacenter FTW
then I'll shove this into BigQuery.
alastairp
we use google drive for the same reason to share stuff... enterprise file storage is way faster than the local internet connection
ruaok
5 files done already.
alastairp
I think it actually was a smart decision to put it on MS cloud, I guess McGill has an enterprise/academic account
UmkaDK has quit
samj1912
ruaok: entire indexing done except editors and cdstubs
took exactly 4 hours for everything
zas
but the whole point is to not reindex everything, right? how does it perform after one day of changes?
alastairp
ruaok: just looking at the contents of the tar archives... no subdirectories, individual files are gzip compressed
samj1912
zas: not sure, haven't tested it yet
alastairp
might be worth writing a quick script to uncompress the archives and put them on disk in a nice structure
(or upload to BQ directly from the tar??)
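(a rough sketch of what such a script could look like, assuming each tar is flat and every member is an individually gzip-compressed file as described above. `unpack_archive`, the `shard_len` parameter and the "first two characters of the filename" directory layout are made up for illustration, not anything from the MLHD docs)

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def unpack_archive(tar_path, out_root, shard_len=2):
    """Decompress the gzip members of a flat tar into sharded directories.

    Hypothetical layout: a member "abcdef.txt.gz" is written to
    <out_root>/ab/abcdef.txt. Returns the list of paths written.
    """
    out_root = Path(out_root)
    written = []
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip anything that is not a regular file
            name = Path(member.name).name  # keep only the file name
            if name.endswith(".gz"):
                name = name[:-3]  # strip the .gz suffix
            target_dir = out_root / name[:shard_len]
            target_dir.mkdir(parents=True, exist_ok=True)
            target = target_dir / name
            raw = tar.extractfile(member)
            # Stream-decompress so large members never sit fully in memory.
            with gzip.open(raw, "rb") as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

(usage would be something like `unpack_archive("some_archive.tar", "unpacked/")`, run once per downloaded tar. the sharding only keeps any one directory from getting huge)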
samj1912
I need bitmap's help in adding the triggers
and setting up rabbitmq
alastairp
ruaok: btw, Felipe suggested https://airflow.apache.org/ as a tool for managing data from a local datastore -> BQ
might be something that we could look at if we're planning on sending data from lots of places
I've not looked at it yet, but I'm going to have a look at how it works
UmkaDK joined the channel
djwhitey joined the channel
djwhitey has quit
UmkaDK has quit
UmkaDK joined the channel
UmkaDK has quit
UmkaDK_ joined the channel
Gazooo joined the channel
Sophist-UK has quit
bitmap
zas: pong
Sophist-UK joined the channel
zas
hey
samj1912 made a test, using paco for sir/solr and williams as db