reosarevok, samj1912: I guess you are too young to know about M$'s relationship with open source; there was a time when they were trying everything to kill it. They failed, but that doesn't mean they ever stopped trying. M$ being open-source friendly is a joke only young devs can buy. "M$ changed", blah blah blah. This company tried everything to kill Linux; you don't remember, but I do, and I won't forget, whatever they say now.
I do remember. That was before they realized they could make more money this way :p
2018-06-04 15511, 2018
zas
samj1912: yes, CEO of M$.
2018-06-04 15528, 2018
samj1912
Yes, and he is no longer the CEO
2018-06-04 15552, 2018
D4RK-PH0ENiX joined the channel
2018-06-04 15527, 2018
reosarevok
I mean, I did grow up with M$ and all, but right now I honestly don't feel they're worse than any other big company in the IT world. And if they do change things, we already have a Bitbucket repo, don't we?
2018-06-04 15539, 2018
reosarevok
Anyway, we can talk about that during the meeting :)
2018-06-04 15542, 2018
samj1912
Yup
2018-06-04 15554, 2018
zas
I guess I won't convince you to stay away from those. But about open-source orgs "panicking", you're wrong: they don't panic, they just know what will happen.
2018-06-04 15516, 2018
samj1912
zas, so I tried testing all the collections
2018-06-04 15530, 2018
samj1912
Release and recording are the heartbreakers :p
2018-06-04 15537, 2018
samj1912
The rest of them scale fine
2018-06-04 15554, 2018
samj1912
But those are the only two that have a lot of hits and currently can't keep up
2018-06-04 15505, 2018
zas
Did you tune caching?
2018-06-04 15520, 2018
samj1912
I am not really sure how to
2018-06-04 15528, 2018
samj1912
I added one, sure
2018-06-04 15534, 2018
samj1912
Didn't help much
2018-06-04 15500, 2018
samj1912
I even ran tests to check whether our mb-solr response writer was the culprit
2018-06-04 15519, 2018
samj1912
But at most it adds 10% overhead across all cores
2018-06-04 15525, 2018
samj1912
Which is to be expected
2018-06-04 15532, 2018
Lotheric has quit
2018-06-04 15540, 2018
samj1912
Solr should perform much better and it can
2018-06-04 15551, 2018
samj1912
But the release and recording responses are just big
2018-06-04 15559, 2018
Lotheric joined the channel
2018-06-04 15511, 2018
samj1912
The document retrieval isn't a problem
2018-06-04 15529, 2018
samj1912
I tested it out with release groups, whose index is almost the same size as the release one
2018-06-04 15546, 2018
samj1912
But release group performs at least 5-6 times better
2018-06-04 15547, 2018
D4RK-PH0ENiX has quit
2018-06-04 15559, 2018
samj1912
Release responses are just huge
2018-06-04 15504, 2018
zas
I guess it has to do with the fields and the overall indexed data structure
2018-06-04 15505, 2018
samj1912
Ditto for recording
2018-06-04 15527, 2018
samj1912
Well zas, we use it sort of differently
2018-06-04 15537, 2018
samj1912
There are indexed fields for search
2018-06-04 15552, 2018
samj1912
But for document display each document has a _store field
2018-06-04 15513, 2018
samj1912
Which contains an xml string
2018-06-04 15537, 2018
samj1912
This is unmarshalled into a Java object
2018-06-04 15529, 2018
samj1912
And then marshaled into mbxml/mbjson responses
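The display path described above (stored XML string, unmarshalled into an object, then marshalled into an XML or JSON response) can be sketched roughly as follows. This is a Python illustration, not the mb-solr Java code: the `release` element, its fields, and the flat dict used as the "object" are simplifying assumptions, whereas the real bindings are generated from mmd-schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for one document's _store value.
# Real WS2 XML uses the MMD namespace and far more elements.
STORED_XML = '<release id="r1"><title>Example</title><status>Official</status></release>'


def unmarshal(xml_string: str) -> dict:
    """Parse the stored XML string into an in-memory object (a flat dict here)."""
    root = ET.fromstring(xml_string)
    obj = {"id": root.get("id")}
    for child in root:
        obj[child.tag] = child.text
    return obj


def marshal_xml(obj: dict) -> str:
    """Serialize the object back into an XML response body."""
    root = ET.Element("release", id=obj["id"])
    for key, value in obj.items():
        if key != "id":
            ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def marshal_json(obj: dict) -> str:
    """Serialize the same object as a JSON response body."""
    return json.dumps(obj)


release = unmarshal(STORED_XML)
xml_body = marshal_xml(release)
json_body = marshal_json(release)
```

In this sketch every request parses and re-serializes the whole stored string, so the work grows with the size of each document's payload.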
2018-06-04 15555, 2018
zas
hmmm, doesn't look great at first glance; are there alternatives to that?
2018-06-04 15514, 2018
samj1912
I haven't thought about it
2018-06-04 15546, 2018
samj1912
But it will be tough to keep anything compatible with our changing schema if we don't do it that way
2018-06-04 15515, 2018
samj1912
The bindings are basically generated from mmd-schema directly
2018-06-04 15546, 2018
samj1912
And solr's default xml/json outputter won't output it in exactly the same way as WS2
2018-06-04 15533, 2018
zas
i'd like to have bitmap's thoughts about this stuff
2018-06-04 15526, 2018
samj1912
Cool
2018-06-04 15535, 2018
samj1912
We can discuss this after the meeting then
2018-06-04 15541, 2018
samj1912
Or before?
2018-06-04 15542, 2018
zas
what's in the _store field exactly?
2018-06-04 15504, 2018
samj1912
zas, the exact xml response you get from WS2 for a document
2018-06-04 15515, 2018
D4RK-PH0ENiX joined the channel
2018-06-04 15557, 2018
zas
what is filling it?
2018-06-04 15506, 2018
samj1912
SIR
2018-06-04 15516, 2018
zas
and it is stored in SOLR?
2018-06-04 15520, 2018
samj1912
Yup
2018-06-04 15531, 2018
samj1912
Stored, not indexed
2018-06-04 15532, 2018
zas
hmmm, but it isn't used for searches, right?
2018-06-04 15546, 2018
samj1912
Nope
2018-06-04 15552, 2018
samj1912
Just for display
2018-06-04 15543, 2018
zas
so if it weren't stored in SOLR (only an id instead), everything would be much lighter?
2018-06-04 15552, 2018
samj1912
Definitely
2018-06-04 15502, 2018
samj1912
That would be a breeze for solr to handle
2018-06-04 15518, 2018
samj1912
And SIR.
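A toy model of the lighter layout being discussed: the index keeps only searchable text plus an MBID, and display payloads are fetched from somewhere else after the search. Everything here is hypothetical (the `DISPLAY_STORE` dict stands in for the database or a file cache, and the word-count score stands in for real relevance scoring); it only shows the split between "find ids" and "render documents".

```python
from dataclasses import dataclass


@dataclass
class IndexedDoc:
    mbid: str   # the only thing a hit needs to carry back
    title: str  # searchable text; indexed, not returned for display


# Hypothetical external store for the heavy display payloads, keyed by MBID.
# Nothing in it lives in the search index.
DISPLAY_STORE = {
    "mbid-1": '<release id="mbid-1"><title>Heartbreaker</title></release>',
    "mbid-2": '<release id="mbid-2"><title>Other</title></release>',
}

INDEX = [IndexedDoc("mbid-1", "heartbreaker"), IndexedDoc("mbid-2", "other")]


def search(term: str) -> list[tuple[str, float]]:
    """Return (mbid, score) pairs; the toy score counts matching title words."""
    hits = []
    for doc in INDEX:
        score = float(doc.title.split().count(term))
        if score > 0:
            hits.append((doc.mbid, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def render(hits: list[tuple[str, float]]) -> list[str]:
    """Fetch display data for each hit from the external store."""
    return [DISPLAY_STORE[mbid] for mbid, _score in hits]


results = render(search("heartbreaker"))
```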
2018-06-04 15530, 2018
zas
and what returns the final answer?
2018-06-04 15546, 2018
zas
solrwriter?
2018-06-04 15556, 2018
samj1912
Nope, solr
2018-06-04 15507, 2018
samj1912
But it passes the docs through solr writer
2018-06-04 15526, 2018
samj1912
As I said, solr writer is not the problem
2018-06-04 15540, 2018
samj1912
The sheer amount of information in the _store field is too much
2018-06-04 15500, 2018
samj1912
So I tested with solr's json outputter
2018-06-04 15524, 2018
samj1912
Just outputting the document score and the _store field without converting it to its proper structure
2018-06-04 15534, 2018
samj1912
The overhead was 10% for solr writer
2018-06-04 15549, 2018
samj1912
It gets smaller as the reqs/s go up
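The 10% figure is a relative overhead: the extra latency of the custom writer divided by the latency of the plain JSON outputter. A tiny helper makes the arithmetic explicit; the latencies used below are made up for illustration and are not measurements from these tests.

```python
def relative_overhead(baseline_ms: float, candidate_ms: float) -> float:
    """Extra latency of the candidate over the baseline, as a fraction of the baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return (candidate_ms - baseline_ms) / baseline_ms


# Illustrative numbers only: plain outputter vs. a custom response writer.
example_overhead = relative_overhead(50.0, 55.0)
print(f"{example_overhead:.0%}")  # 10%
```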
2018-06-04 15552, 2018
zas
nope, but storing data in solr is not really recommended: https://wiki.apache.org/solr/SolrPerformanceProbl… says "Don't store all your fields, especially the really big ones. Instead, have your application retrieve detail data from the original data source, not Solr. " (but i'm not sure it applies here)
2018-06-04 15505, 2018
samj1912
Yeah
2018-06-04 15509, 2018
samj1912
I know about that
2018-06-04 15513, 2018
ephemer0l_ joined the channel
2018-06-04 15556, 2018
samj1912
I would like to know how the existing search works
2018-06-04 15507, 2018
samj1912
Does it return the docs or just the ids
2018-06-04 15525, 2018
samj1912
I am guessing the docs since we just plugged in solr instead of search
2018-06-04 15503, 2018
samj1912
IMHO, the Perl layer should be the abstraction layer, and solr should just return MBIDs given a search query
2018-06-04 15516, 2018
samj1912
And scores obviously
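On the Solr side, returning only MBIDs and scores mostly comes down to the `fl` (field list) request parameter, where `score` is Solr's relevance pseudo-field. A minimal sketch of such a query URL, assuming a core named `release`, an `mbid` field, a `title` field to search, and a local Solr on the default port 8983 (all assumptions, not the actual MusicBrainz deployment):

```python
from urllib.parse import urlencode


def id_only_query_url(base_url: str, core: str, query: str, rows: int = 25) -> str:
    """Build a Solr select URL that asks only for the MBID field and the score.

    `fl` limits which fields come back; `score` is Solr's relevance pseudo-field.
    The field name `mbid` is an assumption about the schema.
    """
    params = {"q": query, "fl": "mbid,score", "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


url = id_only_query_url("http://localhost:8983/solr", "release", "title:heartbreaker")
```

The caller (the Perl layer, in the proposal above) would then load display data for those MBIDs itself.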
2018-06-04 15534, 2018
zas
i tend to agree
2018-06-04 15500, 2018
samj1912
I can easily make the changes for this in solr in a day
2018-06-04 15501, 2018
zas
atm, is SIR creating the xml data to store in the _store field?
2018-06-04 15509, 2018
samj1912
Not sure how to handle it in mbs
2018-06-04 15517, 2018
samj1912
zas, yes
2018-06-04 15530, 2018
samj1912
Which is why it takes so long to index things
2018-06-04 15534, 2018
zas
so basically it could generate a hash, and store the data in a file
2018-06-04 15544, 2018
samj1912
zas, sure
2018-06-04 15553, 2018
samj1912
But keep in mind the data is updated regularly
2018-06-04 15500, 2018
zas
and we could have this hash in the _store field, and read from the file at response time
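zas's idea can be sketched as a small content-addressed file store: the payload is written to a file named by its SHA-256, `_store` would hold only that hash, and the file is read back at response time. The directory layout (a two-character prefix subdirectory) and the function names are hypothetical. Because the data is updated regularly, a changed payload gets a new hash, so the index entry has to be rewritten and stale files cleaned up eventually (not shown).

```python
import hashlib
import tempfile
from pathlib import Path


def put_payload(store_dir: Path, payload: str) -> str:
    """Write the payload to a file named by its SHA-256 and return the hash.

    Identical payloads share one file; changed data yields a new hash, which
    the index must then point to.
    """
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    path = store_dir / digest[:2] / digest
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(payload, encoding="utf-8")
    return digest


def get_payload(store_dir: Path, digest: str) -> str:
    """Read the payload back at response time using the hash kept in _store."""
    return (store_dir / digest[:2] / digest).read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    key = put_payload(store, '<release id="r1"/>')
    same_key = put_payload(store, '<release id="r1"/>')
    restored = get_payload(store, key)
```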
2018-06-04 15508, 2018
samj1912
Sure
2018-06-04 15521, 2018
samj1912
Not sure what to do about the json stuff though
2018-06-04 15558, 2018
samj1912
SIR doesn't create the json responses
2018-06-04 15500, 2018
zas
can you get the size of all _store fields compared to the size of the search indexes?
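One rough way to answer that is to sum the raw size of the `_store` values and compare it with whatever index-size figure is at hand. This sketch assumes the documents have already been fetched (for example by paging through a core with `fl=_store`) and that `index_bytes` comes from elsewhere. These are raw string sizes; Lucene compresses stored fields on disk, so on-disk numbers would be smaller.

```python
from typing import Iterable, Mapping


def stored_field_bytes(docs: Iterable[Mapping[str, str]], field: str = "_store") -> int:
    """Sum the raw UTF-8 size of one stored field across documents."""
    return sum(len(doc.get(field, "").encode("utf-8")) for doc in docs)


def stored_to_index_ratio(stored_bytes: int, index_bytes: int) -> float:
    """How large the raw stored payloads are relative to the search index."""
    if index_bytes <= 0:
        raise ValueError("index size must be positive")
    return stored_bytes / index_bytes


# Toy documents; a missing _store field counts as zero bytes.
sample_docs = [{"_store": "<release/>"}, {"_store": "é"}, {"mbid": "no-store"}]
sample_bytes = stored_field_bytes(sample_docs)
```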
2018-06-04 15508, 2018
samj1912
Hmm
2018-06-04 15528, 2018
zas
SIR creates only xml? and then it's converted to json?
2018-06-04 15535, 2018
samj1912
Yup
2018-06-04 15558, 2018
samj1912
But it's not a lot of overhead vs xml
2018-06-04 15514, 2018
samj1912
Since it stores xml as a string
2018-06-04 15526, 2018
samj1912
It is then converted to a Java object
2018-06-04 15536, 2018
samj1912
And then respective xml/json responses
2018-06-04 15537, 2018
zas
but we could do the reverse, especially since we'll prolly deprecate xml output at some point
2018-06-04 15555, 2018
zas
and json is usually more compact
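Part of why JSON tends to be smaller is that XML repeats every element name in its closing tag. A quick comparison on one hypothetical record, serialized without extra whitespace in both formats; real MMD XML and JSON differ in shape, so this only shows the general effect, not the actual savings.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, not a real WS2 document.
record = {"id": "r1", "title": "Example", "status": "Official", "country": "GB"}


def as_xml(rec: dict) -> str:
    """Serialize the record as compact XML with one child element per field."""
    root = ET.Element("release", id=rec["id"])
    for key, value in rec.items():
        if key != "id":
            ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def as_json(rec: dict) -> str:
    """Serialize the record as compact JSON (no spaces after separators)."""
    return json.dumps(rec, separators=(",", ":"))


xml_size = len(as_xml(record).encode("utf-8"))
json_size = len(as_json(record).encode("utf-8"))
```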
2018-06-04 15507, 2018
samj1912
zas, not sure if we can
2018-06-04 15519, 2018
samj1912
mmd-schema basically defines the xml schema
2018-06-04 15536, 2018
samj1912
And that is what is used to create all the bindings
2018-06-04 15546, 2018
samj1912
Converting to json requires a lot of adapters
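As an illustration of the kind of adapter meant here, a simplified, hypothetical mapping from XML-shaped data to a JSON shape: `*-list` wrapper elements (such as `label-info-list`, assumed here to mirror WS2 XML) become plain arrays keyed without the suffix, attributes become keys, and leaf elements become strings. The wrapper's `count` attribute is simply dropped. Real WS2 mappings have many more special cases, which is the point being made.

```python
import xml.etree.ElementTree as ET


def to_json_shape(elem: ET.Element):
    """Hypothetical adapter from an XML element to a JSON-style value."""
    if elem.tag.endswith("-list"):
        # List wrappers become arrays; their attributes (e.g. count) are dropped.
        return [to_json_shape(child) for child in elem]
    if len(elem) == 0 and not elem.attrib:
        return elem.text
    obj = dict(elem.attrib)
    for child in elem:
        key = child.tag[: -len("-list")] if child.tag.endswith("-list") else child.tag
        obj[key] = to_json_shape(child)
    return obj


SAMPLE = (
    '<release id="r1"><title>Example</title>'
    '<label-info-list count="1"><label-info>'
    "<catalog-number>CAT-1</catalog-number>"
    "</label-info></label-info-list></release>"
)
shaped = to_json_shape(ET.fromstring(SAMPLE))
```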
2018-06-04 15524, 2018
zas
ok, this stuff isn't really crystal clear to me yet
2018-06-04 15527, 2018
samj1912
Even in WS I imagine a similar process is used, which is why the json output is sometimes inconsistent with the xml one
2018-06-04 15500, 2018
samj1912
bitmap should be able to explain the mbs side better