#metabrainz

11:49 AM
samj1912

Anyway. zas, you around?

2018-06-04 15506, 2018

11:52 AM
zas

reosarevok, samj1912: i guess you are too young to know about M$'s relationship to open source; there was a time they were trying everything to kill it. And they failed, but it doesn't mean they ever stopped trying. M$ being open source friendly is a joke that only young devs can buy. M$ changed, bla bla bla. This company tried everything to kill linux, you don't remember, but i do, and i'll not forget, whatever they say now.

2018-06-04 15511, 2018

11:52 AM
zas

https://www.theregister.co.uk/2001/06/02/ballmer_…

2018-06-04 15550, 2018

11:52 AM
zas

His position now, after they failed to kill it: https://www.zdnet.com/article/ballmer-i-may-have-…

2018-06-04 15553, 2018

11:52 AM
samj1912

Well that was Steve Ballmer :p

2018-06-04 15556, 2018

11:52 AM
reosarevok

I do remember. That's before they realized they can make more money this way :p

2018-06-04 15511, 2018

11:53 AM
zas

samj1912: yes, CEO of M$.

2018-06-04 15528, 2018

11:53 AM
samj1912

Yes, and he is no longer the CEO

2018-06-04 15552, 2018

11:53 AM
D4RK-PH0ENiX joined the channel

2018-06-04 15527, 2018

11:54 AM
reosarevok

I mean, I did grow up with M$ and all, but right now I don't feel they're worse than any other big company in the IT world honestly. And if they do change the stuff, then we already have a Bitbucket repo, don't we?

2018-06-04 15539, 2018

11:54 AM
reosarevok

Anyway, we can talk about that during the meeting :)

2018-06-04 15542, 2018

11:54 AM
samj1912

Yup

2018-06-04 15554, 2018

11:54 AM
zas

I guess i'll not convince you to stay away from those. But about open source orgs "panicking", you're wrong, they don't panic, they just know what will happen.

2018-06-04 15516, 2018

11:55 AM
samj1912

zas, so I tried testing all the collections

2018-06-04 15530, 2018

11:55 AM
samj1912

Release and recording are the heartbreakers :p

2018-06-04 15537, 2018

11:55 AM
samj1912

The rest of them scale fine

2018-06-04 15554, 2018

11:55 AM
samj1912

But those are the only two that have a lot of hits and currently can't keep up

2018-06-04 15505, 2018

11:56 AM
zas

Did you tune caching ?

2018-06-04 15520, 2018

11:56 AM
samj1912

I am not really sure how to

2018-06-04 15528, 2018

11:56 AM
samj1912

I added one, sure

2018-06-04 15534, 2018

11:56 AM
samj1912

Didn't help much

2018-06-04 15500, 2018

11:57 AM
samj1912

I even ran tests to see if our mb-solr response writer was the culprit

2018-06-04 15519, 2018

11:57 AM
samj1912

But at most it has a 10% overhead across all cores

2018-06-04 15525, 2018

11:57 AM
samj1912

Which is to be expected

2018-06-04 15532, 2018

11:57 AM
Lotheric has quit

2018-06-04 15540, 2018

11:57 AM
samj1912

Solr should perform much better and it can

2018-06-04 15551, 2018

11:57 AM
samj1912

But the release and recording responses are just big

2018-06-04 15559, 2018

11:57 AM
Lotheric joined the channel

2018-06-04 15511, 2018

11:58 AM
samj1912

The document retrieval isn't a problem

2018-06-04 15529, 2018

11:58 AM
samj1912

I tested it out with release groups, which is almost the same size index as release

2018-06-04 15546, 2018

11:58 AM
samj1912

But release group performs at least 5-6 times better

2018-06-04 15547, 2018

11:58 AM
D4RK-PH0ENiX has quit

2018-06-04 15559, 2018

11:58 AM
samj1912

Release responses are just huge

2018-06-04 15504, 2018

11:59 AM
zas

i guess it has to do with fields, and overall indexed data structure

2018-06-04 15505, 2018

11:59 AM
samj1912

Ditto for recording

2018-06-04 15527, 2018

11:59 AM
samj1912

Well zas, we use it sort of differently

2018-06-04 15537, 2018

11:59 AM
samj1912

There are indexed fields for search

2018-06-04 15552, 2018

11:59 AM
samj1912

But for document display each document has a _store field

2018-06-04 15513, 2018

12:00 PM
samj1912

Which contains an xml string

2018-06-04 15537, 2018

12:00 PM
samj1912

This is unmarshaled into a Java object

2018-06-04 15529, 2018

12:01 PM
samj1912

And then marshaled into mbxml/mbjson responses

2018-06-04 15555, 2018
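
(Aside, not part of the log: a minimal Python sketch of the round trip samj1912 describes, where an XML string from the `_store` field is turned into an object and then serialized again as XML or JSON. The real search server does this in Java with bindings generated from mmd-schema; the `Release` class, its three fields, and the sample XML below are hypothetical stand-ins, not the actual schema.)

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict

# Hypothetical, heavily simplified stand-in for the XML kept in _store.
STORED_XML = (
    '<release id="r-1"><title>Example Release</title>'
    "<country>GB</country></release>"
)


@dataclass
class Release:
    """Hypothetical binding; the real ones are generated from mmd-schema."""
    id: str
    title: str
    country: str


def unmarshal(xml_string: str) -> Release:
    """Parse the stored XML string into an in-memory object."""
    root = ET.fromstring(xml_string)
    return Release(
        id=root.get("id", ""),
        title=root.findtext("title", default=""),
        country=root.findtext("country", default=""),
    )


def marshal_json(release: Release) -> str:
    """Serialize the object as JSON."""
    return json.dumps(asdict(release), sort_keys=True)


def marshal_xml(release: Release) -> str:
    """Serialize the object back to XML."""
    root = ET.Element("release", {"id": release.id})
    ET.SubElement(root, "title").text = release.title
    ET.SubElement(root, "country").text = release.country
    return ET.tostring(root, encoding="unicode")


release = unmarshal(STORED_XML)
print(marshal_json(release))
print(marshal_xml(release))
```

Every hit in a response pays for the parse and the re-serialization, so the cost grows with the size of the stored document, which fits the observation that the large release and recording documents are the slow ones.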

12:01 PM
zas

hmmm, doesn't look great at first glance, are there alternatives to that ?

2018-06-04 15514, 2018

12:02 PM
samj1912

I haven't thought about it

2018-06-04 15546, 2018

12:02 PM
samj1912

But it will be tough to stay compatible with our changing schema if we don't do it that way

2018-06-04 15515, 2018

12:03 PM
samj1912

The bindings are basically generated from mmd-schema directly

2018-06-04 15546, 2018

12:03 PM
samj1912

And solr's default xml/json outputter won't output it in the exact same way as WS2

2018-06-04 15533, 2018

12:04 PM
zas

i'd like to have bitmap's thoughts about this stuff

2018-06-04 15526, 2018

12:06 PM
samj1912

Cool

2018-06-04 15535, 2018

12:06 PM
samj1912

We can discuss this after the meeting then

2018-06-04 15541, 2018

12:06 PM
samj1912

Or before?

2018-06-04 15542, 2018

12:07 PM
zas

what's in the _store field exactly ?

2018-06-04 15504, 2018

12:08 PM
samj1912

zas, the exact xml response you get from WS2 for a document

2018-06-04 15515, 2018

12:08 PM
D4RK-PH0ENiX joined the channel

2018-06-04 15557, 2018

12:08 PM
zas

what is filling it ?

2018-06-04 15506, 2018

12:09 PM
samj1912

SIR

2018-06-04 15516, 2018

12:09 PM
zas

and it is stored in SOLR?

2018-06-04 15520, 2018

12:09 PM
samj1912

Yup

2018-06-04 15531, 2018

12:09 PM
samj1912

Stored, not indexed

2018-06-04 15532, 2018

12:09 PM
zas

hmmm, but it isn't used for searches, right ?

2018-06-04 15546, 2018

12:09 PM
samj1912

Nope

2018-06-04 15552, 2018

12:09 PM
samj1912

Just for display

2018-06-04 15543, 2018

12:10 PM
zas

so it means if it wasn't stored in SOLR (just an id instead), everything would be much lighter ?

2018-06-04 15552, 2018

12:10 PM
samj1912

Definitely

2018-06-04 15502, 2018

12:11 PM
samj1912

That would be a breeze for solr to handle

2018-06-04 15518, 2018

12:11 PM
samj1912

And sir.

2018-06-04 15530, 2018

12:11 PM
zas

and what returns the final answer ?

2018-06-04 15546, 2018

12:11 PM
zas

solrwriter ?

2018-06-04 15556, 2018

12:11 PM
samj1912

Nope, solr

2018-06-04 15507, 2018

12:12 PM
samj1912

But it passes the docs through solr writer

2018-06-04 15526, 2018

12:12 PM
samj1912

As I said, solr writer is not the problem

2018-06-04 15540, 2018

12:12 PM
samj1912

The mere amount of information in _store field is too much

2018-06-04 15500, 2018

12:13 PM
samj1912

So I tested with the json outputter of solr

2018-06-04 15524, 2018

12:13 PM
samj1912

Just outputting the document score and the _store field without converting it to its proper structure

2018-06-04 15534, 2018

12:13 PM
samj1912

The overhead was 10% for solr writer

2018-06-04 15549, 2018

12:13 PM
samj1912

Gets smaller as the reqs/s go up

2018-06-04 15552, 2018

12:13 PM
zas

nope, but storing data in solr is not really recommended: https://wiki.apache.org/solr/SolrPerformanceProbl… says "Don't store all your fields, especially the really big ones. Instead, have your application retrieve detail data from the original data source, not Solr. " (but i'm not sure it applies here)

2018-06-04 15505, 2018

12:14 PM
samj1912

Yeah

2018-06-04 15509, 2018

12:14 PM
samj1912

I know about that

2018-06-04 15513, 2018

12:14 PM
ephemer0l_ joined the channel

2018-06-04 15556, 2018

12:14 PM
samj1912

I would like to know how the existing search works

2018-06-04 15507, 2018

12:15 PM
samj1912

Does it return the docs or just the ids

2018-06-04 15525, 2018

12:15 PM
samj1912

I am guessing the docs since we just plugged in solr instead of search

2018-06-04 15503, 2018

12:16 PM
samj1912

IMHO, perl layer should be the layer of abstraction and solr should just return mbids given a search query

2018-06-04 15516, 2018

12:16 PM
samj1912

And scores obviously

2018-06-04 15534, 2018
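
(Aside, not part of the log: a rough Python sketch of the "Solr returns only mbids and scores" idea. It uses Solr's standard `fl` field-list parameter and `score` pseudo-field with the stock `/select` handler. The base URL, the core name, and the `mbid` field name are assumptions for illustration, and the sample response is made up in the usual Solr JSON shape.)

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url: str, core: str, query: str, rows: int = 25) -> str:
    """Build a select URL that asks Solr for ids and scores only."""
    params = {
        "q": query,
        "fl": "mbid,score",  # assumed id field name plus Solr's score pseudo-field
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


def extract_hits(response_body: str) -> list:
    """Turn a Solr JSON response into (mbid, score) pairs."""
    docs = json.loads(response_body)["response"]["docs"]
    return [(doc["mbid"], doc["score"]) for doc in docs]


# Made-up response body; real documents and scores will differ.
SAMPLE_RESPONSE = json.dumps({
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
            {"mbid": "mbid-1", "score": 12.5},
            {"mbid": "mbid-2", "score": 7.25},
        ],
    }
})

print(build_search_url("http://localhost:8983/solr", "release", "nevermind"))
print(extract_hits(SAMPLE_RESPONSE))
```

The calling layer (the Perl side in this discussion) would then load and serialize the full entities itself from the returned ids, keeping the order given by the scores.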

12:16 PM
zas

i tend to agree

2018-06-04 15500, 2018

12:17 PM
samj1912

I can easily make the changes for this in solr in a day

2018-06-04 15501, 2018

12:17 PM
zas

atm, SIR is creating the xml data to store in _store field ?

2018-06-04 15509, 2018

12:17 PM
samj1912

Not sure how to handle it in mbs

2018-06-04 15517, 2018

12:17 PM
samj1912

zas, yes

2018-06-04 15530, 2018

12:17 PM
samj1912

Which is why it takes so long to index things

2018-06-04 15534, 2018

12:17 PM
zas

so basically it could generate a hash, and store the data in a file

2018-06-04 15544, 2018

12:17 PM
samj1912

zas, sure

2018-06-04 15553, 2018

12:17 PM
samj1912

But keep in mind the data is updated regularly

2018-06-04 15500, 2018

12:18 PM
zas

and we could have this hash in the _store field, and read from the file at response time

2018-06-04 15508, 2018
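
(Aside, not part of the log: a small Python sketch of zas's hash-plus-file idea, assuming SHA-256 as the hash and one file per document in a flat directory. `BlobStore`, the directory layout, and the sample XML are all hypothetical; nothing here reflects how SIR actually writes data.)

```python
import hashlib
import tempfile
from pathlib import Path


class BlobStore:
    """Keeps each XML document in a file named after its content hash."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, xml: str) -> str:
        """Write the document and return the hash to keep in the index."""
        data = xml.encode("utf-8")
        digest = hashlib.sha256(data).hexdigest()
        (self.root / digest).write_bytes(data)
        return digest

    def get(self, digest: str) -> str:
        """Read the document back at response time."""
        return (self.root / digest).read_text(encoding="utf-8")


store = BlobStore(Path(tempfile.mkdtemp()))

# The same entity before and after an edit gets two different hashes.
key_v1 = store.put("<release id='r-1'><title>Old title</title></release>")
key_v2 = store.put("<release id='r-1'><title>New title</title></release>")

print(key_v1 != key_v2)
print(store.get(key_v2))
```

Because the key is derived from the content, every update produces a new hash, so the index entry has to be rewritten on each change and stale files need cleaning up. That is the cost samj1912 points at when noting that the data is updated regularly.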

12:18 PM
samj1912

Sure

2018-06-04 15521, 2018

12:18 PM
samj1912

Not sure what to do about the json stuff though

2018-06-04 15558, 2018

12:18 PM
samj1912

SIR doesn't create the json responses

2018-06-04 15500, 2018

12:19 PM
zas

can you get the size of all _store fields compared to the size of search indexes ?

2018-06-04 15508, 2018

12:19 PM
samj1912

Hmm

2018-06-04 15528, 2018

12:19 PM
zas

SIR creates only xml ? and then it's converted to json ?

2018-06-04 15535, 2018

12:19 PM
samj1912

Yup

2018-06-04 15558, 2018

12:19 PM
samj1912

But it's not a lot of overhead vs xml

2018-06-04 15514, 2018

12:20 PM
samj1912

Since it stores xml as a string

2018-06-04 15526, 2018

12:20 PM
samj1912

It is then converted to a Java object

2018-06-04 15536, 2018

12:20 PM
samj1912

And then respective xml/json responses

2018-06-04 15537, 2018

12:20 PM
zas

but we could do the reverse, especially since we'll prolly deprecate xml output at some point

2018-06-04 15555, 2018

12:20 PM
zas

and json is usually more compact

2018-06-04 15507, 2018

12:21 PM
samj1912

zas, not sure if we can

2018-06-04 15519, 2018

12:21 PM
samj1912

Mmd-schema basically defines the xml schema

2018-06-04 15536, 2018

12:21 PM
samj1912

And that is what is used to create all the bindings

2018-06-04 15546, 2018

12:21 PM
samj1912

Converting to json requires a lot of adapters

2018-06-04 15524, 2018

12:22 PM
zas

ok, this stuff isn't really crystal clear to me yet

2018-06-04 15527, 2018

12:22 PM
samj1912

Even in WS I imagine a similar process is used, which is why the json output is sometimes inconsistent with the xml one

2018-06-04 15500, 2018

12:23 PM
samj1912

bitmap should be able to explain the mbs side better

2018-06-04 15517, 2018

12:23 PM
samj1912

yvanzo: ^ do you have any idea about this?

2018-06-04 15551, 2018

12:25 PM
Darkloke has quit

2018-06-04 15529, 2018

12:31 PM
samj1912

zas: an example of what the _store field looks like - http://195.201.149.141:8983/solr/release/advanced…

2018-06-04 15527, 2018

12:32 PM
samj1912

also I am not sure if siege is benchmarking things correctly

2018-06-04 15530, 2018

12:32 PM
yvanzo

I’m not sure the JSON serialization is advanced enough on MBS side to replace XML serialization.

2018-06-04 15500, 2018

12:33 PM
samj1912

https://www.irccloud.com/pastebin/JIRkJ0YH/

2018-06-04 15509, 2018

12:33 PM
samj1912

because, zas, on ab this is what I get ^

2018-06-04 15553, 2018

12:33 PM
samj1912

and that's 1 node

2018-06-04 15531, 2018

12:34 PM
zas

failed requests vs complete requests

2018-06-04 15540, 2018

12:34 PM
zas

i'm off to the supermarket, bbl

2018-06-04 15501, 2018

12:35 PM
samj1912

not even sure why it is showing them as failed reqs

2018-06-04 15504, 2018

12:35 PM
samj1912

hmm

2018-06-04 15507, 2018

12:35 PM
samj1912

let me try something

2018-06-04 15550, 2018

12:35 PM
yvanzo

About inconsistency: MBS-9734

2018-06-04 15550, 2018

12:35 PM
BrainzBot

MBS-9734: inconsistency between the JSON search API and the lookup/browse one in ws/2/ https://tickets.metabrainz.org/browse/MBS-9734

2018-06-04 15515, 2018

12:41 PM
Slurpee joined the channel

2018-06-04 15515, 2018

12:41 PM
Slurpee has quit

2018-06-04 15515, 2018

12:41 PM
Slurpee joined the channel

2018-06-04 15525, 2018

12:43 PM
Sophist-UK has quit

2018-06-04 15506, 2018

12:44 PM
Sophist-UK joined the channel

2018-06-04 15542, 2018

12:52 PM
Sophist_UK joined the channel

2018-06-04 15527, 2018

12:56 PM
Sophist-UK has quit

2018-06-04 15526, 2018

1:01 PM
Sophist_UK has quit

2018-06-04 15513, 2018

1:06 PM
Sophist-UK joined the channel

2018-06-04 15511, 2018

1:13 PM
Sophist_UK joined the channel

2018-06-04 15545, 2018

1:16 PM
Sophist-UK has quit

2018-06-04 15516, 2018

1:30 PM
Sophist-UK joined the channel

2018-06-04 15527, 2018

1:31 PM
Sophist_UK has quit

2018-06-04 15528, 2018

1:33 PM
kartikeyaSh

https://news.microsoft.com/2018/06/04/microsoft-t…

2018-06-04 15510, 2018

1:35 PM
samj1912

Nat Friedman huh

2018-06-04 15517, 2018

1:35 PM
samj1912

That was expected :p

2018-06-04 15531, 2018

1:36 PM
samj1912

GitHub will retain its developer-first ethos and will operate independently to provide an open platform for all developers in all industries.

2018-06-04 15508, 2018

1:41 PM
kartikeyaSh

"independently"🤨

2018-06-04 15506, 2018

1:43 PM
kartikeyaSh

Microsoft is the most active organization on GitHub in the world. Didn't know that https://blog.github.com/2018-06-04-github-microso…

2018-06-04 15542, 2018

1:43 PM
samj1912

Let's see how that goes :p

2018-06-04 15553, 2018

1:43 PM
samj1912

kartikeyaSh: yup