#metabrainz


      • samj1912
        Anyway. zas, you around?
      • 2018-06-04 15506, 2018

      • zas
        reosarevok, samj1912: i guess you are too young to know about M$'s relationship to open source; there was a time they were trying everything to kill it. And they failed, but that doesn't mean they ever stopped trying. M$ being open-source friendly is a joke that only young devs can buy. "M$ changed", bla bla bla. This company tried everything to kill Linux; you don't remember, but i do, and i won't forget, whatever they say now.
      • 2018-06-04 15511, 2018

      • zas
      • 2018-06-04 15550, 2018

      • zas
        His position now, after they failed to kill it: https://www.zdnet.com/article/ballmer-i-may-have-…
      • 2018-06-04 15553, 2018

      • samj1912
        Well that was Steve Ballmer :p
      • 2018-06-04 15556, 2018

      • reosarevok
        I do remember. That was before they realized they could make more money this way :p
      • 2018-06-04 15511, 2018

      • zas
        samj1912: yes, CEO of M$.
      • 2018-06-04 15528, 2018

      • samj1912
        Yes, and he is no longer the CEO
      • 2018-06-04 15552, 2018

      • D4RK-PH0ENiX joined the channel
      • 2018-06-04 15527, 2018

      • reosarevok
        I mean, I did grow up with M$ and all, but right now I honestly don't feel they're worse than any other big company in the IT world. And if they do change things, then we already have a Bitbucket repo, don't we?
      • 2018-06-04 15539, 2018

      • reosarevok
        Anyway, we can talk about that during the meeting :)
      • 2018-06-04 15542, 2018

      • samj1912
        Yup
      • 2018-06-04 15554, 2018

      • zas
        I guess I won't convince you to stay away from those. But about open source orgs "panicking", you're wrong: they don't panic, they just know what will happen.
      • 2018-06-04 15516, 2018

      • samj1912
        zas, so I tried testing all the collections
      • 2018-06-04 15530, 2018

      • samj1912
        Release and recording are the heartbreakers :p
      • 2018-06-04 15537, 2018

      • samj1912
        The rest of them scale fine
      • 2018-06-04 15554, 2018

      • samj1912
        But those are the only two that have a lot of hits and currently can't keep up
      • 2018-06-04 15505, 2018

      • zas
        Did you tune caching ?
      • 2018-06-04 15520, 2018

      • samj1912
        I am not really sure how to
      • 2018-06-04 15528, 2018

      • samj1912
        I added one, sure
      • 2018-06-04 15534, 2018

      • samj1912
        Didn't help much
      • 2018-06-04 15500, 2018

      • samj1912
        I even ran tests to check whether our mb-solr response writer was the culprit
      • 2018-06-04 15519, 2018

      • samj1912
        But at most it has a 10% overhead across all cores
      • 2018-06-04 15525, 2018

      • samj1912
        Which is to be expected
      • 2018-06-04 15532, 2018

      • Lotheric has quit
      • 2018-06-04 15540, 2018

      • samj1912
        Solr should perform much better and it can
      • 2018-06-04 15551, 2018

      • samj1912
        But the release and recording responses are just big
      • 2018-06-04 15559, 2018

      • Lotheric joined the channel
      • 2018-06-04 15511, 2018

      • samj1912
        The document retrieval isn't a problem
      • 2018-06-04 15529, 2018

      • samj1912
        I tested it out with release groups, which is almost the same size index as release
      • 2018-06-04 15546, 2018

      • samj1912
        But release group performs at least 5-6 times better
      • 2018-06-04 15547, 2018

      • D4RK-PH0ENiX has quit
      • 2018-06-04 15559, 2018

      • samj1912
        Release responses are just huge
      • 2018-06-04 15504, 2018

      • zas
        i guess it has to do with the fields, and the overall indexed data structure
      • 2018-06-04 15505, 2018

      • samj1912
        Ditto for recording
      • 2018-06-04 15527, 2018

      • samj1912
        Well zas, we use it sort of differently
      • 2018-06-04 15537, 2018

      • samj1912
        There are indexed fields for search
      • 2018-06-04 15552, 2018

      • samj1912
        But for document display each document has a _store field
      • 2018-06-04 15513, 2018

      • samj1912
        Which contains an xml string
      • 2018-06-04 15537, 2018

      • samj1912
        This is unmarshaled into a Java object
      • 2018-06-04 15529, 2018

      • samj1912
        And then marshaled into mbxml/mbjson responses
      • 2018-06-04 15555, 2018
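The round trip described above (stored XML string, then an in-memory object, then an XML or JSON response body) can be sketched roughly as below. This is only an illustration of the data flow, assuming a toy three-field record; the real mb-solr code is Java and uses bindings generated from mmd-schema, and none of the names here come from it.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for one _store value. The real
# payload is the full WS2 XML for a document, as defined by mmd-schema.
STORED_XML = (
    '<release id="hypothetical-mbid-1">'
    "<title>Example</title>"
    "<status>Official</status>"
    "</release>"
)


def unmarshal(stored_xml: str) -> dict:
    """Step 1: parse the stored XML string into an in-memory object."""
    root = ET.fromstring(stored_xml)
    obj = {"id": root.get("id")}
    for child in root:
        obj[child.tag] = child.text
    return obj


def marshal_xml(obj: dict) -> str:
    """Step 2a: serialize the object back into an XML response body."""
    root = ET.Element("release", id=obj["id"])
    for key, value in obj.items():
        if key != "id":
            ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def marshal_json(obj: dict) -> str:
    """Step 2b: serialize the same object into a JSON response body."""
    return json.dumps(obj)


# Every search hit pays for a full parse plus a full serialization, so the
# per-request cost grows with the size of the stored payload.
doc = unmarshal(STORED_XML)
xml_body = marshal_xml(doc)
json_body = marshal_json(doc)
```

With real release and recording payloads, which can be very large, this per-hit parse and serialize work is the part that stays expensive even when the index lookup itself is fast.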

      • zas
        hmmm, doesn't look great at first glance, are there alternatives to that ?
      • 2018-06-04 15514, 2018

      • samj1912
        I haven't thought about it
      • 2018-06-04 15546, 2018

      • samj1912
        But it will be tough to have something that stays compatible with our changing schema if we don't do it that way
      • 2018-06-04 15515, 2018

      • samj1912
        The bindings are basically generated from mmd-schema directly
      • 2018-06-04 15546, 2018

      • samj1912
        And solr's default xml/json outputter won't output it in exactly the same way as WS2
      • 2018-06-04 15533, 2018

      • zas
        i'd like to have bitmap's thoughts about this stuff
      • 2018-06-04 15526, 2018

      • samj1912
        Cool
      • 2018-06-04 15535, 2018

      • samj1912
        We can discuss this after the meeting then
      • 2018-06-04 15541, 2018

      • samj1912
        Or before?
      • 2018-06-04 15542, 2018

      • zas
        what's in the _store field exactly ?
      • 2018-06-04 15504, 2018

      • samj1912
        zas, the exact xml response you get from WS2 for a document
      • 2018-06-04 15515, 2018

      • D4RK-PH0ENiX joined the channel
      • 2018-06-04 15557, 2018

      • zas
        what is filling it ?
      • 2018-06-04 15506, 2018

      • samj1912
        SIR
      • 2018-06-04 15516, 2018

      • zas
        and it is stored in SOLR?
      • 2018-06-04 15520, 2018

      • samj1912
        Yup
      • 2018-06-04 15531, 2018

      • samj1912
        Stored, not indexed
      • 2018-06-04 15532, 2018

      • zas
        hmmm, but it isn't used for searches, right ?
      • 2018-06-04 15546, 2018

      • samj1912
        Nope
      • 2018-06-04 15552, 2018

      • samj1912
        Just for display
      • 2018-06-04 15543, 2018

      • zas
        so it means that if it wasn't stored in SOLR (just an id), everything would be much lighter ?
      • 2018-06-04 15552, 2018

      • samj1912
        Definitely
      • 2018-06-04 15502, 2018

      • samj1912
        That would be a breeze for solr to handle
      • 2018-06-04 15518, 2018

      • samj1912
        And for SIR.
      • 2018-06-04 15530, 2018

      • zas
        and what returns the final answer ?
      • 2018-06-04 15546, 2018

      • zas
        solrwriter ?
      • 2018-06-04 15556, 2018

      • samj1912
        Nope, solr
      • 2018-06-04 15507, 2018

      • samj1912
        But it passes the docs through solr writer
      • 2018-06-04 15526, 2018

      • samj1912
        As I said, solr writer is not the problem
      • 2018-06-04 15540, 2018

      • samj1912
        The sheer amount of information in the _store field is too much
      • 2018-06-04 15500, 2018

      • samj1912
        So I tested with solr's JSON outputter
      • 2018-06-04 15524, 2018

      • samj1912
        Just outputting the document score and the _store field without converting it to its proper structure
      • 2018-06-04 15534, 2018

      • samj1912
        The overhead of the solr writer was 10%
      • 2018-06-04 15549, 2018

      • samj1912
        It gets smaller as the reqs/s go up
      • 2018-06-04 15552, 2018
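A "10% overhead" figure like the one above is a relative comparison between two benchmark runs. The log doesn't say exactly how it was computed, so the sketch below assumes it means the relative increase in mean time per request between a baseline run (plain JSON output of the _store field) and a run through the solr writer, with both runs using the same queries and concurrency. The function names are hypothetical.

```python
def relative_overhead(baseline_ms: float, candidate_ms: float) -> float:
    """Extra mean time per request of `candidate` over `baseline`, as a fraction.

    Assumes both runs used the same query set and the same concurrency.
    """
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return (candidate_ms - baseline_ms) / baseline_ms


def ms_per_request(requests_per_second: float) -> float:
    """Convert throughput into mean time per request across all concurrent requests."""
    if requests_per_second <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / requests_per_second
```

For example, a drop from 100 to 90 reqs/s corresponds to going from 10 ms to about 11.1 ms per request, i.e. roughly 11% overhead.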

      • zas
        nope, but storing data in solr is not really recommended: https://wiki.apache.org/solr/SolrPerformanceProbl… says "Don't store all your fields, especially the really big ones. Instead, have your application retrieve detail data from the original data source, not Solr. " (but i'm not sure it applies here)
      • 2018-06-04 15505, 2018

      • samj1912
        Yeah
      • 2018-06-04 15509, 2018

      • samj1912
        I know about that
      • 2018-06-04 15513, 2018

      • ephemer0l_ joined the channel
      • 2018-06-04 15556, 2018

      • samj1912
        I would like to know how the existing search works
      • 2018-06-04 15507, 2018

      • samj1912
        Does it return the docs or just the ids
      • 2018-06-04 15525, 2018

      • samj1912
        I am guessing the docs since we just plugged in solr instead of search
      • 2018-06-04 15503, 2018

      • samj1912
        IMHO, the perl layer should be the abstraction layer and solr should just return mbids given a search query
      • 2018-06-04 15516, 2018

      • samj1912
        And scores obviously
      • 2018-06-04 15534, 2018
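The split proposed above (the search engine returns only MBIDs and scores; the application layer fetches the display data itself) can be sketched as follows. This is a toy illustration, not Solr or MBS code: the in-memory index, the document store, and the term-matching score are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    mbid: str
    score: float


# Hypothetical search index (mbid -> indexed text) and document store
# (mbid -> display data). In the proposal, Solr would hold only the first,
# and MBS would read the second from its own database.
INDEX: Dict[str, str] = {
    "mbid-a": "ok computer radiohead",
    "mbid-b": "kid a radiohead",
    "mbid-c": "homogenic bjork",
}
DOCS: Dict[str, dict] = {
    "mbid-a": {"title": "OK Computer"},
    "mbid-b": {"title": "Kid A"},
    "mbid-c": {"title": "Homogenic"},
}


def search_ids(query: str, index: Dict[str, str], limit: int = 25) -> List[Hit]:
    """Search side: return only (mbid, score) pairs, never a stored payload.

    The score is a toy fraction of matched query terms, not Solr's ranking.
    """
    terms = query.lower().split()
    hits = []
    for mbid, text in index.items():
        words = set(text.lower().split())
        matched = sum(1 for term in terms if term in words)
        if matched:
            hits.append(Hit(mbid, matched / len(terms)))
    hits.sort(key=lambda h: (-h.score, h.mbid))
    return hits[:limit]


def render_results(hits: List[Hit], fetch: Callable[[str], dict]) -> List[dict]:
    """Application side: fetch display data for each hit, keeping its score."""
    return [{"id": h.mbid, "score": h.score, **fetch(h.mbid)} for h in hits]


results = render_results(search_ids("radiohead computer", INDEX), DOCS.__getitem__)
```

The design point is that the search response stays small regardless of how large a release or recording is; the cost of building the full response moves to the layer that already owns the data.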

      • zas
        i tend to agree
      • 2018-06-04 15500, 2018

      • samj1912
        I can easily make the changes for this in solr in a day
      • 2018-06-04 15501, 2018

      • zas
        atm, SIR is creating the xml data to store in _store field ?
      • 2018-06-04 15509, 2018

      • samj1912
        Not sure how to handle it in mbs
      • 2018-06-04 15517, 2018

      • samj1912
        zas, yes
      • 2018-06-04 15530, 2018

      • samj1912
        Which is why it takes so long to index things
      • 2018-06-04 15534, 2018

      • zas
        so basically it could generate a hash, and store the data in a file
      • 2018-06-04 15544, 2018

      • samj1912
        zas, sure
      • 2018-06-04 15553, 2018

      • samj1912
        But keep in mind the data is updated regularly
      • 2018-06-04 15500, 2018

      • zas
        and we could have this hash in the _store field, and read from the file at response time
      • 2018-06-04 15508, 2018
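zas's idea (hash the payload, write it to a file, keep only the hash in _store, and read the file at response time) is essentially content-addressed storage. A minimal sketch, assuming SHA-256 as the hash and a local directory as the store; the function names and the two-character sharding are hypothetical choices, not anything SIR does today.

```python
import hashlib
import tempfile
from pathlib import Path


def payload_key(payload: str) -> str:
    """Content hash used as the reference stored in _store."""
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def _path_for(base_dir: Path, key: str) -> Path:
    # Shard by the first two hex characters to avoid one huge directory.
    return base_dir / key[:2] / key


def write_payload(base_dir: Path, payload: str) -> str:
    """Indexing side: write the payload once and return its key."""
    key = payload_key(payload)
    path = _path_for(base_dir, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        # Write to a temporary name, then rename, so readers never see a partial file.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(payload, encoding="utf-8")
        tmp.replace(path)
    return key


def read_payload(base_dir: Path, key: str) -> str:
    """Response side: resolve the key from _store back into the payload."""
    return _path_for(base_dir, key).read_text(encoding="utf-8")


# Usage: write once at index time, read at response time.
with tempfile.TemporaryDirectory() as tmp_dir:
    store = Path(tmp_dir)
    ref = write_payload(store, "<release/>")
    assert read_payload(store, ref) == "<release/>"
```

Regarding samj1912's point that the data is updated regularly: with content hashes, an updated entity simply gets a new key on reindex, but the file for the old key is left behind, so a scheme like this would also need some cleanup of unreferenced files.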

      • samj1912
        Sure
      • 2018-06-04 15521, 2018

      • samj1912
        Not sure what to do about the json stuff though
      • 2018-06-04 15558, 2018

      • samj1912
        SIR doesn't create the json responses
      • 2018-06-04 15500, 2018

      • zas
        can you get the size of all _store fields compared to the size of search indexes ?
      • 2018-06-04 15508, 2018

      • samj1912
        Hmm
      • 2018-06-04 15528, 2018

      • zas
        SIR creates only xml ? and then it's converted to json ?
      • 2018-06-04 15535, 2018

      • samj1912
        Yup
      • 2018-06-04 15558, 2018

      • samj1912
        But it's not a lot of overhead vs xml
      • 2018-06-04 15514, 2018

      • samj1912
        Since it stores xml as a string
      • 2018-06-04 15526, 2018

      • samj1912
        It is then converted to a Java object
      • 2018-06-04 15536, 2018

      • samj1912
        And then into the respective xml/json responses
      • 2018-06-04 15537, 2018

      • zas
        but we could do the reverse, especially since we'll prolly deprecate xml output at some point
      • 2018-06-04 15555, 2018

      • zas
        and json is usually more compact
      • 2018-06-04 15507, 2018
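To make the compactness point concrete, the sketch below serializes one hypothetical record both ways and compares byte sizes. The field names and nesting are illustrative only, not the real mmd-schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; field names are illustrative, not the real mmd-schema.
RECORD = {
    "id": "mbid-1",
    "title": "Example",
    "status": "Official",
    "labels": ["Label A", "Label B"],
}


def to_xml(record: dict) -> str:
    root = ET.Element("release", id=record["id"])
    ET.SubElement(root, "title").text = record["title"]
    ET.SubElement(root, "status").text = record["status"]
    # XML needs a wrapper element plus opening and closing tags for each list item.
    label_list = ET.SubElement(root, "label-list")
    for name in record["labels"]:
        label = ET.SubElement(label_list, "label")
        ET.SubElement(label, "name").text = name
    return ET.tostring(root, encoding="unicode")


def to_json(record: dict) -> str:
    # Compact separators: no spaces after ',' and ':'.
    return json.dumps(record, separators=(",", ":"))


xml_size = len(to_xml(RECORD).encode("utf-8"))
json_size = len(to_json(RECORD).encode("utf-8"))
```

Note that the list is shaped differently in the two formats (label-list/label/name elements versus a plain array). That mismatch is the kind of thing samj1912 refers to below when he says converting the XML-derived bindings to JSON needs a lot of adapters.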

      • samj1912
        zas, not sure we can
      • 2018-06-04 15519, 2018

      • samj1912
        Mmd-schema basically defines the xml schema
      • 2018-06-04 15536, 2018

      • samj1912
        And that is what is used to create all the bindings
      • 2018-06-04 15546, 2018

      • samj1912
        Converting to json requires a lot of adapters
      • 2018-06-04 15524, 2018

      • zas
        ok, this stuff isn't really crystal clear to me yet
      • 2018-06-04 15527, 2018

      • samj1912
        Even in WS I imagine a similar process is used, which is why the json output is sometimes inconsistent with the xml one
      • 2018-06-04 15500, 2018

      • samj1912
        bitmap should be able to explain the mbs side better
      • 2018-06-04 15517, 2018

      • samj1912
        yvanzo: ^ do you have any idea about this?
      • 2018-06-04 15551, 2018

      • Darkloke has quit
      • 2018-06-04 15529, 2018

      • samj1912
        zas: an example of what the _store field looks like - http://195.201.149.141:8983/solr/release/advanced…
      • 2018-06-04 15527, 2018

      • samj1912
        also I am not sure if siege is benchmarking things correctly
      • 2018-06-04 15530, 2018

      • yvanzo
        I’m not sure the JSON serialization is advanced enough on MBS side to replace XML serialization.
      • 2018-06-04 15500, 2018

      • samj1912
      • 2018-06-04 15509, 2018

      • samj1912
        because, zas, with ab this is what I get ^
      • 2018-06-04 15553, 2018

      • samj1912
        and that's 1 node
      • 2018-06-04 15531, 2018

      • zas
        failed requests vs complete requests
      • 2018-06-04 15540, 2018

      • zas
        i'm off to the supermarket, bbl
      • 2018-06-04 15501, 2018

      • samj1912
        not even sure why it is showing them as failed reqs
      • 2018-06-04 15504, 2018

      • samj1912
        hmm
      • 2018-06-04 15507, 2018

      • samj1912
        let me try something
      • 2018-06-04 15550, 2018

      • yvanzo
        About inconsistency: MBS-9734
      • 2018-06-04 15550, 2018

      • BrainzBot
        MBS-9734: inconsistency between the JSON search API and the lookup/browse one in ws/2/ https://tickets.metabrainz.org/browse/MBS-9734
      • 2018-06-04 15515, 2018

      • Slurpee joined the channel
      • 2018-06-04 15515, 2018

      • Slurpee has quit
      • 2018-06-04 15515, 2018

      • Slurpee joined the channel
      • 2018-06-04 15525, 2018

      • Sophist-UK has quit
      • 2018-06-04 15506, 2018

      • Sophist-UK joined the channel
      • 2018-06-04 15542, 2018

      • Sophist_UK joined the channel
      • 2018-06-04 15527, 2018

      • Sophist-UK has quit
      • 2018-06-04 15526, 2018

      • Sophist_UK has quit
      • 2018-06-04 15513, 2018

      • Sophist-UK joined the channel
      • 2018-06-04 15511, 2018

      • Sophist_UK joined the channel
      • 2018-06-04 15545, 2018

      • Sophist-UK has quit
      • 2018-06-04 15516, 2018

      • Sophist-UK joined the channel
      • 2018-06-04 15527, 2018

      • Sophist_UK has quit
      • 2018-06-04 15528, 2018

      • kartikeyaSh
      • 2018-06-04 15510, 2018

      • samj1912
        Nat Friedman huh
      • 2018-06-04 15517, 2018

      • samj1912
        That was expected :p
      • 2018-06-04 15531, 2018

      • samj1912
        GitHub will retain its developer-first ethos and will operate independently to provide an open platform for all developers in all industries.
      • 2018-06-04 15508, 2018

      • kartikeyaSh
        "independently"🤨
      • 2018-06-04 15506, 2018

      • kartikeyaSh
        Microsoft is the most active organization on GitHub in the world. Didn't know that https://blog.github.com/2018-06-04-github-microso…
      • 2018-06-04 15542, 2018

      • samj1912
        Let's see how that goes :p
      • 2018-06-04 15553, 2018

      • samj1912
        kartikeyaSh: yup