#metabrainz

      • kori has quit
      • 2019-08-02 21402, 2019

      • ayerhart joined the channel
      • 2019-08-02 21459, 2019

      • ayerhart has quit
      • 2019-08-02 21414, 2019

      • d4rkie joined the channel
      • 2019-08-02 21420, 2019

      • D4RK-PH0ENiX has quit
      • 2019-08-02 21435, 2019

      • ayerhart joined the channel
      • 2019-08-02 21417, 2019

      • kori joined the channel
      • 2019-08-02 21417, 2019

      • kori has quit
      • 2019-08-02 21417, 2019

      • kori joined the channel
      • 2019-08-02 21402, 2019

      • ayerhart has quit
      • 2019-08-02 21401, 2019

      • d4rkie has quit
      • 2019-08-02 21440, 2019

      • D4RK-PH0ENiX joined the channel
      • 2019-08-02 21439, 2019

      • D4RK-PH0ENiX has quit
      • 2019-08-02 21409, 2019

      • D4RK-PH0ENiX joined the channel
      • 2019-08-02 21443, 2019

      • pristine__
        ruaok: hey. Can you review #40, #41 and #42?
      • 2019-08-02 21421, 2019

      • iliekcomputers
        alastairp: dump is done, if you add me to bono, i'll copy it over.
      • 2019-08-02 21437, 2019

      • iliekcomputers
      • 2019-08-02 21434, 2019

      • d4rkie joined the channel
      • 2019-08-02 21447, 2019

      • D4RK-PH0ENiX has quit
      • 2019-08-02 21432, 2019

      • d4rkie has quit
      • 2019-08-02 21416, 2019

      • D4RK-PH0ENiX joined the channel
      • 2019-08-02 21430, 2019

      • reosarevok
        yvanzo, bitmap, zas: I tried to point my local server at solr-cloud for testing https://tickets.metabrainz.org/browse/MBS-10258 but I'm getting a 403. What am I missing? :)
      • 2019-08-02 21431, 2019

      • BrainzBot
        MBS-10258: CD stub search fails with error on prod & beta
      • 2019-08-02 21405, 2019

      • reosarevok
        (I copied the config lines from serge)
      • 2019-08-02 21433, 2019

      • antlarr2 is now known as antlarr
      • 2019-08-02 21420, 2019

      • Cyna
        reosarevok: what is `comma_only_list`
      • 2019-08-02 21437, 2019

      • Cyna
        I dont see a macro for it
      • 2019-08-02 21420, 2019

      • Cyna
        nor a utility
      • 2019-08-02 21404, 2019

      • yvanzo
        reosarevok: The new server is not publicly accessible on purpose.
      • 2019-08-02 21439, 2019

      • yvanzo
        reosarevok: However, you might be able to access it from your local development server through ssh port forwarding.
      • 2019-08-02 21404, 2019

      • reosarevok
        I know, thought there would be some key/token I can use
      • 2019-08-02 21444, 2019

      • reosarevok
        Cyna: it's already in JS as CommaOnlyList or whatnot, search for uses
      • 2019-08-02 21451, 2019

      • Cyna
        I'll have a look, thanks :)
      • 2019-08-02 21401, 2019

      • yvanzo
        git grep 'sub *comma_only_list'
      • 2019-08-02 21435, 2019

      • Cyna
        I saw the `sub comma only list`
      • 2019-08-02 21440, 2019

      • Cyna
        What do they mean ?
      • 2019-08-02 21456, 2019

      • yvanzo
        it is a definition for a subroutine in Perl
      • 2019-08-02 21414, 2019

      • Cyna
        Any idea how do I get it in React
      • 2019-08-02 21432, 2019

      • Cyna
        reosarevok: Couldnt find CommaOnlyList using git grep
      • 2019-08-02 21436, 2019

      • yvanzo
        it is defined in module Translation, and exported for use in templates in View::Default
      • 2019-08-02 21419, 2019

      • Cyna
        didnt get it 😅
      • 2019-08-02 21433, 2019

      • yvanzo
        (that was about comma_only_list)
      • 2019-08-02 21459, 2019

      • Cyna
        ohh yea
      • 2019-08-02 21405, 2019

      • Cyna
        I see that in the grep
      • 2019-08-02 21420, 2019

      • yvanzo
        Cyna: there should be many occurrences of CommaOnlyList using git grep
      • 2019-08-02 21423, 2019

      • reosarevok
        Maybe it's a function so lowercase comma
      • 2019-08-02 21427, 2019

      • reosarevok
        I don't remember
      • 2019-08-02 21429, 2019

      • iliekcomputers
        ruaok: I like how airports are one of the few places that don't judge you if you get drunk at 10:30 in the morning :)
      • 2019-08-02 21444, 2019

      • Cyna
        ohh sorry
      • 2019-08-02 21448, 2019

      • Cyna
        Its a lower case comma
      • 2019-08-02 21453, 2019

      • yvanzo
        Cyna: git grep 'const *CommaOnlyList'
      • 2019-08-02 21436, 2019

      • reosarevok
        yvanzo: so there's no key or way to use the solr server, is it hardcoded on the solr side?
      • 2019-08-02 21438, 2019

      • yvanzo
        Oops, git grep -i 'const *CommaOnlyList'
      • 2019-08-02 21431, 2019

      • yvanzo
        reosarevok: there is just no port open to the outside world from the solr server.
      • 2019-08-02 21450, 2019

      • yvanzo
        (if I understood it correctly)
      • 2019-08-02 21452, 2019

      • Cyna
        ```[%- FOR attribute=type.value %]
      • 2019-08-02 21452, 2019

      • Cyna
        <li>[% attribute.l_value %]</li>
      • 2019-08-02 21452, 2019

      • Cyna
        [%- END %]```
      • 2019-08-02 21400, 2019

      • reosarevok
        Oh no
      • 2019-08-02 21413, 2019

      • Cyna
        Can this be ported to type.value.l_value ?
      • 2019-08-02 21400, 2019

      • yvanzo
        Cyna: what reosarevok just answered above ;)
      • 2019-08-02 21415, 2019

      • reosarevok
        Cyna: l_attributes(value) is probably what's needed there, but maybe it's slightly different
      • 2019-08-02 21442, 2019

      • reosarevok
        We don't usually pass l_stuff to JS (l_ there is a translated value)
      • 2019-08-02 21456, 2019

      • reosarevok
        But yvanzo has touched attributes more than I have, so he might have something else to say
      • 2019-08-02 21403, 2019

      • reosarevok
        yvanzo: hmm, I see
      • 2019-08-02 21426, 2019

      • reosarevok
        But test.mb is on the same place and could be changed to access solr, right?
      • 2019-08-02 21433, 2019

      • reosarevok
        So I could test the code there?
      • 2019-08-02 21440, 2019

      • yvanzo
        reosarevok: that might be both/either a security issue and/or networking performance issue.
      • 2019-08-02 21445, 2019

      • reosarevok
        (I know it currently also uses lucene)
      • 2019-08-02 21412, 2019

      • yvanzo
        reosarevok: probably but it would then access search indexes unrelated to its database, though that would probably be ok for your tests.
      • 2019-08-02 21439, 2019

      • reosarevok
        Oh, fair
      • 2019-08-02 21450, 2019

      • reosarevok
        So change its dbdefs, test, change back
      • 2019-08-02 21400, 2019

      • yvanzo
        yup, you’ll probably have to restart the container each time after changing DBDefs
      • 2019-08-02 21427, 2019

      • yvanzo
        (or not)
      • 2019-08-02 21443, 2019

      • ruaok
        iliekcomputers: where are you? On CEST?
      • 2019-08-02 21451, 2019

      • nav2002_ joined the channel
      • 2019-08-02 21401, 2019

      • Gazooo has quit
      • 2019-08-02 21453, 2019

      • Gazooo joined the channel
      • 2019-08-02 21405, 2019

      • ruaok
        pristine__: reviewing it is at the top of my list of things to do.
      • 2019-08-02 21426, 2019

      • ruaok
        I escaped the time-consuming and fattening family interactions in the UK.
      • 2019-08-02 21453, 2019

      • ruaok
        now on to a chosen family gathering in the netherlands.
      • 2019-08-02 21422, 2019

      • ruaok
        yvanzo: ping me when you're around, I'd like to hear what you need python help with.
      • 2019-08-02 21415, 2019

      • zas
        moooin
      • 2019-08-02 21414, 2019

      • alastairp
        hello
      • 2019-08-02 21421, 2019

      • yvanzo
        ruaok: I may just have fixed the bug, thanks to the hints alastairp gave me yesterday.
      • 2019-08-02 21427, 2019

      • alastairp
        woo!
      • 2019-08-02 21443, 2019

      • yvanzo
        At least, last reindex didn’t log any error.
      • 2019-08-02 21457, 2019

      • alastairp
        there's still the open question of why there are entities in the database with control characters in their name
      • 2019-08-02 21405, 2019

      • yvanzo
        Oops, sorry, I did not :(
      • 2019-08-02 21433, 2019

      • yvanzo
        Despite removing control chars in query_result_to_dict, it still throws the same error.
      • 2019-08-02 21415, 2019
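
[Editor's sketch] For readers following the control-character thread: a minimal Python 3 illustration of stripping characters that XML 1.0 does not allow, applied recursively to nested values. This is not the actual `query_result_to_dict` code from sir; the names `strip_invalid_xml_chars` and `clean_nested` are hypothetical, and the assumption that the offending values are plain strings inside dicts and lists is untested against the real data.

```python
import re

# Characters that XML 1.0 forbids in character data: C0 controls other
# than tab, LF and CR, lone surrogates, and U+FFFE / U+FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def strip_invalid_xml_chars(value):
    """Remove XML-invalid characters from a string; pass other types through."""
    if isinstance(value, str):
        return _INVALID_XML_CHARS.sub("", value)
    return value


def clean_nested(data):
    """Apply strip_invalid_xml_chars to every value in nested containers."""
    if isinstance(data, dict):
        # Keys are left untouched; only values are cleaned (hypothetical choice).
        return {key: clean_nested(val) for key, val in data.items()}
    if isinstance(data, (list, tuple, set)):
        return type(data)(clean_nested(val) for val in data)
    return strip_invalid_xml_chars(data)
```

Walking nested containers is a design choice of this sketch: cleaning only top-level strings would leave values inside lists or sub-dicts unchanged.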

      • yvanzo
        alastairp: I could not confirm there is such an entity.
      • 2019-08-02 21455, 2019

      • reosarevok
        bitmap, yvanzo: I'm at a festival until Sunday, away from my laptop until Monday
      • 2019-08-02 21408, 2019

      • reosarevok
        Will be online for questions / chat if needed, but no access to my code :
      • 2019-08-02 21409, 2019

      • reosarevok
        * :)
      • 2019-08-02 21414, 2019

      • yvanzo
        ruaok: before anything else, I probably need a way to better trace the error being thrown.
      • 2019-08-02 21403, 2019

      • ruaok
        pristine__: #40 reviewed, we should chat a bit.
      • 2019-08-02 21416, 2019

      • alastairp
        yvanzo: do you know the entity?
      • 2019-08-02 21434, 2019

      • ruaok
        yvanzo: how can I help?
      • 2019-08-02 21431, 2019

      • yvanzo
        alastairp: I have not been able to reproduce the error with a limited dataset so far.
      • 2019-08-02 21443, 2019

      • ruaok
        would a bigger VM help?
      • 2019-08-02 21414, 2019

      • yvanzo
        not for now, just trying to isolate the bug first.
      • 2019-08-02 21447, 2019

      • ruaok
        can you give me some background on it? what happens, what dataset? links to code involved?
      • 2019-08-02 21425, 2019

      • yvanzo
      • 2019-08-02 21404, 2019

      • yvanzo
        it only occurs while reindexing annotation, recording, release, and url
      • 2019-08-02 21440, 2019

      • alastairp
        it'd be a bit of work, but my recommendation is to implement an import method that doesn't use multiprocessing, so that you can get the full stacktrace
      • 2019-08-02 21440, 2019
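
[Editor's sketch] A rough Python 3 illustration of the suggestion above: process rows one at a time in the current process, with no multiprocessing, so that a failure produces a full traceback tied to the id being handled. This is not sir's import code; `import_serially`, the `(row_id, row)` pair shape, and the `convert` callback are all hypothetical names chosen for the example.

```python
import logging
import traceback

logger = logging.getLogger("serial_import")


def import_serially(rows, convert):
    """Convert rows one by one in the current process.

    rows is an iterable of (row_id, row) pairs and convert is the function
    that would normally run in a worker. Returns (converted, failures),
    where failures is a list of (row_id, traceback_text).
    """
    converted, failures = [], []
    for row_id, row in rows:
        try:
            converted.append(convert(row))
        except Exception:
            # Running in-process keeps the original stack intact, so the
            # traceback points at the line that actually raised.
            tb_text = traceback.format_exc()
            logger.error("conversion failed for id=%s\n%s", row_id, tb_text)
            failures.append((row_id, tb_text))
    return converted, failures
```

Because each failure is recorded together with its id, the offending database entries could be listed afterwards instead of being inferred from the tail of a large log.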

      • yvanzo
        I tried to bisect by deleting some database entries that were correctly reindexed => the error was not thrown anymore
      • 2019-08-02 21440, 2019

      • ruaok
        yvanzo: have you added a detailed print statement that prints out the contents of the offending XML in sir/indexing.py ?
      • 2019-08-02 21407, 2019

      • ruaok
        and then running it on whatever data set that exhibits the problem?
      • 2019-08-02 21408, 2019

      • alastairp
        ruaok: my understanding of the error message is that it's thrown by lxml, which means it's while it's trying to construct the xml
      • 2019-08-02 21444, 2019

      • alastairp
        so yeah, the same but actually print the objects/ids that it's trying to serialise
      • 2019-08-02 21454, 2019

      • ruaok
        yep, that.
      • 2019-08-02 21412, 2019

      • ruaok
        might be a ton of output, but redirect to a file and look at the tail after failure, no?
      • 2019-08-02 21431, 2019

      • ruaok
        it sure seems that there is an encoding like issue that isn't being handled by the serializer.
      • 2019-08-02 21408, 2019

      • alastairp
        which from my understanding is weird, because the solr library that's being used strips control characters from values before putting them into the xml
      • 2019-08-02 21417, 2019

      • ruaok
        the curious thing is, why does it happen only on reindex and not on the original data set?
      • 2019-08-02 21441, 2019

      • ruaok
        or is it that new data comes in that is faulty and it follows a different code path?
      • 2019-08-02 21453, 2019

      • alastairp
      • 2019-08-02 21400, 2019

      • alastairp
        depending on the value of `live`
      • 2019-08-02 21449, 2019

      • alastairp
        but that only seems to affect the WHERE clause
      • 2019-08-02 21454, 2019

      • yvanzo
        I added some details to logs to try to find out which database entries were causing this, but it did not help so far
      • 2019-08-02 21406, 2019

      • yvanzo
        alastairp: 'live' is 'false' during reindex
      • 2019-08-02 21414, 2019

      • alastairp
        right
      • 2019-08-02 21438, 2019

      • alastairp
        how difficult is sir/solr to set up?
      • 2019-08-02 21459, 2019

      • ruaok
        is there some processing that happens to the data that comes in when "live" but doesn't happen when the data is being fetched from the DB?
      • 2019-08-02 21432, 2019

      • yvanzo
      • 2019-08-02 21451, 2019

      • yvanzo
        but it takes some resources
      • 2019-08-02 21445, 2019

      • yvanzo
        ruaok: I don't know if this error would happen during live reindex because these entries may just not have been through live reindex yet.
      • 2019-08-02 21404, 2019

      • ruaok
        what resources do you need?
      • 2019-08-02 21459, 2019

      • yvanzo
        that was for alastairp if he wanted to deploy it
      • 2019-08-02 21410, 2019

      • alastairp
        yvanzo: indexer doesn't build..
      • 2019-08-02 21419, 2019

      • alastairp
        indexer: build: indexer-dockerfile
      • 2019-08-02 21432, 2019

      • alastairp
        but no such folder in the checkout, just sir-dockerfile and solr-dockerfile
      • 2019-08-02 21448, 2019

      • Gazooo has quit
      • 2019-08-02 21449, 2019

      • alastairp
        but the build tries to use airdock/oracle-jdk:jdk-8u74, says that it doesn't exist
      • 2019-08-02 21456, 2019

      • yvanzo
        ruaok: is it possible to give alastairp access to the current test vm?
      • 2019-08-02 21409, 2019

      • ruaok
        yes.
      • 2019-08-02 21410, 2019

      • Gazooo joined the channel
      • 2019-08-02 21415, 2019

      • ruaok
        hang on.
      • 2019-08-02 21423, 2019

      • ruaok
        which key alastairp? one on github?
      • 2019-08-02 21418, 2019

      • alastairp
      • 2019-08-02 21446, 2019

      • yvanzo
        alastairp: what is commit HEAD id on your musicbrainz-docker clone?
      • 2019-08-02 21406, 2019

      • alastairp
        yvanzo: oh, sorry. of course
      • 2019-08-02 21409, 2019

      • alastairp
        that'd be master, not your branch
      • 2019-08-02 21405, 2019

      • ruaok
        alastairp: ssh alastair@104.197.183.152
      • 2019-08-02 21439, 2019

      • alastairp
        yeah, I'm in
      • 2019-08-02 21442, 2019

      • ruaok
        now with sudo love
      • 2019-08-02 21401, 2019

      • yvanzo
        sudo su - yvanzo
      • 2019-08-02 21426, 2019

      • yvanzo
        see *annotations*log
      • 2019-08-02 21432, 2019

      • alastairp
        yeah, there's not a lot of debugging there
      • 2019-08-02 21445, 2019

      • alastairp
        how do you run the indexer?