#metabrainz

      • ruaok
        this avoids incremental round off errors.
      • 2019-08-02 21443, 2019

      • ruaok
        then converts that to year, month. save that.
      • 2019-08-02 21449, 2019

      • ruaok
        then at the end, make the list unique.
      • 2019-08-02 21417, 2019

      • ruaok
        not the fastest but for what we need it would actually do the job.
      • 2019-08-02 21438, 2019
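A minimal Python sketch of the day-by-day approach described above: each day is computed as an offset from the fixed begin_date (rather than by stepping a running date forward), converted to (year, month), and the list is deduplicated at the end. `months_between_daywise` is a hypothetical name, and treating the end date as inclusive is an assumption.

```python
from datetime import date, timedelta

def months_between_daywise(begin_date, end_date):
    """Return the (year, month) pairs covered by begin_date..end_date, inclusive."""
    months = []
    for i in range((end_date - begin_date).days + 1):
        # Offset from the fixed base rather than repeatedly adding to a running date.
        day = begin_date + timedelta(days=i)
        months.append((day.year, day.month))
    # "make the list unique", keeping chronological order (dicts keep insertion order).
    return list(dict.fromkeys(months))

print(months_between_daywise(date(2019, 6, 3), date(2019, 8, 2)))
# [(2019, 6), (2019, 7), (2019, 8)]
```

It touches every day in the range, so it is not the fastest option, but for windows of a few months the cost is negligible.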

      • ruaok
        to improve on this version:
      • 2019-08-02 21401, 2019

      • ruaok
        ah, let me just code that version
      • 2019-08-02 21454, 2019

      • ruaok
        hmm, actually I need food to do that. Let me go and find food and come back in a bit.
      • 2019-08-02 21458, 2019

      • pristine__
        I am not sure if I am okay with it but yeah, can be implemented.
      • 2019-08-02 21410, 2019

      • ruaok
        in the meantime, think about how to do this for months.
      • 2019-08-02 21437, 2019

      • ruaok
        given any day, go to the first of the month. add that month to the fetch parquet list.
      • 2019-08-02 21438, 2019

      • alastairp
        yvanzo: can I tell the indexer to only do specific bounds?
      • 2019-08-02 21413, 2019

      • ruaok
        then increment 32 days. change date to the first of the month.
      • 2019-08-02 21426, 2019

      • ruaok
        that would be safe.
      • 2019-08-02 21429, 2019
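The month-stepping idea above (snap to day 1, add 32 days, snap back to day 1) can be sketched like this. `first_of_next_month` and `months_between` are hypothetical helper names, not existing ListenBrainz functions.

```python
from datetime import date, timedelta

def first_of_next_month(d):
    """Go to day 1, add 32 days (always lands in the next month), go back to day 1."""
    return (d.replace(day=1) + timedelta(days=32)).replace(day=1)

def months_between(begin_date, end_date):
    """Return the (year, month) pairs from begin_date's month through end_date's month."""
    months = []
    current = begin_date.replace(day=1)
    while current <= end_date:
        months.append((current.year, current.month))
        current = first_of_next_month(current)
    return months

print(months_between(date(2019, 6, 3), date(2019, 8, 2)))
# [(2019, 6), (2019, 7), (2019, 8)]
```

Unlike the day-by-day version, this does one step per month, and because every step starts from day 1 it never runs into a day that does not exist in the following month.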

      • ruaok
        ok, back soon.
      • 2019-08-02 21428, 2019

      • yvanzo
        alastairp: no :(
      • 2019-08-02 21455, 2019

      • yvanzo
        alastairp: you can reindex specific entries in live mode only, by sending reindex messages through rabbitmq
      • 2019-08-02 21414, 2019

      • yvanzo
        alastairp: we can probably filter specific entries by modifying 'filter_valid_annotations' in schema/queryext
      • 2019-08-02 21427, 2019

      • ruaok returns
      • 2019-08-02 21449, 2019

      • ruaok
        pristine__: still want me to rewrite that function to use months?
      • 2019-08-02 21400, 2019

      • pristine__
        no. I will do :)
      • 2019-08-02 21423, 2019

      • pristine__
        but can you please explain the overhead in my approach? it is bugging me
      • 2019-08-02 21441, 2019

      • pristine__
        (month = month +1)
      • 2019-08-02 21403, 2019

      • pristine__
        (I want to know so that I am careful in the future)
      • 2019-08-02 21457, 2019

      • ruaok
        the culprit, at least when working with floating-point numbers, is cumulative rounding errors.
      • 2019-08-02 21440, 2019

      • ruaok
      • 2019-08-02 21426, 2019

      • ruaok
        in general it is preferred that a known starting point (begin_date) is used as the base and the increment, multiplied by the step number, is added to it.
      • 2019-08-02 21432, 2019

      • alastairp
        yvanzo:
      • 2019-08-02 21437, 2019

      • alastairp
        exception stacktrace is
      • 2019-08-02 21440, 2019

      • ruaok
        rather than repeatedly adding the increment.
      • 2019-08-02 21441, 2019
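The floating-point case ruaok mentions can be shown in a few lines: adding 0.1 ten times picks up a rounding error on each addition, while computing the value from the base with a single multiplication does not let the error build up across iterations. The function names are illustrative only.

```python
def accumulate(start, step, n):
    """Repeatedly add step: each addition can round, and the errors add up."""
    value = start
    for _ in range(n):
        value += step
    return value

def offset_from_base(start, step, n):
    """Compute the n-th value directly from the known starting point."""
    return start + n * step

print(accumulate(0.0, 0.1, 10))        # 0.9999999999999999
print(offset_from_base(0.0, 0.1, 10))  # 1.0
```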

      • alastairp
      • 2019-08-02 21450, 2019

      • alastairp
        annotation.id is 249425
      • 2019-08-02 21457, 2019

      • alastairp
        text is `Live concert recorded at BBC-Radio 1\x02\x02 on November 21, 2002. Plus rare B-sides and acoustic tracks.\r`
      • 2019-08-02 21404, 2019

      • alastairp
        (contains \x02)
      • 2019-08-02 21410, 2019

      • ruaok
        !m alastairp
      • 2019-08-02 21410, 2019

      • BrainzBot
        You're doing good work, alastairp!
      • 2019-08-02 21426, 2019

      • alastairp
        phew
      • 2019-08-02 21430, 2019

      • alastairp
        wasn't where we thought it was
      • 2019-08-02 21412, 2019

      • ruaok
        was this data that was just added to the DB/index, alastairp ?
      • 2019-08-02 21436, 2019

      • alastairp
        id | created
      • 2019-08-02 21436, 2019

      • alastairp
        --------+-------------------------------
      • 2019-08-02 21436, 2019

      • alastairp
        249425 | 2007-12-14 18:00:59.638629+01
      • 2019-08-02 21442, 2019

      • pristine__
        okay.
      • 2019-08-02 21451, 2019

      • pristine__
        thanks :)
      • 2019-08-02 21452, 2019

      • ruaok
        pristine__: while in our case we're not dealing with floating point numbers, we're dealing with tricky corner cases.
      • 2019-08-02 21425, 2019

      • ruaok
        in this case the increment for month (+= 1) is easy, but months have different numbers of days. that leads to nasty edge conditions.
      • 2019-08-02 21455, 2019
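To illustrate the edge condition: bumping only the month number while keeping the day works mid-month but fails whenever the day does not exist in the next month. `naive_next_month` is a hypothetical stand-in for the `month = month + 1` idea, not pristine__'s actual code.

```python
from datetime import date

def naive_next_month(d):
    """Bump the month number and keep the day, wrapping December to January."""
    if d.month == 12:
        return d.replace(year=d.year + 1, month=1)
    return d.replace(month=d.month + 1)

print(naive_next_month(date(2019, 7, 15)))  # 2019-08-15, fine mid-month
try:
    naive_next_month(date(2019, 1, 31))     # there is no 31 February
except ValueError as err:
    print("edge case:", err)
```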

      • ruaok
        if instead you go to day 1 of the month, add 32 days and then go back to day 1, you are guaranteed to land on the first day of the next month.
      • 2019-08-02 21409, 2019

      • ruaok
        pristine__: does that make sense?
      • 2019-08-02 21416, 2019

      • alastairp
        there's something funky going on with the multiprocessing queue, my feeling is that the exception causes a sentinel to be added to the queue which either causes the solr writer to stop processing items from the queue, or it crashes
      • 2019-08-02 21436, 2019

      • alastairp
        which results in it not adding anything to solr after it gets this exception
      • 2019-08-02 21440, 2019

      • pristine__
        what if I go to day 1 of the month and then add 7 ?
      • 2019-08-02 21459, 2019

      • pristine__
        I am still in the same month, no ?
      • 2019-08-02 21430, 2019

      • pristine__
        (lemme know when it gets annoying, I hope I am not asking stupid questions :( )
      • 2019-08-02 21428, 2019

      • ruaok
        correct.
      • 2019-08-02 21441, 2019

      • ruaok
        but if you go 32 days from day 1, you'll *always* be in the next month.
      • 2019-08-02 21446, 2019
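The claim that day 1 + 32 always lands in the next month (while day 1 + 7 never does) can be checked exhaustively over a span of years. This sketch also lists every offset that is safe from day 1 of every month: 31 is the minimum (the longest month has 31 days) and 58 the maximum (the shortest pair of consecutive months, January + February or February + March in a non-leap year, is 59 days), so 32 sits comfortably inside that range. The helper name and the 1900 to 2100 span are arbitrary choices.

```python
from datetime import date, timedelta

def lands_in_next_month(first_day, days):
    """True if first_day + days falls in the calendar month after first_day's."""
    d = first_day + timedelta(days=days)
    next_year = first_day.year + 1 if first_day.month == 12 else first_day.year
    next_month = first_day.month % 12 + 1
    return (d.year, d.month) == (next_year, next_month)

# Every first-of-month from 1900 to 2100 (includes leap and non-leap Februaries).
firsts = [date(y, m, 1) for y in range(1900, 2101) for m in range(1, 13)]

print(all(lands_in_next_month(f, 32) for f in firsts))  # True
print(any(lands_in_next_month(f, 7) for f in firsts))   # False, +7 stays in the same month

# Every offset that reaches the next month from day 1 of every month in the span.
safe = [k for k in range(1, 90) if all(lands_in_next_month(f, k) for f in firsts)]
print(safe[0], safe[-1])  # 31 58
```

Since the offset is always applied from day 1, the size of the recommendation window does not change it.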

      • pristine__
        You mean fixing 32?
      • 2019-08-02 21424, 2019

      • pristine__
        No matter what the RECOMMENDATION_GENERATION_WINDOW is, once I subtract it from the current date, I always add 32, move to the first day of the month and repeat the procedure?
      • 2019-08-02 21405, 2019

      • yvanzo
        alastairp: thanks, I was not looking at the right place, just found more annotations with control chars
      • 2019-08-02 21412, 2019

      • alastairp
        yvanzo: yeah, I had a hunch when I saw the value of the row with `{'_store': <already some xml}`, but it still took a bit to break out of the multiprocessing thread
      • 2019-08-02 21431, 2019

      • yvanzo
        I was looking at the range `SELECT text FROM annotation WHERE text ~ '[\x00]' LIMIT 1;`
      • 2019-08-02 21405, 2019

      • yvanzo
        249430 - 249440
      • 2019-08-02 21407, 2019

      • alastairp
        I saw that there's a fix module: https://github.com/metabrainz/sir/blob/82176ad5e7… perhaps you could add workarounds here
      • 2019-08-02 21446, 2019

      • alastairp
        or, you have to modify all of the convert functions: https://github.com/metabrainz/sir/blob/ae61bf35a6…
      • 2019-08-02 21416, 2019

      • alastairp
        so that any place that takes user-provided information does the filtering (that's going to hugely increase CPU usage :( )
      • 2019-08-02 21430, 2019

      • alastairp
        or convince bitmap to do this filtering at the insertion point
      • 2019-08-02 21407, 2019

      • yvanzo
        We should definitely filter at the insertion point, but sir should also gracefully log and skip such entries.
      • 2019-08-02 21445, 2019
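A minimal sketch of filtering at the insertion point, assuming the failure comes from serializing the annotation to XML: XML 1.0 does not allow C0 control characters other than tab, LF and CR, so a value like `BBC-Radio 1\x02\x02` cannot be written as-is. The helper names are hypothetical, and this only covers C0 controls (XML 1.0 also excludes a few other code points, such as U+FFFE and U+FFFF).

```python
import re

# C0 control characters that XML 1.0 forbids (tab \x09, LF \x0a and CR \x0d are allowed).
_XML_INVALID_C0 = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def strip_control_chars(text):
    """Remove C0 control characters that XML 1.0 does not allow."""
    return _XML_INVALID_C0.sub("", text)

def has_control_chars(text):
    """Flag a value so the caller can log and skip it instead of crashing."""
    return _XML_INVALID_C0.search(text) is not None

annotation = "Live concert recorded at BBC-Radio 1\x02\x02 on November 21, 2002.\r"
print(has_control_chars(annotation))  # True
print(repr(strip_control_chars(annotation)))
```

The same check could run either when the data is inserted or inside the indexer, where flagged entries would be logged and skipped.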

      • alastairp
        right, as I said above, I think there's some sentinel being set which is causing the solr workers to stop inserting items
      • 2019-08-02 21401, 2019

      • alastairp
        if you can work out why that's happening, it should be possible to skip these items
      • 2019-08-02 21402, 2019

      • ruaok
        pristine__: not sure what you mean. maybe you could write your date calculation routines as a gist and then we can run it on the command line?
      • 2019-08-02 21428, 2019

      • kori has quit
      • 2019-08-02 21423, 2019

      • alastairp
        yvanzo: I believe this is the cause of the exception resulting in the processing to stop: https://github.com/metabrainz/sir/blob/272309f4e0…
      • 2019-08-02 21440, 2019

      • alastairp
        I _think_ what is happening is that if a subprocess from pool.imap(indexer, ..) raises an exception, it's caught in this exception handler. the next thing it does is put STOP on the queue, which causes the solr indexer workers to shut down
      • 2019-08-02 21425, 2019

      • alastairp
        however, it seems that pool.imap continues running through all of the arguments, and keeps putting things on the queue
      • 2019-08-02 21407, 2019

      • alastairp
        I don't know enough about how exception handling works with multiprocessing to know if this is exactly the cause
      • 2019-08-02 21410, 2019

      • alastairp
        however it seems that a reasonable solution might be to have a try/except inside the indexer function to prevent uncaught exceptions propagating up to the main thread
      • 2019-08-02 21414, 2019

      • Lotheric has quit
      • 2019-08-02 21437, 2019

      • Lotheric joined the channel
      • 2019-08-02 21407, 2019

      • alastairp
        yeah, I just double-checked with some demo code, that's exactly what's happening. The exception is caught, but then pool.close()/pool.join() causes the process to wait until all items in the pool have finished processing
      • 2019-08-02 21447, 2019

      • yvanzo
        ok, the indexer function catches exceptions but doesn't handle them yet: https://github.com/metabrainz/sir/blob/272309f4e0…
      • 2019-08-02 21458, 2019

      • alastairp
        oh great, that might be a good place to do it
      • 2019-08-02 21410, 2019
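alastairp's suggestion of a try/except inside the indexer function can be sketched as follows: each item is wrapped so that a bad row is logged and skipped inside the worker, and the exception never reaches the loop consuming `pool.imap`. This is not sir's code; `convert`, `safe_index` and `index_all` are hypothetical names, the "bad row" rule is made up, and the STOP sentinel and solr queue are left out.

```python
import logging
from multiprocessing import Pool

logger = logging.getLogger(__name__)

def convert(entity_id):
    """Hypothetical per-entity conversion that fails on some inputs."""
    if entity_id % 5 == 0:
        raise ValueError(f"control character in entity {entity_id}")
    return {"id": entity_id}

def safe_index(entity_id):
    """Catch errors per item so one bad row cannot stop the whole run."""
    try:
        return convert(entity_id)
    except Exception:
        logger.exception("Skipping entity %s", entity_id)
        return None

def index_all(ids, processes=2):
    docs = []
    with Pool(processes) as pool:
        # imap yields results in order; failed items come back as None.
        for doc in pool.imap(safe_index, ids):
            if doc is not None:
                docs.append(doc)
    return docs

if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(len(index_all(range(1, 21))))  # 16: ids 5, 10, 15 and 20 are skipped
```

Because the worker returns a marker instead of raising, the main process keeps consuming results and never takes the exception path that shuts the writers down.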

      • nav2002_ has quit
      • 2019-08-02 21430, 2019

      • nav2002__ joined the channel
      • 2019-08-02 21438, 2019

      • pristine__
        #!/usr/bin/env python3
      • 2019-08-02 21438, 2019

      • pristine__
        from datetime import datetime
      • 2019-08-02 21438, 2019

      • pristine__
        from dateutil.relativedelta import relativedelta
      • 2019-08-02 21438, 2019

      • pristine__
        # in days
      • 2019-08-02 21438, 2019

      • pristine__
        RECOMMENDATION_GENERATION_WINDOW = 60
      • 2019-08-02 21438, 2019

      • pristine__ has quit
      • 2019-08-02 21449, 2019

      • pristine__ joined the channel
      • 2019-08-02 21404, 2019

      • pristine__
      • 2019-08-02 21412, 2019

      • pristine__
        ruaok: ^
      • 2019-08-02 21408, 2019

      • pristine__
        ( copy paste mishap, sorry!)
      • 2019-08-02 21423, 2019

      • kyan has quit
      • 2019-08-02 21427, 2019

      • pristine__
        is it guaranteed that we will have listens from the current month whenever we run the date-related functions?
      • 2019-08-02 21442, 2019

      • pristine__
        I am not sure how incremental dumps work
      • 2019-08-02 21420, 2019

      • travis-ci joined the channel
      • 2019-08-02 21420, 2019

      • travis-ci
        Project bookbrainz-data-js build #1188: errored in 1 min 21 sec: https://travis-ci.org/bookbrainz/bookbrainz-data-…
      • 2019-08-02 21420, 2019

      • travis-ci has left the channel
      • 2019-08-02 21451, 2019

      • nav2002__ has quit
      • 2019-08-02 21407, 2019

      • kori joined the channel
      • 2019-08-02 21407, 2019

      • kori has quit
      • 2019-08-02 21407, 2019

      • kori joined the channel
      • 2019-08-02 21452, 2019

      • ephemer0l_ has quit
      • 2019-08-02 21433, 2019

      • yvanzo
        alastairp: Thanks, that helped a lot. sir just reindexed 314655 out of 319014 annotations, that is, only 4359 are missing, for 27 with control chars.
      • 2019-08-02 21446, 2019

      • yvanzo
        The stack trace is also much better now.
      • 2019-08-02 21434, 2019

      • Lotheric
      • 2019-08-02 21434, 2019

      • BrainzBot
        SEARCH-476: Artists with hyphen in the name are not found using hyphen-minus when adding TOC to MB
      • 2019-08-02 21400, 2019

      • Lotheric
        any search experts ? :)
      • 2019-08-02 21459, 2019

      • Lotheric
        bah, I already reported it earlier lol... created a dupe
      • 2019-08-02 21434, 2019

      • Lotheric
      • 2019-08-02 21434, 2019

      • BrainzBot
        SEARCH-472: Cannot find artist with hyphen using hyphen-minus in artist search while submitting a DiscID
      • 2019-08-02 21420, 2019

      • yvanzo
        Lotheric: Thanks for having reported it, it is because it uses direct search instead of indexed search.
      • 2019-08-02 21415, 2019
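For the SEARCH-472/476 tickets, one way a literal (direct) lookup can tolerate hyphen variants is to fold them to U+002D HYPHEN-MINUS on both sides before comparing. This is a sketch under the assumption that direct search compares names more or less literally; it is not how MusicBrainz Server implements search, and the artist names below are made up.

```python
# Hyphen-like code points that should match a typed hyphen-minus (U+002D).
_HYPHEN_TABLE = str.maketrans({
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
})

def normalize_hyphens(name):
    """Fold hyphen variants to hyphen-minus."""
    return name.translate(_HYPHEN_TABLE)

def direct_match(query, names):
    """Exact comparison after normalizing both the query and the stored names."""
    q = normalize_hyphens(query)
    return [n for n in names if normalize_hyphens(n) == q]

print(direct_match("Alpha-Beta", ["Alpha\u2010Beta", "Alpha Beta"]))
```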

      • Lotheric
        makes sense :)
      • 2019-08-02 21448, 2019

      • Lotheric
        I often use EAC logs with CD TOC info to submit DiscID
      • 2019-08-02 21411, 2019

      • Lotheric
        with the wonderful http://eac-log-lookup.blogspot.com/ website (I'm not sure who created it but it's awesome!)
      • 2019-08-02 21447, 2019

      • yvanzo
        I just moved these tickets to MusicBrainz Server, which is what they are actually related to.
      • 2019-08-02 21406, 2019

      • Sophist-UK joined the channel
      • 2019-08-02 21402, 2019

      • Gazooo has quit
      • 2019-08-02 21450, 2019

      • Gazooo joined the channel
      • 2019-08-02 21442, 2019

      • Sophist-UK has quit
      • 2019-08-02 21438, 2019

      • Sophist-UK joined the channel
      • 2019-08-02 21455, 2019

      • SothoTalKer
        Lotheric: could we make a request to get this added to MB? :p
      • 2019-08-02 21414, 2019

      • Lotheric
        It sure is handy
      • 2019-08-02 21442, 2019

      • Lotheric
      • 2019-08-02 21425, 2019

      • SothoTalKer
        there have been a few updates already
      • 2019-08-02 21405, 2019

      • gr0uch0mars joined the channel
      • 2019-08-02 21421, 2019

      • gr0uch0mars
        hi amCap1712 I’m really sorry because I couldn’t fully review the PR (too much work preparing the last items of the trip). Please, feel free to merge. But remember these last weeks that our goal is to leave the app fully functional, rather than styling or more features/support that can come later. Once the last features are ready, test under several conditions: WiFi, cell data, offline, different screen sizes… Let’s
      • 2019-08-02 21421, 2019

      • gr0uch0mars
        keep the user experience small but amazing. Keep this level, you’re working very hard and it’s a pleasure to see the progress of the project. I’ll reach out to you as soon as I’m there with an internet connection. Regards!
      • 2019-08-02 21400, 2019

      • iliekcomputers
        ruaok: PST now
      • 2019-08-02 21436, 2019

      • alastairp
        yvanzo: nice! in the end what did you do? handle the exception when adding to the queue?
      • 2019-08-02 21400, 2019

      • alastairp
        what is the wscompat anyway?
      • 2019-08-02 21426, 2019

      • alastairp
      • 2019-08-02 21436, 2019

      • ephemer0l joined the channel
      • 2019-08-02 21456, 2019

      • spellew
        ferbncode: I've pushed my changes with the new brainzutils version to my pull requests
      • 2019-08-02 21422, 2019

      • ferbncode
        spellew: thanks! I'll take a look :)
      • 2019-08-02 21452, 2019

      • gr0uch0mars has quit