#metabrainz


      • d4rkie has quit
      • 2021-02-08 03944, 2021

      • Nyanko-sensei joined the channel
      • 2021-02-08 03937, 2021

      • flamingspinach has quit
      • 2021-02-08 03907, 2021

      • sumedh joined the channel
      • 2021-02-08 03923, 2021

      • sumedh has quit
      • 2021-02-08 03928, 2021

      • flamingspinach joined the channel
      • 2021-02-08 03935, 2021

      • Darkloke joined the channel
      • 2021-02-08 03907, 2021

      • sumedh joined the channel
      • 2021-02-08 03958, 2021

      • flamingspinach has quit
      • 2021-02-08 03913, 2021

      • flamingspinach joined the channel
      • 2021-02-08 03922, 2021

      • BrainzGit
        [musicbrainz-server] reosarevok opened pull request #1899 (master…MBS-11370): MBS-11370: display track lengths of 0 ms or -1 ms as unknown https://github.com/metabrainz/musicbrainz-server/…
      • 2021-02-08 03933, 2021

      • Mr_Monkey
        Moin !
      • 2021-02-08 03903, 2021

      • nobodyrocks[m] has quit
      • 2021-02-08 03958, 2021

      • Etua joined the channel
      • 2021-02-08 03918, 2021

      • Etua has quit
      • 2021-02-08 03921, 2021

      • sumedh has quit
      • 2021-02-08 03934, 2021

      • alastairp
      • 2021-02-08 03937, 2021

      • alastairp
        morning
      • 2021-02-08 03921, 2021

      • _lucifer
        hi alastairp!
      • 2021-02-08 03941, 2021

      • _lucifer
        I was thinking of working on BU-6. Do you have any suggestions/pointers for it? For now, I am extracting the common parts of the code to see how much overlap there is.
      • 2021-02-08 03942, 2021

      • BrainzBot
        BU-6: Add data dump functionality used by all Python Brainz https://tickets.metabrainz.org/browse/BU-6
      • 2021-02-08 03952, 2021

      • ruaok
        moooin!
      • 2021-02-08 03926, 2021

      • iliekcomputers
        morning!
      • 2021-02-08 03945, 2021

      • reosarevok
        Heh, MBS-11349
      • 2021-02-08 03945, 2021

      • BrainzBot
        MBS-11349: Inconsistent album length https://tickets.metabrainz.org/browse/MBS-11349
      • 2021-02-08 03950, 2021

      • iliekcomputers
        _lucifer: I'm not sure that ticket is really high priority enough for the effort it'll need.
      • 2021-02-08 03928, 2021

      • iliekcomputers
        all three of our projects have data dumps already, not sure refactoring that code into BU is the best use of your time for now.
      • 2021-02-08 03953, 2021

      • reosarevok
        yvanzo, bitmap: not sure what to do with that one. We *do* calculate different durations based on whether tracks or discID is loaded, for the same release
      • 2021-02-08 03906, 2021

      • reosarevok
        But which one is "right"? :D
      • 2021-02-08 03916, 2021

      • reosarevok
        At first I thought dropping the pregap would be the same
      • 2021-02-08 03926, 2021

      • reosarevok
        eh. s/same/best/
      • 2021-02-08 03917, 2021

      • reosarevok
        But then, what to do when a disc has an actual pregap track? We don't know that, because we don't have the tracks loaded
      • 2021-02-08 03943, 2021

      • reosarevok
        Should we just "remove two seconds per disc if the pregap is longer than two seconds"?
      • 2021-02-08 03946, 2021
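reosarevok's heuristic can be sketched as below. This is a hypothetical illustration, not musicbrainz-server code. It assumes each disc is described by its CD TOC (the sector offset of track 1 and of the lead-out, at 75 sectors per second) and reads the idea as "drop the pregap, but never more than two seconds". A normal 150-sector pregap disappears entirely, while a long pregap that may hold a hidden track only loses its first two seconds.

```python
from __future__ import annotations

# Hypothetical sketch of the "drop at most two seconds of pregap per disc" idea.
# Assumes TOC offsets in sectors, measured from the start of the disc.
SECTORS_PER_SECOND = 75    # audio CD sector rate
STANDARD_PREGAP_MS = 2000  # the usual 150-sector pregap


def sectors_to_ms(sectors: int) -> int:
    return sectors * 1000 // SECTORS_PER_SECOND


def disc_length_ms(first_track_offset: int, leadout_offset: int) -> int:
    """Disc length with the pregap removed, capped at two seconds."""
    total_ms = sectors_to_ms(leadout_offset)
    pregap_ms = sectors_to_ms(first_track_offset)
    return total_ms - min(pregap_ms, STANDARD_PREGAP_MS)


def release_length_ms(discs: list[tuple[int, int]]) -> int:
    """Sum of per-disc lengths; each disc is (first_track_offset, leadout_offset)."""
    return sum(disc_length_ms(first, leadout) for first, leadout in discs)
```

The cap is the design choice under discussion: without track data there is no way to tell a silent long pregap from a hidden track, so this version never removes more than the standard two seconds.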

      • _lucifer
        iliekcomputers: yeah, makes sense
      • 2021-02-08 03900, 2021

      • reosarevok
        (alternatively, we could have a length view like we have for first_recording_date, that'd probably also help with MBS-11268)
      • 2021-02-08 03900, 2021

      • BrainzBot
        MBS-11268: Show “Set track durations” on release/discids page https://tickets.metabrainz.org/browse/MBS-11268
      • 2021-02-08 03902, 2021

      • Gazooo794944007 has quit
      • 2021-02-08 03943, 2021

      • Gazooo794944007 joined the channel
      • 2021-02-08 03908, 2021

      • Quoth joined the channel
      • 2021-02-08 03934, 2021

      • alastairp
        _lucifer: hmm, for me the key functionality would be 1) be able to easily define the dump structure for "normal" tables (e.g. user lists), 2) be able to dump and restore them, 3) and have tests for the dump code
      • 2021-02-08 03926, 2021
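A minimal sketch of alastairp's three points (a declarative spec for a "normal" table, dump and restore, and a test for it), assuming TSV files and using sqlite3 so it runs standalone. `TableDump` is a hypothetical name, not an existing BrainzUtils API; against PostgreSQL, `COPY ... TO` / `COPY ... FROM` would likely do the heavy lifting instead.

```python
import csv
import sqlite3
from dataclasses import dataclass
from typing import TextIO


@dataclass(frozen=True)
class TableDump:
    """Hypothetical declarative description of one dumpable table."""
    table: str
    columns: tuple[str, ...]

    def dump(self, conn: sqlite3.Connection, out: TextIO) -> int:
        """Write all rows as TSV, ordered by the first column; return the row count."""
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        query = f"SELECT {', '.join(self.columns)} FROM {self.table} ORDER BY 1"
        count = 0
        for row in conn.execute(query):
            writer.writerow(row)
            count += 1
        return count

    def restore(self, conn: sqlite3.Connection, src: TextIO) -> int:
        """Insert rows from a TSV dump; return the row count."""
        placeholders = ", ".join("?" for _ in self.columns)
        query = (f"INSERT INTO {self.table} ({', '.join(self.columns)}) "
                 f"VALUES ({placeholders})")
        rows = list(csv.reader(src, delimiter="\t"))
        conn.executemany(query, rows)
        return len(rows)
```

Table and column names are interpolated into SQL, so they must come from trusted code, and this simple version writes NULLs as empty strings and does not restore them as NULL.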

      • alastairp
        I think that things like lb listens or ab data are out of scope for this, but everything else probably fits within its scope
      • 2021-02-08 03910, 2021

      • Nyanko-sensei has quit
      • 2021-02-08 03944, 2021

      • Nyanko-sensei joined the channel
      • 2021-02-08 03938, 2021

      • BrainzGit
        [bookbrainz-site] MonkeyDo closed pull request #552 (master…snyk-upgrade-f744969d2d654e17931b5b4d9772de43): [Snyk] Upgrade swagger-jsdoc from 4.0.0 to 4.3.2 https://github.com/bookbrainz/bookbrainz-site/pul…
      • 2021-02-08 03940, 2021

      • _lucifer
        alastairp: agreed. also, i tested the BU PR for version change and it works fine for AB, CB, LB (once the dataset hoster PR is merged and released)
      • 2021-02-08 03951, 2021

      • alastairp
        _lucifer: yeah, I saw that thanks
      • 2021-02-08 03911, 2021

      • alastairp
        did you see the comment that I left in that PR, after I discussed with iliekcomputers
      • 2021-02-08 03932, 2021

      • alastairp
        we decided to pin specific versions of all of the dependencies in the downstream project
      • 2021-02-08 03941, 2021

      • _lucifer
        about pinning versions downstream?
      • 2021-02-08 03942, 2021

      • _lucifer
        yes
      • 2021-02-08 03934, 2021

      • alastairp
        see for example that flask isn't here: https://github.com/metabrainz/listenbrainz-server…
      • 2021-02-08 03940, 2021
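The decision to pin every dependency in the downstream project could be checked mechanically. This is a hypothetical helper, not something in the LB repository: it treats only `==` as a pin, skips pip options such as `-r`/`-e`, and leaves URL/VCS requirements alone since those pin through the ref in the URL.

```python
import re

# Leading package name of a requirement line, e.g. "requests" in "requests>=2.0".
_REQ_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def unpinned_requirements(text: str) -> list[str]:
    """Return names of requirements in a requirements.txt body lacking an exact '==' pin."""
    missing = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # blank lines and pip options
            continue
        if "://" in line:  # URL/VCS requirements are not handled here
            continue
        if "==" not in line:
            match = _REQ_NAME.match(line)
            missing.append(match.group(1) if match else line)
    return missing
```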

      • _lucifer
        I'll check and add the versions if they are missing
      • 2021-02-08 03957, 2021

      • alastairp
        thank you!
      • 2021-02-08 03911, 2021

      • travis-ci joined the channel
      • 2021-02-08 03911, 2021

      • travis-ci
        Project bookbrainz-site build #3633: passed in 12 min 48 sec: https://travis-ci.org/bookbrainz/bookbrainz-site/…
      • 2021-02-08 03911, 2021

      • travis-ci has left the channel
      • 2021-02-08 03922, 2021

      • Quoth has quit
      • 2021-02-08 03958, 2021

      • Etua joined the channel
      • 2021-02-08 03931, 2021

      • Etua has quit
      • 2021-02-08 03912, 2021

      • ruaok
        alastairp: got a sec to discuss some ideas about LB data dump creation visibility?
      • 2021-02-08 03900, 2021

      • ruaok ambles to the office for now
      • 2021-02-08 03946, 2021

      • BrainzGit
        [musicbrainz-server] jesus2099 closed pull request #1897 (master…fix-profile-add-release-link-filter): MBS-11174: Swap Add release edit types (216,31 instead of 31,216) https://github.com/metabrainz/musicbrainz-server/…
      • 2021-02-08 03950, 2021

      • BrainzGit
        [bookbrainz-site] MonkeyDo merged pull request #503 (master…snyk-upgrade-ddfadf045c35fe0947282d8dccfdd5c4): [Snyk] Upgrade morgan from 1.9.1 to 1.10.0 https://github.com/bookbrainz/bookbrainz-site/pul…
      • 2021-02-08 03939, 2021

      • flamingspinach has quit
      • 2021-02-08 03956, 2021

      • flamingspinach joined the channel
      • 2021-02-08 03926, 2021

      • BrainzGit
        [listenbrainz-server] mayhem merged pull request #1252 (master…typesense-index-deploy): Typesense index deploy https://github.com/metabrainz/listenbrainz-server…
      • 2021-02-08 03950, 2021

      • travis-ci joined the channel
      • 2021-02-08 03950, 2021

      • travis-ci
        Project bookbrainz-site build #3634: failed in 3 min 45 sec: https://travis-ci.org/bookbrainz/bookbrainz-site/…
      • 2021-02-08 03950, 2021

      • travis-ci has left the channel
      • 2021-02-08 03905, 2021

      • travis-ci joined the channel
      • 2021-02-08 03905, 2021

      • travis-ci
        Project bookbrainz-site build #3634: passed in 4 min 16 sec: https://travis-ci.org/bookbrainz/bookbrainz-site/…
      • 2021-02-08 03905, 2021

      • travis-ci has left the channel
      • 2021-02-08 03946, 2021

      • Darkloke has quit
      • 2021-02-08 03936, 2021

      • sumedh joined the channel
      • 2021-02-08 03928, 2021

      • alastairp
        ruaok: hi, just saw your message
      • 2021-02-08 03930, 2021

      • alastairp
        what's up
      • 2021-02-08 03936, 2021

      • ruaok
        hey.
      • 2021-02-08 03941, 2021

      • ruaok
        the LB dumps, stats generation and rec recommendation stuff has some bugs, but the current observability emails provide very little insight beyond "cron did something, you should go check" and that gets filtered and ignored in my email.
      • 2021-02-08 03927, 2021

      • ruaok
        I want to have a place where I can see how things ran. did a job complete or was it interrupted. which users had data generated, which ones not.
      • 2021-02-08 03939, 2021

      • ruaok
        email is certainly not the way to do this. telegraf isn't either.
      • 2021-02-08 03949, 2021

      • ruaok
        I am wondering if using sentry would be an abuse of this.
      • 2021-02-08 03955, 2021

      • alastairp
        currently they run through cron? and the only report is the email sent by printing to stdout in cron?
      • 2021-02-08 03906, 2021

      • ruaok
        perhaps a different sentry project? tagged entries? not sure.
      • 2021-02-08 03925, 2021

      • ruaok
        not even stdout...
      • 2021-02-08 03929, 2021

      • alastairp
        kind of sounds like you're asking for unified logging
      • 2021-02-08 03953, 2021

      • ruaok
      • 2021-02-08 03900, 2021

      • ruaok
        not really.
      • 2021-02-08 03912, 2021

      • ruaok
        I want a console for our periodic jobs.
      • 2021-02-08 03927, 2021

      • alastairp
        ah, I see. that's an explicit email. I didn't know that existed
      • 2021-02-08 03930, 2021

      • alastairp
        yes, right
      • 2021-02-08 03945, 2021

      • ruaok
        it's not well advertised since it's... not very useful.
      • 2021-02-08 03945, 2021

      • alastairp
        I'm trying to recall if I know of something similar that doesn't require us to reinvent the wheel
      • 2021-02-08 03909, 2021

      • ruaok
        one thought was to create status tables somewhere and then slap a dataset hoster in front.
      • 2021-02-08 03918, 2021

      • ruaok
        also very odd.
      • 2021-02-08 03937, 2021

      • alastairp
        so we want to know at least: when something last ran, if it ran successfully, any errors or log messages made during the time
      • 2021-02-08 03903, 2021
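The minimal set alastairp lists (when a job last ran, whether it succeeded, and any errors) could be captured by a context manager wrapped around each job. A sketch, assuming one JSON status file per job in a directory of our choosing; `record_run` and the file layout are hypothetical.

```python
import json
import time
import traceback
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def record_run(status_dir: Path, job_name: str):
    """Record start/finish times, success flag and traceback for one job run."""
    status = {"job": job_name, "started": time.time(), "finished": None,
              "success": False, "error": None}
    try:
        yield status
        status["success"] = True
    except BaseException:
        # BaseException so interrupted runs (KeyboardInterrupt, SystemExit) are recorded too.
        status["error"] = traceback.format_exc()
        raise
    finally:
        status["finished"] = time.time()
        status_dir.mkdir(parents=True, exist_ok=True)
        (status_dir / f"{job_name}.json").write_text(json.dumps(status))
```

A dashboard could then read the latest `*.json` file per job instead of parsing cron emails.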

      • ruaok
        minimally yes.
      • 2021-02-08 03905, 2021

      • alastairp
        I think we've kind of shown that sentry isn't a great fit for logging here
      • 2021-02-08 03913, 2021

      • alastairp
        given what we started to remove
      • 2021-02-08 03939, 2021

      • alastairp
        when you say that these services have some bugs, what do you mean? Bugs that cause the process to crash? or more subtle ones?
      • 2021-02-08 03945, 2021

      • ruaok
        checking how data was generated per user might be done elsewhere. in particular we need to show "this data was last updated on X" timestamps near where the data is displayed. right now there is little visibility around this.
      • 2021-02-08 03927, 2021

      • ruaok
        sometimes dumps get interrupted and I don't know why. consul updates, perhaps. that causes dumps to fail and empty files to be dumped to the FTP site.
      • 2021-02-08 03947, 2021
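One way to avoid publishing empty or truncated files when a dump is interrupted is to write to a temporary file and rename it into place only once it is complete. A sketch, assuming the dump is produced by a callable and that the temporary file lives on the same filesystem as the target (needed for `os.replace` to be atomic on POSIX); `write_dump_atomically` is a hypothetical name, and the FTP upload would then only ever see finished files.

```python
import os
import tempfile
from typing import Callable, TextIO


def write_dump_atomically(path: str, write: Callable[[TextIO], None]) -> None:
    """Write via a temp file; publish to `path` only if complete and non-empty."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".partial")
    try:
        with os.fdopen(fd, "w") as tmp:
            write(tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk before renaming
        if os.path.getsize(tmp_path) == 0:
            raise RuntimeError(f"refusing to publish empty dump {path}")
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        # Failed or interrupted: leave no partial file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```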

      • ruaok
        those are the most critical bugs to fix, but that's hard to do without decent logging to see what is going on.
      • 2021-02-08 03925, 2021

      • alastairp
        yes, right
      • 2021-02-08 03931, 2021

      • ruaok
        then users get deleted in PG, but not in spark so the calculated data flow borks since the dest is gone.
      • 2021-02-08 03950, 2021

      • alastairp
        this ties in a bit with what we discussed last time in person - logging for when services are restarted. but this isn't the whole picture
      • 2021-02-08 03955, 2021

      • Etua joined the channel
      • 2021-02-08 03902, 2021

      • ruaok
        this is fixed with the next full data dump. this is not a real problem, but just one of the many things odd about this.
      • 2021-02-08 03916, 2021

      • ruaok
        yes.
      • 2021-02-08 03929, 2021

      • ruaok
        what I want doesn't fit well into any of our existing monitoring systems.
      • 2021-02-08 03954, 2021

      • alastairp
        I like the idea of a dashboard for this, but I'm unsure where the limits of that should be
      • 2021-02-08 03932, 2021

      • ruaok
        one possible scenario is to really clean up what gets generated from the containers and have it go to a log file that can actually be served by lb-web so that anyone can follow along.
      • 2021-02-08 03939, 2021

      • alastairp
        an existing python package that replaces cron? do we do it all ourselves? (more maintenance, ugh), do we make it generic for use in other projects?
      • 2021-02-08 03904, 2021

      • ruaok
        all good questions. I don't want to invent more stuff.
      • 2021-02-08 03915, 2021

      • alastairp
        yes, that's why I asked about cron emails - cron will email stdout to MAIL
      • 2021-02-08 03940, 2021

      • alastairp
        but we could just as easily configure logging to go to stdout + a file, and have a basic interface over a set of directories
      • 2021-02-08 03949, 2021
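Sending logs to stdout plus a file needs only the standard `logging` module. A minimal sketch, assuming one log file per job under a directory that could later be shared with the web container; `configure_job_logging` is a hypothetical name.

```python
import logging
import sys
from pathlib import Path


def configure_job_logging(job_name: str, log_dir: Path) -> logging.Logger:
    """Log to stdout (seen by cron/docker) and to <log_dir>/<job_name>.log."""
    log_dir.mkdir(parents=True, exist_ok=True)
    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    logger = logging.getLogger(job_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False   # avoid duplicate lines if the root logger is configured
    logger.handlers.clear()    # avoid stacking handlers if called twice
    for handler in (logging.StreamHandler(sys.stdout),
                    logging.FileHandler(log_dir / f"{job_name}.log")):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```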

      • ruaok
        some parts of what I want are temporary -- debugging stuff. that will go away. and hopefully all this will go away and become more transparent for the user over time. so I am hesitant to spend a lot of time on this.
      • 2021-02-08 03906, 2021

      • ruaok
        alastairp: agreed, I think that might be a good short term solution.
      • 2021-02-08 03946, 2021

      • alastairp
        the specific case of users getting deleted in pg but not in spark sounds like a tangential bug. That is, there's a problem here, but it's not core to our question of how to log stuff
      • 2021-02-08 03929, 2021

      • alastairp
        I made something similar for an MTG project many years ago, which captured logging messages and put them in redis, then had a dashboard to show logs for each job
      • 2021-02-08 03935, 2021

      • alastairp
        I don't think it even works any more
      • 2021-02-08 03935, 2021

      • ruaok
        it's not even a problem per se. a temporal blip -- later we may want to check to ensure that people exist in PG before generating.
      • 2021-02-08 03947, 2021
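The later check ruaok mentions (make sure a user still exists in PG before writing generated data) might look like this. A hypothetical sketch: results are assumed to arrive keyed by user id, and the set of existing ids is assumed to have been fetched from PostgreSQL beforehand.

```python
from typing import Iterable, Mapping


def split_by_existing_users(
    results: Mapping[int, object], existing_user_ids: Iterable[int]
) -> tuple[dict[int, object], list[int]]:
    """Keep results for users that still exist; return skipped ids for logging."""
    existing = set(existing_user_ids)
    kept = {uid: data for uid, data in results.items() if uid in existing}
    skipped = sorted(uid for uid in results if uid not in existing)
    return kept, skipped
```

Logging the skipped ids, rather than failing the whole run, fits the "temporal blip" framing: the next full dump brings Spark back in line.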

      • iliekcomputers
        i'm interested in this conversation as well, but in a meeting, i'll read the backlog.
      • 2021-02-08 03952, 2021

      • ruaok
        iliekcomputers: +1
      • 2021-02-08 03908, 2021

      • ruaok
        alastairp: this convo is helping me sort out my thoughts.
      • 2021-02-08 03933, 2021

      • ruaok
        first steps: have cron log to docker, improve log messages and stop emails.
      • 2021-02-08 03905, 2021

      • ruaok
        with emphasis on cron jobs starting/stopping and cleaning up correctly if things go badly.
      • 2021-02-08 03918, 2021
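For jobs cleaning up correctly when things go badly, one common Python pattern is to turn SIGTERM (which `docker stop` sends before SIGKILL) into `SystemExit` so that `finally` blocks run. A sketch under that assumption; `run_job` and the handler names are hypothetical, and whether the job process actually receives the signal depends on how cron runs inside the container.

```python
import logging
import signal
import sys
from typing import Callable

log = logging.getLogger("cron-job")


def _exit_on_sigterm(signum, frame):
    # Convert the signal into SystemExit so try/finally blocks still run.
    sys.exit(128 + signum)


def install_sigterm_handler() -> None:
    # signal.signal() may only be called from the main thread.
    signal.signal(signal.SIGTERM, _exit_on_sigterm)


def run_job(job: Callable[[], None], cleanup: Callable[[], None]) -> bool:
    """Run `job`, always run `cleanup`, and report whether the job succeeded.

    Ordinary exceptions are logged and reported as failure; SystemExit (for
    example from the SIGTERM handler) still propagates after cleanup.
    """
    try:
        job()
        return True
    except Exception:
        log.exception("job failed")
        return False
    finally:
        cleanup()
```

A job entry point would call `install_sigterm_handler()` once at startup and then `run_job(...)`.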

      • ruaok
        I think that will shape future thinking on how to do this better.
      • 2021-02-08 03932, 2021

      • alastairp
        that sounds like a good first step
      • 2021-02-08 03958, 2021

      • ruaok
        let's keep thinking about this issue and see what we can come up with.
      • 2021-02-08 03944, 2021

      • alastairp
        cron jobs and web both run on lemmy. this means we can share a volume between them and show logs on an admin page
      • 2021-02-08 03908, 2021
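With a volume shared between the cron and web containers on lemmy, an admin page would mostly need two helpers: list the job logs and show the end of one. A sketch assuming one `*.log` file per job in the shared directory; the function names are hypothetical, and a real view would also need to validate requested file names so it cannot read outside that directory.

```python
from collections import deque
from pathlib import Path


def list_job_logs(log_dir: Path) -> list[tuple[str, float]]:
    """(file name, modification time) pairs, most recently written first."""
    logs = [(p.name, p.stat().st_mtime) for p in log_dir.glob("*.log")]
    return sorted(logs, key=lambda item: item[1], reverse=True)


def tail(path: Path, lines: int = 100) -> list[str]:
    """Last `lines` lines of a log file, without reading it all into memory."""
    with path.open(errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=lines)]
```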

      • ruaok
        that would be a pretty decent start, yes.
      • 2021-02-08 03958, 2021

      • alastairp
        ruaok: we said that we'd talk about caching today too
      • 2021-02-08 03941, 2021

      • ruaok
        that also means we'd need to talk about cache invalidation too. 🤬
      • 2021-02-08 03924, 2021

      • ruaok
        but, I can do that now...
      • 2021-02-08 03924, 2021

      • alastairp
        I had the following topics on my list: 1) remove cache key hashing (_lucifer has done this), 2) look in to seeing if we need to improve redis configs to make writing to disk more efficient, 3) enforce expiry times in BU? and update all downstream apps, 4) make a decision on BU-25
      • 2021-02-08 03924, 2021

      • BrainzBot
        BU-25: Cache namespace versions don't work in docker or with distributed hosts https://tickets.metabrainz.org/browse/BU-25
      • 2021-02-08 03918, 2021

      • ruaok
        key hashing means that we have better keyspace visibility, yes?
      • 2021-02-08 03912, 2021

      • ruaok
        I think enforcing expiry times -- how about setting a fairly short default expiry time? 10 mins or so?
      • 2021-02-08 03912, 2021

      • alastairp
        exactly. so we can just look at the keys and work out what is in there
      • 2021-02-08 03922, 2021

      • ruaok
        +100 to that.
      • 2021-02-08 03934, 2021

      • alastairp
        the only reason it existed was that memcache had a max key name limit. we're nowhere near this limit in redis
      • 2021-02-08 03910, 2021
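Without hashing, cache keys stay readable, which is what gives the keyspace visibility ruaok asks about. A sketch assuming a `namespace:part:part` layout (hypothetical, not necessarily the BrainzUtils format). The 250-byte figure is memcached's maximum key length; Redis keys may be up to 512 MB, so readable keys are no problem there. Against a live server, redis-py's `scan_iter(match="lb:*")` could feed the grouping helper.

```python
from collections import Counter
from typing import Iterable

MEMCACHED_MAX_KEY_BYTES = 250  # the limit that originally motivated hashing


def make_key(namespace: str, *parts: object) -> str:
    """Human-readable key such as 'lb:user:42:stats'."""
    return ":".join([namespace, *(str(p) for p in parts)])


def count_by_namespace(keys: Iterable[str]) -> Counter:
    """How many keys each namespace holds, e.g. to see what fills the cache."""
    return Counter(key.split(":", 1)[0] for key in keys)
```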

      • alastairp
        honestly, I don't think it's a problem to make the expiry time required
      • 2021-02-08 03915, 2021

      • ruaok
        ok, #2 is what exactly?
      • 2021-02-08 03929, 2021

      • alastairp
        it makes people think in more detail about what they're actually caching
      • 2021-02-08 03935, 2021

      • ruaok
        > honestly, I don't think it's a problem to make the expiry time required
      • 2021-02-08 03949, 2021

      • ruaok
        I wonder how much work that will be in the short term. but yes, making people think about it is good.
      • 2021-02-08 03908, 2021

      • alastairp
        for #2, I was reading through redis configuration docs a few weeks ago, and there are a few different ways of telling it to write to disk
      • 2021-02-08 03953, 2021

      • alastairp
        it seems that there's a "replay" version, where it just writes all commands sequentially, but there's also a version that looks at all of the commands and simplifies it (e.g. if you add a key then delete it, it no longer needs to be in the replay log)
      • 2021-02-08 03909, 2021
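The two write-to-disk styles alastairp describes roughly correspond to Redis's append-only file (AOF) and AOF rewriting; Redis performs the rewrite by writing out the current dataset rather than analysing the old log, and RDB snapshots (configured with `save`) are a separate option. The toy below only illustrates why a rewritten log is smaller than a full replay log; it is a conceptual sketch, not Redis code, and supports just `SET` and `DEL`.

```python
def replay(log):
    """Apply SET/DEL commands in order and return the resulting key/value state."""
    state = {}
    for op, key, *value in log:
        if op == "SET":
            state[key] = value[0]
        elif op == "DEL":
            state.pop(key, None)
        else:
            raise ValueError(f"unsupported op {op}")
    return state


def rewrite(log):
    """Shortest equivalent log: one SET per surviving key, deletes disappear."""
    return [("SET", key, value) for key, value in sorted(replay(log).items())]
```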

      • ruaok
        that actually begs the question, should redis be writing to disk in normal operations?
      • 2021-02-08 03914, 2021

      • alastairp
        it came to mind after we were looking at the amount of disk IO that CB redis was doing