#metabrainz

      • bitmap
        no replication lag on pink though
      • 2023-06-02 15348, 2023

      • bitmap
        I'm going to try restarting the standby server
      • 2023-06-02 15325, 2023

      • bitmap
        there's nothing in the PG logs
      • 2023-06-02 15347, 2023

      • bitmap
        I think I found the issue
      • 2023-06-02 15318, 2023

      • zas
        what is it?
      • 2023-06-02 15311, 2023

      • bitmap
        sorry maybe not. the restore_command was incorrect in pink's configuration, but that shouldn't cause WAL files to accumulate...
      • 2023-06-02 15307, 2023

      • zas
        there were a lot of inserts on floyd
      • 2023-06-02 15332, 2023

      • zas
        during one hour at 15k ops
      • 2023-06-02 15349, 2023

      • bitmap
        found an inactive replication slot on pink, and dropped that, but didn't help either
      • 2023-06-02 15312, 2023
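
        An inactive replication slot like this forces the server to keep WAL from its restart_lsn onward. A rough psql sketch for spotting and dropping one (the slot name below is a placeholder, not the slot bitmap actually dropped):

            -- list slots; active = false means nothing is consuming the slot,
            -- but WAL from restart_lsn onward is still being retained
            SELECT slot_name, slot_type, active, restart_lsn
              FROM pg_replication_slots;

            -- drop the unused slot so its retained WAL can be recycled
            SELECT pg_drop_replication_slot('stale_slot_name');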

      • zas
        so many inserts are very likely to cause pink to fall behind floyd
      • 2023-06-02 15344, 2023

      • zas
        wal increase started right after the end of this event
      • 2023-06-02 15358, 2023

      • bitmap
        it makes sense but I'm not seeing any actual replication lag on pink...
      • 2023-06-02 15313, 2023

      • bitmap
        it's fully up to date so shouldn't have any need to hold on to these
      • 2023-06-02 15338, 2023
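
        Lag like this is normally read from the standard catalog views on both servers; a minimal sketch (nothing here is pink- or floyd-specific beyond the comments):

            -- on the primary (floyd): how far behind each standby is, in bytes
            SELECT application_name, state,
                   pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
              FROM pg_stat_replication;

            -- on the standby (pink): received vs. replayed positions,
            -- and wall-clock time since the last replayed transaction
            SELECT pg_last_wal_receive_lsn(),
                   pg_last_wal_replay_lsn(),
                   now() - pg_last_xact_replay_timestamp() AS time_since_last_replay;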

      • zas
        is it possible it failed to delete those WALs?
      • 2023-06-02 15325, 2023

      • bitmap
        it's possible but I didn't find any errors in pink logs
      • 2023-06-02 15319, 2023

      • bitmap
        zas: did you clear them just now?
      • 2023-06-02 15336, 2023

      • zas
        nope; not even logged in
      • 2023-06-02 15354, 2023

      • bitmap
        huh. they all just disappeared lol
      • 2023-06-02 15330, 2023

      • zas
        and pink is in sync with floyd?
      • 2023-06-02 15338, 2023

      • bitmap
        maybe it was that inactive replication slot that I dropped and there was a delay in cleaning them up
      • 2023-06-02 15302, 2023

      • bitmap
        yeah, same as it was
      • 2023-06-02 15322, 2023

      • bitmap
        still nothing in pink logs to indicate wtf it was doing
      • 2023-06-02 15346, 2023

      • bitmap
        I'm thinking it was the replication slot I dropped, but I expected it to clean up the WAL files immediately
      • 2023-06-02 15308, 2023

      • zas
        it's late here, so I'm off to bed, but we need to understand what happened exactly starting with the huge number of inserts that triggered the issue
      • 2023-06-02 15310, 2023

      • bitmap
        agreed
      • 2023-06-02 15320, 2023

      • zas
        if values are correct we got 15k ops for one hour, that's 54M inserts
      • 2023-06-02 15313, 2023

      • zas
        on musicbrainz_db
      • 2023-06-02 15323, 2023

      • zas
        the accumulation of wals can be a serious problem if they fill disks
      • 2023-06-02 15308, 2023
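
        The current on-disk WAL footprint can be checked straight from psql (PostgreSQL 10+); a minimal example:

            -- number and total size of segments sitting in pg_wal
            SELECT count(*)                  AS wal_segments,
                   pg_size_pretty(sum(size)) AS wal_total
              FROM pg_ls_waldir();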

      • bitmap
        can you link where you see 54M inserts for musicbrainz_db?
      • 2023-06-02 15320, 2023

      • zas
        from the graph above
      • 2023-06-02 15351, 2023

      • zas
        check floyd musicbrainzdb inserted
      • 2023-06-02 15336, 2023

      • bitmap
        found it, I had the wrong time range
      • 2023-06-02 15324, 2023

      • zas
        well, that's more like 11-12k/s
      • 2023-06-02 15349, 2023

      • zas
        but still, during one hour that's a lot
      • 2023-06-02 15345, 2023

      • zas
        so I'm curious about what was inserted, did the database size actually increase?
      • 2023-06-02 15335, 2023

      • zas
        there was a huge peak in deleted a bit after
      • 2023-06-02 15340, 2023

      • bitmap
        it doesn't seem like it
      • 2023-06-02 15353, 2023

      • bitmap
        I'm trying to figure out where those operations came from
      • 2023-06-02 15338, 2023

      • zas
        well, I cannot think anymore, I'm off :) thanks for your help on this, we still have a lot of questions, but the fact that the wal count stopped increasing is reassuring
      • 2023-06-02 15344, 2023

      • zas
        gn
      • 2023-06-02 15341, 2023

      • bitmap
        good night :)
      • 2023-06-02 15346, 2023

      • bitmap
        zas: I think it's traffic from the mbid_mapper (under mapping schema in musicbrainz_db) judging by logs, but not certain
      • 2023-06-02 15356, 2023

      • Lotheric has quit
      • 2023-06-02 15318, 2023

      • Lotheric joined the channel
      • 2023-06-02 15320, 2023

      • aerozol
        yvanzo: How about something like this for the weblate icon https://usercontent.irccloud-cdn.com/file/QkvYSbw…
      • 2023-06-02 15350, 2023

      • aerozol
        I tried using parts of the universal language icon but it’s so complex - because we’re mainly trying to change the favicon I couldn’t get it to work. But I used the same hangul letter so there’s some connection
      • 2023-06-02 15339, 2023

      • aerozol
        p.s. if we’re keen on that one I’ll double check if it’s okay to turn the character like that
      • 2023-06-02 15337, 2023

      • Zhele has quit
      • 2023-06-02 15318, 2023

      • CatQuest
        isn't that japanese hiragana?
      • 2023-06-02 15329, 2023

      • CatQuest
        well I liked it
      • 2023-06-02 15344, 2023

      • CatQuest
        it seemed right
      • 2023-06-02 15354, 2023

      • CatQuest
        idk about the colour though. is green for translationbrainz? :D
      • 2023-06-02 15323, 2023

      • CatQuest
        oh god can we name our weblate instance TransBrainz /jk
      • 2023-06-02 15320, 2023

      • zas
        bitmap: according to time of the load on gaga, it can be https://github.com/metabrainz/listenbrainz-server…
      • 2023-06-02 15347, 2023

      • Zhele joined the channel
      • 2023-06-02 15326, 2023

      • zas
        https://github.com/metabrainz/listenbrainz-server… ends with a delete operation which could match what we observed
      • 2023-06-02 15356, 2023

      • zas
        lucifer, mayhem: ^^ can this stuff do ~12k op/s for one hour on musicbrainz_db?
      • 2023-06-02 15323, 2023

      • lucifer
        zas: yes we have a cronjob that runs on thursday and is write intensive.
      • 2023-06-02 15342, 2023

      • lucifer
        i am not sure how many ops it generates though
      • 2023-06-02 15355, 2023

      • zas
        lucifer: too much ;)
      • 2023-06-02 15323, 2023

      • lucifer
        zas: i think we can make that particular table unlogged so that its contents aren't written to WAL.
      • 2023-06-02 15307, 2023
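
        A sketch of that quickfix, assuming a placeholder table name (switching persistence rewrites the table and takes an ACCESS EXCLUSIVE lock, so it needs a quiet moment):

            -- stop WAL-logging the derived cache table (placeholder name)
            ALTER TABLE mapping.some_cache_table SET UNLOGGED;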

      • lucifer
        i'll look into its implications and discuss it with mayhem later today.
      • 2023-06-02 15341, 2023

      • BrainzGit
        [listenbrainz-android] 14akshaaatt merged pull request #133 (03main…trackbar): Feat: SeekBar in listenPlayer https://github.com/metabrainz/listenbrainz-androi…
      • 2023-06-02 15355, 2023

      • zas
        ok, in short, for us: the high number of ops on pg floyd caused creation of 120GB of WAL files on pink (because it couldn't cope with the rate, apparently), then new WAL files kept accumulating
      • 2023-06-02 15329, 2023

      • zas
        we were far from disk size limits, and we're still unsure whether WALs got cleaned after bitmap's action or not
      • 2023-06-02 15359, 2023

      • zas
        also it went almost unnoticed because of a buggy alert (fixed now; the good status of floyd was hiding the bad status of pink)
      • 2023-06-02 15359, 2023

      • lucifer
        i see, makes sense. i think we should be able to avoid the WAL creation at least with the UNLOGGED quickfix.
      • 2023-06-02 15320, 2023

      • zas
        lucifer: it's possible to run this task at any time, right? I mean, can we trigger it for real-life testing (in order to verify it has the expected results, but under control this time)?
      • 2023-06-02 15343, 2023

      • lucifer
        zas: yes, we can.
      • 2023-06-02 15340, 2023

      • zas
        we measured a lot of db inserts (for more than one hour), followed by a lot of delete operations in a short time. Can you link those to code?
      • 2023-06-02 15344, 2023

      • aerozol
        CatQuest: looks Japanese for sure! But pretty sure it's Korean script.. they are all so similar!
      • 2023-06-02 15310, 2023

      • aerozol
        The green is just the MetaBrainz color, I've been using it for stuff that covers all projects (tickets, forums, wiki)
      • 2023-06-02 15357, 2023

      • reosarevok
        I like the icon :)
      • 2023-06-02 15302, 2023

      • mayhem
        moooin
      • 2023-06-02 15322, 2023

      • mayhem
        lucifer: zas : not logging the metadata cache makes sense to me. its all derived data.
      • 2023-06-02 15302, 2023

      • zas
        mayhem: morning
      • 2023-06-02 15302, 2023

      • zas
        do you have any idea why the problem only appeared now? I wonder if that's a change on the lb side or due to the switch floyd<->pink we did earlier this week
      • 2023-06-02 15316, 2023

      • zas
        I mean this cron job has been running for a long time, right?
      • 2023-06-02 15349, 2023

      • mayhem
        yes, no change on our side.
      • 2023-06-02 15320, 2023

      • mayhem
        awww man, 0 listens recorded for all day yesterday. I'm guessing the spotify app dropped off at some point and then panoscrobbler couldn't do its job.
      • 2023-06-02 15334, 2023

      • mayhem
        so panoscrobbler isn't up to the task either.
      • 2023-06-02 15322, 2023

      • zas
        hmmm, it happened in the past, last month we had a similar peak on pink, though it didn't have the same results
      • 2023-06-02 15329, 2023

      • zas
        both peaks are very similar (actually 2: inserted, then deleted)
      • 2023-06-02 15334, 2023

      • zas
        but the issue appeared after the switch to floyd
      • 2023-06-02 15337, 2023

      • zas
        bitmap: ^^
      • 2023-06-02 15346, 2023

      • zas
        we had an alert on WAL count last month, just after the peak, but it recovered
      • 2023-06-02 15349, 2023

      • texke joined the channel
      • 2023-06-02 15311, 2023

      • zas
        atj: around?
      • 2023-06-02 15323, 2023

      • phw has quit
      • 2023-06-02 15336, 2023

      • atj
        zas: yes for a short while
      • 2023-06-02 15355, 2023

      • lucifer
        mayhem: to confirm, unlogging the table implies that it won't be available on replicas and will not survive a crash or unclean shutdown.
      • 2023-06-02 15323, 2023
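
        Both caveats follow from the same fact: an unlogged table writes nothing to WAL, so there is nothing to ship to replicas and nothing to replay after a crash (the table is truncated during recovery). For reference, unlogged tables can be listed, or converted back, roughly like this (table name again a placeholder):

            -- list unlogged tables (relpersistence = 'u')
            SELECT n.nspname, c.relname
              FROM pg_class c
              JOIN pg_namespace n ON n.oid = c.relnamespace
             WHERE c.relkind = 'r' AND c.relpersistence = 'u';

            -- revert later if the trade-off turns out wrong
            ALTER TABLE mapping.some_cache_table SET LOGGED;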

      • mayhem
        it's 100% derived data, yes? then I see no problem with that.
      • 2023-06-02 15342, 2023

      • lucifer
        yup completely derived data. cool, sounds good.
      • 2023-06-02 15354, 2023

      • zas
        atj: will you be available for the move to new consul today?
      • 2023-06-02 15326, 2023

      • phw joined the channel
      • 2023-06-02 15335, 2023

      • atj
        i have work commitments until 13:30, then lunch, so will be around after 14:00
      • 2023-06-02 15316, 2023

      • atj
        can you list the steps that are required for the migration?
      • 2023-06-02 15351, 2023

      • zas
        lucifer, mayhem: even though it triggered the pink WALs issue, I don't think that's an actual issue on the LB side. Of course, reducing the use of the main db is always better, but this process didn't trigger the issue on last month's run.
      • 2023-06-02 15307, 2023

      • zas
        atj: yes, that's quite short: deploy consul/unbound changes, stop dnsmasq & consul-agent containers, restart consul/unbound with new configs
      • 2023-06-02 15331, 2023

      • zas
        well, we need to test the process manually on a few nodes first, but I think it can be automated
      • 2023-06-02 15334, 2023

      • zas
        once done, we can totally remove a few containers from each node (serviceregistrator-10.2.2.*, consul-agent, dnsmasq) and remove them from startup scripts
      • 2023-06-02 15315, 2023

      • zas
        now, we hope the deployment doesn't break anything in running containers; I expect surprises as usual
      • 2023-06-02 15314, 2023

      • atj
        do we want to do this on a Friday afternoon?
      • 2023-06-02 15357, 2023

      • zas
        well, we are in the middle of the river right now, and I prefer we reach the shore before the flood...
      • 2023-06-02 15336, 2023

      • atj
        bit of a tortured metaphor there ;) are we expecting a flood?
      • 2023-06-02 15351, 2023

      • zas
        well, yes, always ;)
      • 2023-06-02 15303, 2023

      • zas
        more seriously we can delay that to Monday morning
      • 2023-06-02 15330, 2023

      • atj
        i'd like to do the testing and write the playbook this afternoon
      • 2023-06-02 15344, 2023

      • zas
        ok, and we can refine steps
      • 2023-06-02 15345, 2023

      • atj
        then we can roll it out on Monday
      • 2023-06-02 15306, 2023

      • zas
        ok for me
      • 2023-06-02 15353, 2023

      • mayhem
        zas: have you started blocking lidarr?
      • 2023-06-02 15349, 2023

      • atj
        rudi/rex are blocking over 500 HTTPS req/s
      • 2023-06-02 15304, 2023

      • mayhem
        damn.
      • 2023-06-02 15322, 2023

      • mayhem
        lidarr specific or total?
      • 2023-06-02 15339, 2023

      • atj
        total
      • 2023-06-02 15345, 2023

      • atj
        i was just surprised by the numbers
      • 2023-06-02 15302, 2023

      • mayhem
        yeah, QNAP et al.
      • 2023-06-02 15310, 2023

      • zas
        atj: dreq is only what we block based on IPs; blocking based on UAs is done at the openresty level (but I parse logs from time to time to block some at IP level)
      • 2023-06-02 15335, 2023

      • atj
        right, you can't block UAs at TCP level
      • 2023-06-02 15355, 2023

      • zas
        mayhem: I blocked lidarr-extended for now
      • 2023-06-02 15332, 2023

      • mayhem
        I think we should block anything with lidarr in its UA. they have their own hosting -- there is no need to serve any of their requests.
      • 2023-06-02 15308, 2023

      • zas
        ok, I'll check logs
      • 2023-06-02 15339, 2023

      • zas
        I see no request from lidarr (only from lidarr-extended)
      • 2023-06-02 15303, 2023

      • mayhem
        ok. lets keep monitoring.
      • 2023-06-02 15307, 2023

      • zas
        oh wait, there are a few, very few
      • 2023-06-02 15339, 2023

      • zas
        I'll block those too
      • 2023-06-02 15353, 2023

      • mayhem
        great
      • 2023-06-02 15328, 2023

      • zas
        done
      • 2023-06-02 15332, 2023

      • mayhem
        thx
      • 2023-06-02 15327, 2023

      • reosarevok
        I did not know zas was the new Cantona, but maybe the weirdly poetic metaphors are a French people thing :D
      • 2023-06-02 15333, 2023

      • zas
        :D
      • 2023-06-02 15347, 2023

      • monkey
        Yes they are
      • 2023-06-02 15301, 2023

      • monkey
        mayhem, aerozol: Recs page has been nicely polished up, and is ready for inspection!
      • 2023-06-02 15301, 2023

      • monkey
        One question I wanna ask you is whether you should be able to save my playlists, or if you should see those save buttons only on your own profile: https://test.listenbrainz.org/user/mr_monnkey/rec…
      • 2023-06-02 15328, 2023

      • mayhem
        monnkey!
      • 2023-06-02 15339, 2023

      • monkey
        herr
      • 2023-06-02 15355, 2023

      • monkey
        moonkey 🌙🐒
      • 2023-06-02 15326, 2023

      • mayhem
        I think a user should be able to save someone else's playlist, yes.