tsk, and "Blackout Thursday" had such a nice ring to it
2021-02-01 03243, 2021
ruaok
except they didn't. everything was working fine and some requests came in. we suspected the network. we suspected a DDoS attack and asked Hetzner if they could see a DDoS attack: Their word: No.
2021-02-01 03245, 2021
CatQuest
hahaha
2021-02-01 03259, 2021
ruaok
zas and I poked around and poked around.
2021-02-01 03201, 2021
CatQuest
Mr_Monkey:
2021-02-01 03216, 2021
ruaok
eventually zas started looking at IP addresses and noticed that a lot were coming from AWS.
2021-02-01 03236, 2021
repo joined the channel
2021-02-01 03238, 2021
ruaok
so we blocked two large swaths of their IP address ranges and traffic returned to normal after 3.5 hours of downtime.
2021-02-01 03259, 2021
sumedh has quit
2021-02-01 03205, 2021
ruaok
zas then filed a report with AWS and we didn't expect anything else to happen from there.
2021-02-01 03232, 2021
ruaok
the next day one of our supporters contacted us and said "WTF, AWS says we're DDoSing you?"
2021-02-01 03247, 2021
vasharma0521 is now known as vineetsharma
2021-02-01 03216, 2021
ruaok
and we went back and forth a number of rounds to work out what it was. we first blocked all of their IPs and the problem went away. unblock one IP and the problem immediately came back for that IP.
2021-02-01 03222, 2021
ruaok
eventually they found the problem.
2021-02-01 03227, 2021
shivam-kapila
(well AWS did something at least)
2021-02-01 03251, 2021
ruaok
turns out that a misconfiguration in the delay between cover art archive lookups caused them to happen A LOT faster than they should.
2021-02-01 03237, 2021
ruaok
I'm still confused if this was in end-user software or on their own servers, but they fixed it and we unblocked them
2021-02-01 03242, 2021
ruaok
anyone wanna guess who it was?
2021-02-01 03222, 2021
ruaok
there will be a blog post detailing what happened tomorrow.
2021-02-01 03246, 2021
ruaok
earlier in the week I submitted some blog post and had a conversation with a potential new unicorn.
2021-02-01 03207, 2021
ruaok
oh and I revealed that Sonos was the new unicorn we signed in December.
2021-02-01 03213, 2021
CatQuest
hmmmm
2021-02-01 03213, 2021
ruaok
shivam-kapila: no.
2021-02-01 03223, 2021
ruaok
that was it. fin.
2021-02-01 03230, 2021
ruaok
zas: anything to add?
2021-02-01 03234, 2021
zas
hey
2021-02-01 03240, 2021
shivam-kapila
new one coming. nice :)
2021-02-01 03242, 2021
ruaok
!m zas
2021-02-01 03242, 2021
BrainzBot
You're doing good work, zas!
2021-02-01 03250, 2021
zas
I investigated the issue a bit more
2021-02-01 03234, 2021
zas
so about 100 different servers were querying caa redirect service very very fast (to me, unlimited speed)
2021-02-01 03252, 2021
zas
so it caused an exhaustion of possible connections on our gateways
2021-02-01 03212, 2021
zas
we have various counter-measures but they didn't suffice
2021-02-01 03240, 2021
zas
so I added a few more rate limits to caa
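To illustrate the kind of per-client rate limit described here, below is a minimal token-bucket sketch in Python. It is not the configuration zas deployed on the gateways: the class names, the rate and capacity values, and the idea of keying buckets by client IP are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum number of stored tokens
        self.clock = clock          # injectable clock, handy for testing
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        # Refill according to elapsed time, capped at the bucket capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        # Spend one token per request if one is available.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """Hypothetical helper: one independent bucket per client IP."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client_ip):
        bucket = self.buckets.get(client_ip)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_ip] = bucket
        return bucket.allow()
```

With a limiter like this, a client hammering the service only exhausts its own bucket, while requests from other addresses keep being accepted.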
2021-02-01 03211, 2021
zas
during the blackout we tried to switch to herb (our second gateway)
2021-02-01 03244, 2021
zas
but I encountered various non-critical issues, which I worked on solving this week
also noted gateways-redis was under assault during the blackout
2021-02-01 03211, 2021
zas
I've restarted working on something I tried a while ago (without much success) but now it works: the goal is to replace this redis instance with a fully redundant and quicker alternative
2021-02-01 03235, 2021
zas
basically keydb + keepalived + haproxy, running on gateways themselves
2021-02-01 03218, 2021
alastairp
zas: I have some keepalived configuration that works if you're interested in looking at it for comparison purposes
2021-02-01 03224, 2021
zas
I also detected a stupid issue: ufw overrides sysctl.conf on reboot, it caused gateways to not use correct sysctl values
2021-02-01 03228, 2021
zas
(fixed now)
2021-02-01 03252, 2021
zas
fin. Mr_Monkey ?
2021-02-01 03254, 2021
yvanzo
!m zas
2021-02-01 03254, 2021
BrainzBot
You're doing good work, zas!
2021-02-01 03200, 2021
Mr_Monkey
Hello !
2021-02-01 03219, 2021
Mr_Monkey
Last week I worked on merging PRs on BB and LB
2021-02-01 03233, 2021
ruaok
> I also detected a stupid issue: ufw overrides sysctl.conf on reboot, it caused gateways to not use correct sysctl values
2021-02-01 03248, 2021
ruaok
was this causing the dropped packets between trille and kiki?
2021-02-01 03256, 2021
zas
nope^^
2021-02-01 03205, 2021
zas
but it could have
2021-02-01 03211, 2021
Mr_Monkey
I also fixed an issue with the track search input on the LB playlist page
2021-02-01 03236, 2021
Mr_Monkey
Worked a bit more on MB icons for various devices
2021-02-01 03211, 2021
Mr_Monkey
And worked on setting up backups
2021-02-01 03229, 2021
Mr_Monkey
And finally some fiddling with Jenkins on the LB CI setup
2021-02-01 03255, 2021
Mr_Monkey
That's most of it for me !
2021-02-01 03255, 2021
Mr_Monkey
Go Freso !
2021-02-01 03207, 2021
Freso
o/
2021-02-01 03257, 2021
Freso
So I went over chat logs last week and compiled meeting notes from this year… and posted them on the forum this morning.
2021-02-01 03223, 2021
yvanzo
!m Freso
2021-02-01 03223, 2021
BrainzBot
You're doing good work, Freso!
2021-02-01 03238, 2021
Freso
Other than that, mostly lurking about, getting back into things and trying out some new/new-old processes.
2021-02-01 03241, 2021
Freso
fin.
2021-02-01 03246, 2021
Freso
alastairp: Go!
2021-02-01 03254, 2021
alastairp
last week I moved jenkins from williams to cage, to try and reduce the disk usage on williams
2021-02-01 03259, 2021
alastairp
I also upgraded jenkins and helped Mr_Monkey upgrade the use of a jenkins plugin in LB JS tests
2021-02-01 03204, 2021
alastairp
I made a start on an improvement to LB tests to make sure that we delete all unused docker images after tests finish (to prevent running out of disk space)
2021-02-01 03209, 2021
alastairp
I was around a bit during the blackout but wasn't able to help very much
2021-02-01 03215, 2021
alastairp
I did some docker/uwsgi maintenance to reduce the size of docker logs, which freed up about 300gb in total over all of our python apps
2021-02-01 03219, 2021
alastairp
I helped this morning with some improvements to caching in CB
2021-02-01 03223, 2021
alastairp
I migrated tests for BU from travis to Jenkins again
2021-02-01 03225, 2021
Freso
(Only bitmap, _lucifer, and diru1100 left on my list. Last call for anyone else wanting to give review!)
2021-02-01 03231, 2021
alastairp
bitmap: next
2021-02-01 03236, 2021
bitmap
hey
2021-02-01 03213, 2021
bitmap
(related to the supporter ddos) last week I worked on optimizing caa access in mbs to avoid routing requests through the redirect service; no real reason we need to go through the service for internal use
2021-02-01 03244, 2021
bitmap
should be submitting that today, but Zas has since added a global rate limit too
2021-02-01 03216, 2021
bitmap
I also continued working on converting the relationship editor code to React so we can convert the rest of the entity edit forms
2021-02-01 03244, 2021
bitmap
there were a bunch of conflicts to fix first, but I was updating it to make use of the DateRangeFieldset component we added for the alias edit form
2021-02-01 03207, 2021
bitmap
I'm also changing the state handling to match what we did there for consistency. I think it should be working by the end of the week
2021-02-01 03220, 2021
bitmap
otherwise mostly spent time on code review
2021-02-01 03225, 2021
bitmap
fin! _lucifer go
2021-02-01 03232, 2021
reosarevok
!m bitmap
2021-02-01 03232, 2021
BrainzBot
You're doing good work, bitmap!
2021-02-01 03238, 2021
_lucifer
hi all!
2021-02-01 03221, 2021
ruaok
> db0:keys=8660,expires=8286,avg_ttl=10039539
2021-02-01 03230, 2021
ruaok
CB still has some cache keys without expiry
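The keyspace line quoted above has the shape of Redis `INFO keyspace` output, so the number of keys without an expiry can be read straight off it. A small sketch of that arithmetic in Python follows; the function names are made up for this example.

```python
def parse_keyspace(line):
    """Parse a line like 'db0:keys=8660,expires=8286,avg_ttl=10039539'."""
    db, _, fields = line.partition(":")
    stats = {}
    for field in fields.split(","):
        name, _, value = field.partition("=")
        stats[name] = int(value)
    return db, stats


def keys_without_expiry(stats):
    # Keys with no TTL set = total keys minus keys that have an expiry.
    return stats["keys"] - stats["expires"]


db, stats = parse_keyspace("db0:keys=8660,expires=8286,avg_ttl=10039539")
missing = keys_without_expiry(stats)
share = 100 * missing / stats["keys"]
print(f"{db}: {missing} of {stats['keys']} keys have no expiry ({share:.1f}%)")
```

For the line quoted above this gives 374 of 8660 keys without an expiry, about 4.3%.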
2021-02-01 03241, 2021
_lucifer
I was mostly busy last week and didn't do much but I was able to help with the CB redis today. That's it for me.
2021-02-01 03251, 2021
ruaok
thank you!
2021-02-01 03253, 2021
_lucifer
diru1100: next?
2021-02-01 03241, 2021
alastairp
ruaok: a lot less than the 10% that it was, though. (I assume you did a flushall again?) Let me look at the cached values again
2021-02-01 03253, 2021
Freso
Or maybe no diru around.
2021-02-01 03202, 2021
Freso
So I guess that’s that for reviews!
2021-02-01 03205, 2021
ruaok did flushall
2021-02-01 03219, 2021
Freso
And the GSoC topic is postponed to next week, so…
2021-02-01 03225, 2021
Freso
That rounds up this meeting!
2021-02-01 03239, 2021
Mr_Monkey
Thanks !
2021-02-01 03243, 2021
Freso
Thank you everyone who gave reviews, and thank you all for your time!