Does production or beta BookBrainz expose an API endpoint?
2023-02-02 03346, 2023
serialata joined the channel
2023-02-02 03311, 2023
Zhele has quit
2023-02-02 03341, 2023
Zhele joined the channel
2023-02-02 03322, 2023
zas
floyd is back. After extensive testing, Hetzner didn't find any issue, so I guess it's a kernel bug causing incorrect reports of excessive temperature; there are reports to that effect. Unclear what's causing it though; some threads point to a firmware issue
2023-02-02 03322, 2023
zas
bitmap: I did a few extra checks, including CPU stress tests, and there are no more messages in the logs... on reboot we got a new kernel + new CPU microcode, so maybe the issue doesn't exist anymore. Note that the temperatures reported by sensors were correct; only the repeated message in the kernel logs indicated a high-temperature issue. A web search shows that some people got this message even though no overheating actually happened.
2023-02-02 03303, 2023
zas
I just did a test setting all CPUs (16) to 100% for 2 minutes, no problem
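zas doesn't say which tool ran the 100% load test. As a rough, hypothetical equivalent, the sketch below keeps every logical CPU busy for a fixed time by starting one busy-loop process per core; `burn` and `stress_all_cpus` are illustrative names, not an existing MetaBrainz script.

```python
import multiprocessing as mp
import os
import time
from typing import List, Optional


def burn(seconds: float) -> int:
    """Keep one core busy with cheap integer arithmetic until `seconds` elapse.

    Returns the number of loop iterations completed (a rough work counter).
    """
    deadline = time.monotonic() + seconds
    iterations = 0
    x = 1
    while time.monotonic() < deadline:
        x = (x * 31 + 7) % 1_000_003  # throwaway work so the loop never idles
        iterations += 1
    return iterations


def stress_all_cpus(seconds: float, workers: Optional[int] = None) -> List[int]:
    """Start one busy-loop process per logical CPU (default: os.cpu_count())."""
    n = workers or os.cpu_count() or 1
    # Separate processes, not threads, so the GIL doesn't cap the load at one core.
    with mp.Pool(processes=n) as pool:
        return pool.map(burn, [seconds] * n)
```

To mirror the 2-minute test you would call `stress_all_cpus(120)` from under an `if __name__ == "__main__":` guard while watching the sensor readings; dedicated stress tools give finer control, so treat this only as an approximation.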
2023-02-02 03330, 2023
zas
so I guess we can move services back to floyd (or make floyd secondary, as you prefer)
2023-02-02 03359, 2023
zas
atj: ^^ read above about floyd / cpu temp
2023-02-02 03333, 2023
atj
zas: thanks, v weird bug
2023-02-02 03305, 2023
zas
that said, setting all CPUs to 100% for 2 minutes generated an alert, but this one is based on sensors: CPU temp 100°C
2023-02-02 03338, 2023
zas
though no 'Core temperature is above threshold' in logs...
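Checking whether the kernel logged the quoted throttling message can be scripted. The sketch below counts matching lines per CPU in kernel log text (for example, the output of `journalctl -k`). It assumes lines contain `CPU<n>: Core temperature is above threshold`; the exact wording varies between kernel versions, so the pattern may need adjusting.

```python
import re
from collections import Counter
from typing import Iterable

# Phrase quoted in the chat. Assumption: the kernel prefixes it with "CPU<n>: ";
# adjust the pattern if your kernel version words the message differently.
PATTERN = re.compile(r"CPU(\d+):\s+Core temperature is above threshold")


def count_threshold_events(lines: Iterable[str]) -> Counter:
    """Count 'Core temperature is above threshold' messages per CPU number."""
    counts: Counter = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

An empty result after a stress run matches what zas observed: the sensors reported a high temperature, but the kernel didn't log the threshold message.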
2023-02-02 03333, 2023
zas
this temperature alert when stressing the CPUs seems to show there's a cooling problem, because 100°C should never happen with proper cooling...
2023-02-02 03344, 2023
zas
let me check intel recommendations on this
2023-02-02 03344, 2023
atj
i think hetzner run their DCs hot
2023-02-02 03337, 2023
zas
I'm running the test again, this time for 4 minutes
2023-02-02 03359, 2023
zas
if the cpu protection triggers we should see a message about throttling
2023-02-02 03305, 2023
zas
after 4 minutes at 100% on all cpus, temp went to 101°C, slowly increasing, but no throttling
2023-02-02 03344, 2023
zas
I'm checking actual cpu frequencies, and nope, no throttling yet, I'll do a more violent test
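One way to check actual CPU frequencies for throttling is to read the per-core `cpu MHz` lines from `/proc/cpuinfo` under load and flag cores running below a floor you choose. The sketch below does that. `floor_mhz` is a hypothetical parameter, because the chat doesn't state floyd's base clock. The heuristic is also an assumption: a core that falls well below its base frequency under full load *suggests* throttling, but doesn't prove it.

```python
import re
from typing import List

# Matches lines such as "cpu MHz\t\t: 3400.000" in /proc/cpuinfo.
MHZ_RE = re.compile(r"^cpu MHz\s*:\s*([0-9.]+)", re.MULTILINE)


def read_proc_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Return the raw text of /proc/cpuinfo (Linux only)."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def core_frequencies(cpuinfo_text: str) -> List[float]:
    """Return the current 'cpu MHz' value of each logical CPU, in order."""
    return [float(value) for value in MHZ_RE.findall(cpuinfo_text)]


def throttled_cores(freqs: List[float], floor_mhz: float) -> List[int]:
    """Indices of CPUs running below `floor_mhz` (a threshold you pick)."""
    return [i for i, f in enumerate(freqs) if f < floor_mhz]
```

For example, `throttled_cores(core_frequencies(read_proc_cpuinfo()), floor_mhz=...)` run during the stress test would return an empty list if no core has dropped below the chosen floor.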
Anyone know if there's a BB prod/beta API endpoint?
2023-02-02 03329, 2023
trolley joined the channel
2023-02-02 03354, 2023
zas
bitmap, yvanzo: floyd is back to work
2023-02-02 03344, 2023
bitmap
zas: thanks, I'll work on restoring the standby service soon, but we can keep pink as primary for now if it won't cause any issues. (at least to avoid further downtime)