To counter the avalanche of retries on different layers, I have also seen a cust...

pyrolistical · on Oct 8, 2024

Ya. Instead of blind retries, I have the server respond with a “try after timestamp” header. This way it can tell everybody to back off. If there's no response, then welp.
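For illustration, here is a minimal client-side sketch of honoring such a header. It assumes the header in question is the standard HTTP `Retry-After`, which carries either a number of seconds or an HTTP-date. The function name `retry_after_seconds` and its `now` parameter are hypothetical, not from any particular library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header_value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Return how long to wait, in seconds, or None if the header is unusable.

    Hypothetical helper: accepts the two Retry-After forms (delay in
    seconds, or an HTTP-date) and never returns a negative delay.
    """
    if header_value is None:
        return None
    value = header_value.strip()
    if now is None:
        now = datetime.now(timezone.utc)

    # Form 1: a non-negative integer number of seconds.
    if value.isascii() and value.isdigit():
        return float(value)

    # Form 2: an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # Unparseable: the caller falls back to its own policy.
    if when.tzinfo is None:
        # Assumption: treat a date without a zone as UTC.
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A caller would sleep for the returned value before the next attempt, and fall back to its own backoff when the result is `None` (the "no response" case).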

jcgrillo · on Oct 8, 2024

For the "no response" case, e.g. clients ignoring the retry-after header (or in the particular situation where I learned this trick, retrying aggressively on 401) one can implement an "err200" response in the load balancer which makes them all go away ;).

feyman_r · on Oct 8, 2024

Sending HTTP 429 is also a good way to signal that the error is due to excess traffic rather than a generic downstream failure.

Having clients suspend retries altogether allows the service to come back up. Manual retries triggered from user action would be fresh requests.
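A rough sketch of that client behavior, under stated assumptions: the `RetryGate` class, its method names, and the 30-second default pause are all hypothetical, and it suspends automatic retries for a window rather than indefinitely. User-initiated requests are always let through as fresh requests.

```python
import time
from typing import Callable, Optional


class RetryGate:
    """Hypothetical gate: blocks automatic retries after a 429."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 default_pause: float = 30.0) -> None:
        self._clock = clock                  # injectable for testing
        self._default_pause = default_pause  # assumed fallback, in seconds
        self._suspended_until = 0.0

    def record_response(self, status: int,
                        retry_after: Optional[float] = None) -> None:
        """Start (or extend) the suspension window when the server sends 429."""
        if status == 429:
            pause = retry_after if retry_after is not None else self._default_pause
            self._suspended_until = max(self._suspended_until,
                                        self._clock() + pause)

    def allow(self, user_initiated: bool) -> bool:
        """User actions always pass; automatic retries wait out the window."""
        if user_initiated:
            return True
        return self._clock() >= self._suspended_until
```

Other error statuses do not trip the gate here, which keeps the "excess traffic" signal separate from generic failures.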