To counter the avalanche of retries on different layers, I have also seen a custom header being added to all requests that are retries. Upon receiving a request with this header, the microservice would turn off its own retry logic for this request.
Ya. Instead of blind retries, I have server respond with “try after timestamp” header. This way it can tell everybody to back off. If no response then welp
For the "no response" case, e.g. clients ignoring the retry-after header (or in the particular situation where I learned this trick, retrying aggressively on 401) one can implement an "err200" response in the load balancer which makes them all go away ;).