We enable locality-aware load balancing, which means the local endpoint is given a higher priority in Envoy's endpoint list. When Envoy needs to retry after an error from an endpoint, it will by default pick from the higher-priority endpoints again. So if the local endpoint is down, Envoy uses up all of its retries on that endpoint and returns an error downstream to clients. This continues until, e.g., a circuit breaker removes the endpoint.
Envoy allows the retry_policy to be configured to change this retry behaviour, e.g. to make it try lower-priority endpoints after a higher-priority endpoint returns an error, even while the higher-priority endpoint is still considered healthy. See [https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/http/http_connection_management#retry-plugin-configuration|https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/http/http_connection_management#retry-plugin-configuration]
To make Kong Mesh more resilient to endpoint errors, this feature request asks Kong Mesh to expose the following configuration blocks from the link above in its retry policy:
* retry_priority
* retry_host_predicate
* host_selection_retry_max_attempts
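
For reference, the Envoy-level configuration we would like to be able to produce looks roughly like the sketch below, adapted from the Envoy documentation linked above. The `retry_on`, `num_retries`, `update_frequency` and `host_selection_retry_max_attempts` values are illustrative assumptions, not recommendations, and how Kong Mesh would surface these fields in its own retry policy is left open by this request.

```yaml
# Sketch of an Envoy route-level retry_policy; values are illustrative only.
retry_policy:
  retry_on: "5xx"          # assumed retry condition
  num_retries: 3           # assumed retry budget
  # Prefer priorities that have not already been attempted for this request.
  retry_priority:
    name: envoy.retry_priorities.previous_priorities
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.retry.priority.previous_priorities.v3.PreviousPrioritiesConfig
      update_frequency: 2  # recompute the priority load after every 2 attempts
  # Reject hosts that were already attempted for this request.
  retry_host_predicate:
  - name: envoy.retry_host_predicates.previous_hosts
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.retry.host.previous_hosts.v3.PreviousHostsPredicate
  # How many times host selection may be re-run when a predicate rejects a host.
  host_selection_retry_max_attempts: 3
```

With something like this in place, a failed request to the local endpoint could be retried against a different host or a lower-priority locality instead of spending every retry on the same failing endpoint.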