You still need health checks, though? Otherwise, how do you tell the difference between "no traffic + server alive" and "some traffic + server dead"? Yeah, you can monitor throughput on a load balancer, but if I ever again wake up to an alert about no traffic being served, I will throw hands.
Yeah, they’d have to exist, but for quite a different purpose. I wonder if that means we could implement them in different ways, e.g. have a health check service that everything pings, and if a ping isn’t received for N minutes, assume the service is dead and trigger some replacement routine or alert.
This begins to look a lot more like a software watchdog at that point. You could even have each service report a count of outstanding/processed requests per tick, and if a server never shows any outstanding or processed requests, the watchdog could kill it off.
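A minimal sketch of that watchdog idea, assuming services push heartbeats with per-tick request counts (the `Watchdog` class, its thresholds, and the `ping`/`dead_services` names are all made up for illustration):

```python
import time


class Watchdog:
    """Hypothetical software watchdog: services ping with per-tick request
    counts; prolonged silence, or pings with zero activity, mark them dead."""

    def __init__(self, timeout_s=300, idle_ticks_allowed=3):
        self.timeout_s = timeout_s              # N minutes of silence => dead
        self.idle_ticks_allowed = idle_ticks_allowed
        self.last_ping = {}                     # service -> last ping timestamp
        self.idle_ticks = {}                    # service -> consecutive idle ticks

    def ping(self, service, outstanding, processed, now=None):
        now = time.time() if now is None else now
        self.last_ping[service] = now
        if outstanding == 0 and processed == 0:
            self.idle_ticks[service] = self.idle_ticks.get(service, 0) + 1
        else:
            self.idle_ticks[service] = 0

    def dead_services(self, now=None):
        now = time.time() if now is None else now
        dead = set()
        for service, ts in self.last_ping.items():
            if now - ts > self.timeout_s:
                dead.add(service)               # stopped pinging entirely
            elif self.idle_ticks.get(service, 0) > self.idle_ticks_allowed:
                dead.add(service)               # still pings, but never sees traffic
        return dead
```

The two conditions map onto the two failure modes above: silence catches a crashed process, while the zero-activity counter catches a process that's up but never serving anything.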
server health is necessarily a function of the actual production traffic it receives, determined at the application layer, as observed by a specific observer
it can't be known by the server itself, as (among many other reasons) the server can't know about network issues between itself and any upstream caller
it can't be determined by out-of-band health check queries, because those queries don't represent actual traffic; the simplifying assumption that they _do_ introduces many common failure modes that any seasoned engineer can speak about at length
health checks can be a nice additional signal on top of monitoring actual prod traffic, but they can't be used by themselves; they just don't capture enough relevant information
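One way to picture "health checks as an additional signal, not the signal" is a combiner that only trusts the probe when there's no real traffic to judge by. This is a sketch under assumed names and thresholds (the `assess` function, the 5% error-rate cutoff, and the state labels are all hypothetical):

```python
def assess(healthcheck_ok, requests_last_window, errors_last_window,
           expected_min_requests):
    """Hypothetical health combiner: real traffic is the primary signal;
    the out-of-band check only breaks ties when there's no traffic."""
    if requests_last_window >= expected_min_requests:
        # Enough real traffic to judge health at the application layer.
        error_rate = errors_last_window / requests_last_window
        return "healthy" if error_rate < 0.05 else "degraded"
    # No meaningful traffic: we can't distinguish "idle" from "unreachable"
    # from this vantage point, so the probe only narrows it down.
    if healthcheck_ok:
        return "unknown-idle"   # probe answers, but health is unproven
    return "suspect"            # no traffic AND failing probes
```

Note the combiner never returns "healthy" from the probe alone, which is the point: a passing check with no traffic is compatible with a server nobody can actually reach.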