Fail-safe defaults ensure that when something unexpected happens, the system falls into a known-safe state rather than an open, permissive, or undefined one. The principle applies at every level: hardware that de-energizes to a safe position on power loss, software that denies access when an authorization check fails to complete, and configuration that disables dangerous features when a setting is missing.

The idea was formalized by Saltzer and Schroeder in 1975 as one of eight design principles for information protection, where it meant basing access decisions on permission rather than exclusion. In safety engineering, the same concept appears as the de-energize-to-safe principle: a valve closes, a brake engages, a process halts — not because an active command says so, but because the absence of a valid active signal defaults to the safe position.

How It Works

  • Define, for every component and operational mode, what the safe state is — the configuration that minimizes harm when the system cannot determine the correct action.
  • Treat all unrecognized or missing inputs as invalid by default: deny access, reject commands, disable features, halt motion.
  • On detecting an anomaly that exceeds defined tolerance (watchdog timeout, invariant violation, sensor out of range), transition the component to its safe state and raise an alert.
  • Make the safe-state transition atomic where possible — partial transitions can leave the system in a state that is neither operational nor safe.
  • Document each safe state explicitly so operators know what to expect after a transition and what recovery steps are required.
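The steps above can be sketched as a minimal controller. This is an illustrative assumption, not any particular framework's API: the names (`SafeStateController`, `VALID_COMMANDS`, the 500 ms watchdog budget) and the valve-like "closed" safe state are invented for the example.

```python
import time

# Sketch of a component with an explicit, documented safe state.
# All names and thresholds here are hypothetical.

VALID_COMMANDS = {"open", "close", "hold"}
SAFE_STATE = "closed"          # documented safe state for this component
WATCHDOG_TIMEOUT_S = 0.5       # anomaly tolerance: heartbeat deadline

class SafeStateController:
    def __init__(self):
        self.state = SAFE_STATE            # start safe, not merely "off"
        self.last_heartbeat = time.monotonic()
        self.alerts = []

    def handle_command(self, command):
        # Treat unrecognized input as invalid by default: reject and go safe.
        if command not in VALID_COMMANDS:
            self.enter_safe_state(f"invalid command: {command!r}")
            return False
        if command != "hold":
            self.state = command
        return True

    def heartbeat(self):
        self.last_heartbeat = time.monotonic()

    def check_watchdog(self):
        # Absence of a valid active signal defaults to the safe position.
        if time.monotonic() - self.last_heartbeat > WATCHDOG_TIMEOUT_S:
            self.enter_safe_state("watchdog timeout")

    def enter_safe_state(self, reason):
        # A single assignment keeps the transition atomic at this level;
        # the alert is raised only after the state is already safe.
        self.state = SAFE_STATE
        self.alerts.append(reason)
```

Note that the controller starts in the safe state and returns to it on any unrecognized command or missed heartbeat; the `alerts` list stands in for whatever alerting channel the real system uses.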

Failure Modes

  • A safe state defined too conservatively triggers on benign transients, causing unnecessary shutdowns and eroding operator trust (nuisance trips).
  • A safe state defined too loosely allows genuinely hazardous conditions to persist because the threshold for transition is never reached.
  • Partial transitions leave the system in a state that is neither safe nor operational — for example, one actuator halted while another continues.
  • Missing safe-state definitions for newly added components or modes: developers add a feature but forget to define what happens when it fails.
  • Operators override or disable fail-safe logic after repeated nuisance trips, removing the safety net entirely.
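One common mitigation for nuisance trips that does not loosen the safe-state threshold is to require the out-of-range condition to persist across several consecutive samples before tripping. A minimal debounce sketch, with illustrative limits and sample counts:

```python
# Debounce sketch: trip only when a reading stays out of range for
# TRIP_AFTER consecutive samples, so a benign one-sample transient does
# not cause a nuisance trip. Range and count are assumed for the example.

LOW, HIGH = 10.0, 90.0      # hypothetical valid sensor range
TRIP_AFTER = 3              # consecutive out-of-range samples before trip

class DebouncedTrip:
    def __init__(self):
        self.out_of_range_count = 0
        self.tripped = False

    def sample(self, reading):
        if LOW <= reading <= HIGH:
            self.out_of_range_count = 0      # transient cleared
        else:
            self.out_of_range_count += 1
            if self.out_of_range_count >= TRIP_AFTER:
                self.tripped = True          # persistent hazard: go safe
        return self.tripped
```

The trade-off is explicit: a larger `TRIP_AFTER` reduces nuisance trips but delays the safe-state transition for genuine hazards, so the count must fit inside the system's fault-tolerance time.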

Verification

  • Fault injection: for each defined anomaly class (watchdog timeout, invalid input, missing config, sensor out of range), inject the condition and verify the system reaches the documented safe state within the specified time (for example < 500 ms for a safety controller).
  • Completeness audit: confirm that every component and operational mode has a documented safe state; flag any path that can reach an undefined state as a gap.
  • Atomicity check: interrupt the safe-state transition at various points (power cut, process kill) and verify the system does not settle in a half-transitioned state.
  • Recovery round-trip: after a safe-state transition, execute the documented recovery procedure and verify the system returns to normal operation within the agreed window.
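A fault-injection harness along the lines of the first bullet might look like the following. The controller stub, the `inject_fault` interface, and the 500 ms budget are all assumptions for the sketch; a real harness would inject each condition externally (drop the heartbeat, delete the config key) rather than call a method.

```python
import time

# Fault-injection sketch: for each anomaly class, inject the condition
# and verify the documented safe state is reached within the deadline.

SAFE_STATE = "halted"
DEADLINE_S = 0.5               # assumed safe-state transition budget

class ControllerStub:
    def __init__(self):
        self.state = "running"

    def inject_fault(self, anomaly):
        # Stand-in for external injection; the stub reacts directly.
        self.state = SAFE_STATE

def verify_safe_transition(anomaly):
    ctrl = ControllerStub()
    start = time.monotonic()
    ctrl.inject_fault(anomaly)
    elapsed = time.monotonic() - start
    return ctrl.state == SAFE_STATE and elapsed < DEADLINE_S

ANOMALIES = ["watchdog_timeout", "invalid_input",
             "missing_config", "sensor_out_of_range"]
results = {a: verify_safe_transition(a) for a in ANOMALIES}
```

Enumerating the anomaly classes as data, as in `ANOMALIES`, doubles as a completeness audit: a newly added anomaly class that is missing from the list is visible as a gap.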

Related Patterns

  • Deny-by-default access control applies fail-safe defaults to authorization: any request not explicitly permitted is denied.
  • Watchdog supervision uses a periodic heartbeat; absence of the heartbeat triggers the safe-state transition.
  • Dead man’s switch requires continuous active confirmation to remain in an operational mode — releasing the switch defaults to safe.
  • Circuit breaker implements a fail-safe default for remote dependencies: when failures exceed a threshold, the breaker opens and returns a safe fallback rather than forwarding requests.
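The circuit breaker in the last bullet can be sketched as follows. The threshold, the fallback value, and the `call` interface are illustrative assumptions, and the sketch omits the half-open recovery state that production breakers add.

```python
# Circuit-breaker sketch: after FAILURE_THRESHOLD consecutive failures
# the breaker opens and returns a safe fallback instead of forwarding
# the request. (No half-open/recovery state in this sketch.)

FAILURE_THRESHOLD = 3

class CircuitBreaker:
    def __init__(self, fallback):
        self.fallback = fallback
        self.failures = 0
        self.open = False

    def call(self, fn, *args):
        if self.open:
            return self.fallback       # fail-safe default: stop forwarding
        try:
            result = fn(*args)
            self.failures = 0          # success resets the count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= FAILURE_THRESHOLD:
                self.open = True
            return self.fallback
```

The fail-safe property is the open state itself: once the failure threshold is exceeded, the breaker returns the fallback without forwarding, so a misbehaving dependency cannot keep absorbing requests.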

References