N-Version Redundancy and Voting runs the same computation on several redundant units and compares their outputs. Where Standby/Failover tolerates a component that crashes, voting tolerates a component that keeps running but returns a wrong answer. The goal is integrity, not availability.
Two flavors share one voter. Triple Modular Redundancy (TMR) runs three identical units and masks a single hardware fault by majority. N-version programming runs independently developed implementations of one specification, so a design fault in one is outvoted by the others.
How It Works
- N units compute a result from the same input.
- A voter compares the outputs.
- On agreement it forwards the result; on a minority disagreement it masks the outlier and forwards the majority (TMR); on no majority it flags an error or commands a safe state.
- Independent design (N-version) or independent hardware (TMR) reduces correlated faults — though the Knight & Leveson experiment showed independently developed versions still fail together more often than independence predicts.
Failure Modes
- Common-mode fault: a shared specification error, library, or input defeats the vote because every version fails the same way.
- Voter as single point of failure: a faulty comparator corrupts every decision, so it must stay simple and be verified itself.
- Inexact agreement: floating-point or timing differences make correct outputs disagree, so the voter needs tolerance bands.
- Diversity erosion: budget pressure collapses N versions toward shared code, quietly removing the independence the tactic relies on.
Verification
- Inject a divergent output in one unit; assert the voter masks it and raises an alert.
- Review versions for shared libraries, teams, or toolchains that would correlate failures.
- Confirm the no-majority path drives the declared safe state within its deadline.
- Track masked-fault counts in production as an early signal of a degrading unit.
Variants and Related Tactics
- TMR and N-modular redundancy — identical units, hardware-fault masking by majority.
- N-version programming — design diversity against specification and implementation faults.
- Analytic redundancy — a simple high-assurance channel checks a complex high-performance one.
- Standby/Failover — the complementary tactic for crash faults rather than wrong answers.
References
- An Experimental Evaluation of the Assumption of Independence in Multiversion Programming — Knight & Leveson, IEEE Transactions on Software Engineering, 1986