N-Version Redundancy and Voting runs the same computation on several redundant units and compares their outputs. Where Standby/Failover tolerates a component that crashes, voting tolerates a component that keeps running but returns a wrong answer. The goal is integrity, not availability.

Two flavors share one voter. Triple Modular Redundancy (TMR) runs three identical units and masks a single hardware fault by majority. N-version programming runs independently developed implementations of one specification, so a design fault in one is outvoted by the others.

How It Works

  • N units compute a result from the same input.
  • A voter compares the outputs.
  • On agreement it forwards the result; on a minority disagreement it masks the outlier and forwards the majority (TMR); on no majority it flags an error or commands a safe state.
  • Independent design (N-version) or independent hardware (TMR) reduces correlated faults — though the Knight & Leveson experiment showed independently developed versions still fail together more often than independence predicts.

Failure Modes

  • Common-mode fault: a shared specification error, library, or input defeats the vote because every version fails the same way.
  • Voter as single point of failure: a faulty comparator corrupts every decision, so it must stay simple and be verified itself.
  • Inexact agreement: floating-point or timing differences make correct outputs disagree, so the voter needs tolerance bands.
  • Diversity erosion: budget pressure collapses N versions toward shared code, quietly removing the independence the tactic relies on.

Verification

  • Inject a divergent output in one unit; assert the voter masks it and raises an alert.
  • Review versions for shared libraries, teams, or toolchains that would correlate failures.
  • Confirm the no-majority path drives the declared safe state within its deadline.
  • Track masked-fault counts in production as an early signal of a degrading unit.
  • TMR and N-modular redundancy — identical units, hardware-fault masking by majority.
  • N-version programming — design diversity against specification and implementation faults.
  • Analytic redundancy — a simple high-assurance channel checks a complex high-performance one.
  • Standby/Failover — the complementary tactic for crash faults rather than wrong answers.

References