Data Replication keeps copies of the same data on independent nodes. Copies let the system survive node loss, serve reads from the nearest or least-loaded replica, and scale read throughput. The hard part is not making copies — it is deciding what a reader sees while copies are briefly out of step.

Strategies sit on two axes: where writes are accepted (single-leader, multi-leader, leaderless) and when copies converge (synchronous or asynchronous). Those choices set the consistency–availability–latency tradeoff.

How It Works

  • A write lands on a replica and propagates to the others.
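  • Synchronous replication withholds the acknowledgment until the required replicas (all of them, or a quorum) confirm the write; asynchronous replication acknowledges as soon as the accepting node has the write and propagates it in the background.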
  • A consistency model — strong, read-your-writes, or eventual — defines which writes a later read must reflect.
  • A quorum or consensus protocol (e.g. Raft, Paxos) coordinates write ordering and leader election.
  • On node loss, surviving replicas keep serving and a new leader is elected if needed.
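
The synchronous and asynchronous paths above can be sketched with in-memory replicas. This is a minimal illustration, not a real protocol: `Replica`, `write_sync`, `write_async`, and `drain` are hypothetical names, the quorum is assumed to be a simple majority, and there is no ordering, retry, or rollback logic.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica; `up` simulates node health."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def apply(self, key, value):
        """Store the write and return True as an acknowledgment, or False if down."""
        if not self.up:
            return False
        self.data[key] = value
        return True


def write_sync(replicas, key, value):
    """Synchronous path: report success only if a majority acknowledges.

    Assumption: this sketch does not roll back a write that misses quorum.
    """
    quorum = len(replicas) // 2 + 1
    acks = sum(1 for r in replicas if r.apply(key, value))
    return acks >= quorum


def write_async(leader, followers, backlog, key, value):
    """Asynchronous path: acknowledge once the leader applies; queue the rest."""
    if not leader.apply(key, value):
        return False
    backlog.extend((f, key, value) for f in followers)
    return True


def drain(backlog):
    """Background propagation: deliver queued writes, keep those that fail."""
    backlog[:] = [(f, k, v) for f, k, v in backlog if not f.apply(k, v)]
```

With three replicas and one down, `write_sync` still succeeds because two of three acknowledge; with two down it reports failure. After `write_async` returns, followers hold the write only once `drain` has run, which is the window in which stale reads and lost writes can occur.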

Failure Modes

  • Stale reads: an async replica lags, so a reader sees data older than the last committed write.
  • Split-brain: a partition lets two sides accept conflicting writes, producing divergent histories to reconcile.
  • Lost write: an async leader acknowledges a write, then crashes before propagating it, so a write the client was told had committed is missing after failover.
  • Lag cascade: a slow replica falls progressively behind under write load, widening the data-loss window.
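
One common guard against split-brain is to accept writes only on the side of a partition that can reach a strict majority of the cluster. The sketch below shows that arithmetic; `may_accept_writes` is a hypothetical helper, and a real system would combine it with a consensus protocol's terms or leases rather than rely on a node's own count of reachable peers.

```python
def may_accept_writes(reachable: int, cluster_size: int) -> bool:
    """Return True if this side of a partition holds a strict majority.

    `reachable` counts the nodes on this side, including the caller.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0 <= reachable <= cluster_size:
        raise ValueError("reachable must be between 0 and cluster_size")
    return reachable > cluster_size // 2


# A 5-node cluster split 3/2: only the 3-node side keeps accepting writes.
majority_side = may_accept_writes(3, 5)
minority_side = may_accept_writes(2, 5)
```

Because two disjoint sides cannot both exceed half the cluster, at most one side accepts writes. An even split, such as 2/2 in a 4-node cluster, leaves neither side writable, which is one reason odd cluster sizes are usually preferred.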

Verification

  • Measure replication lag (p99) and alert when it exceeds the recovery-point objective (RPO).
  • Kill the leader under load; assert reads and writes resume within the failover budget with no committed-write loss.
  • Run a consistency checker (e.g. a linearizability test) against the declared model.
  • Partition the cluster and confirm the configured behavior — reject writes, or accept and reconcile.

Variants

  • Single-leader, multi-leader, or leaderless: where writes are accepted.
  • Synchronous versus asynchronous: the durability-against-latency dial.

Related

  • Standby/Failover: replication is how a hot or warm standby stays current.
  • Database Sharding: partitions data for scale; replication copies each partition for availability.
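
The replication-lag check listed under Verification can be sketched as follows. This assumes lag samples are collected in seconds and uses a nearest-rank percentile; `percentile` and `lag_breaches_rpo` are hypothetical helpers, and a monitoring system's own percentile definition may differ slightly.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def lag_breaches_rpo(lag_seconds, rpo_seconds, pct=99):
    """Return True when the chosen lag percentile exceeds the RPO."""
    return percentile(lag_seconds, pct) > rpo_seconds


# Nine fast samples and one 30-second outlier, checked against a 5-second RPO.
samples = [0.1] * 9 + [30.0]
alert = lag_breaches_rpo(samples, rpo_seconds=5.0)
```

With only ten samples, the nearest-rank p99 is the maximum, so the single outlier triggers the alert. Over a larger window the same outlier would fall above p99 and be ignored, which is why the sampling window matters as much as the threshold.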