In a pipeline, data flows downstream and trouble flows upstream. When one stage slows, its producers keep sending at full rate: queues grow, memory and latency climb, and the failure surfaces far from the actual bottleneck. Backpressure propagation carries an overload signal against the data direction — every stage tells its upstream neighbor how much it can accept, hop by hop, until a component can slow, degrade, or reject at the source.
How It Works
- Bound every buffer between stages; a full buffer is a signal, not a failure.
- Prefer pull over push: the consumer requests n items, the producer sends at most n — the demand model of Reactive Streams and TCP receive windows.
- Translate pressure across push hops explicitly: slowed acknowledgements, a paused poll loop, or rejection with a retry-after hint.
- At the edge, convert pressure into action: pace intake, degrade fidelity by sampling or batching, or reject with an explicit error.
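The bounded buffer and pull-based demand described above can be sketched in a few lines. This is a minimal single-threaded illustration, not the Reactive Streams API: `Producer`, `Consumer`, `request`, and `pull` are hypothetical names chosen for this example, and the buffer capacity of 4 is arbitrary.

```python
from collections import deque
from typing import Deque, Iterator, List


class Producer:
    """Hypothetical producer that emits only what has been requested."""

    def __init__(self, source: Iterator[int]) -> None:
        self._source = source

    def request(self, n: int) -> List[int]:
        """Return at most n items, never more than the stated demand."""
        batch: List[int] = []
        for _ in range(n):
            try:
                batch.append(next(self._source))
            except StopIteration:
                break
        return batch


class Consumer:
    """Hypothetical consumer whose demand equals its free buffer space."""

    def __init__(self, producer: Producer, capacity: int) -> None:
        self._producer = producer
        self._capacity = capacity
        self._buffer: Deque[int] = deque()

    def buffered(self) -> int:
        return len(self._buffer)

    def pull(self) -> int:
        """Request as many items as fit; a full buffer means zero demand."""
        demand = self._capacity - len(self._buffer)
        if demand == 0:
            return 0  # the pressure signal: nothing is requested upstream
        batch = self._producer.request(demand)
        self._buffer.extend(batch)
        return len(batch)

    def process(self, k: int) -> List[int]:
        """Drain up to k items, which frees space and restores demand."""
        done: List[int] = []
        for _ in range(min(k, len(self._buffer))):
            done.append(self._buffer.popleft())
        return done


consumer = Consumer(Producer(iter(range(100))), capacity=4)
first = consumer.pull()       # buffer was empty, so 4 items are requested
stalled = consumer.pull()     # buffer is full, so demand is 0
consumer.process(1)           # one slot frees up
resumed = consumer.pull()     # exactly 1 item is requested
```

The producer never needs to know why demand dropped; it only sees a smaller `n`, which is what lets the signal chain across several hops.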
Failure Modes
- One unbounded queue mid-pipeline swallows the signal: upstream looks healthy while that queue eats memory toward an out-of-memory crash.
- Rejected producers that retry immediately amplify load exactly when capacity is lowest.
- If pressure reaches a source that cannot slow (sensors, user clicks) and no shedding policy exists, data is lost wherever a buffer happens to overflow rather than at a point chosen by design.
- Tiny buffers plus request cycles deadlock: two stages each wait for the other’s demand.
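The retry-amplification failure above is commonly countered with capped exponential backoff plus random jitter, layered on top of any retry-after hint the rejecting stage provides. The sketch below is one possible policy under assumed defaults; `retry_delay` and its parameters (`base`, `cap`) are hypothetical and not taken from any particular client library.

```python
import random
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.1,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The jittered part is drawn uniformly from [0, min(cap, base * 2**attempt)],
    so rejected producers spread out instead of retrying in lockstep.
    A server-supplied retry_after hint is treated as a minimum wait.
    """
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    jittered = rng.uniform(0.0, ceiling)
    if retry_after is not None:
        # Assumption: the hint is in seconds and is a floor, not a target.
        return retry_after + jittered
    return jittered
```

Adding the jitter on top of the hint, rather than replacing it, keeps every retry outside the window the overloaded stage asked for.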
Verification
- Drive offered load past capacity: goodput should plateau at capacity and memory should stay bounded. The throughput curve flattens instead of collapsing.
- Halve one stage’s speed in a chaos test: producers should converge to the new pace within the agreed window, and no buffer should exceed its bound.
- Alert on time-at-full per buffer; sustained pressure past its threshold pages before latency breaches the SLO.
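The first check above can be rehearsed on a toy model before running it against a real system. The following discrete-time simulation is a sketch under simplifying assumptions (one stage, fixed per-tick capacity, rejection at the edge when the buffer is full); `simulate` and its parameters are hypothetical names for this example.

```python
from typing import Dict


def simulate(
    offered_per_tick: int,
    capacity_per_tick: int,
    buffer_bound: int,
    ticks: int = 1000,
) -> Dict[str, float]:
    """Run a single bounded stage and report goodput, peak depth, rejections."""
    queue = 0
    processed = 0
    rejected = 0
    peak = 0
    for _ in range(ticks):
        # Admit only what fits in the bounded buffer; reject the rest at the edge.
        admitted = min(offered_per_tick, buffer_bound - queue)
        rejected += offered_per_tick - admitted
        queue += admitted
        peak = max(peak, queue)
        # The stage serves at most its capacity each tick.
        served = min(queue, capacity_per_tick)
        queue -= served
        processed += served
    return {
        "goodput": processed / ticks,
        "peak_queue": peak,
        "rejected": rejected,
    }


under = simulate(offered_per_tick=50, capacity_per_tick=100, buffer_bound=200)
over = simulate(offered_per_tick=300, capacity_per_tick=100, buffer_bound=200)
far_over = simulate(offered_per_tick=1000, capacity_per_tick=100, buffer_bound=200)
```

In this model, goodput equals the offered load below capacity and stays at 100 items per tick at both 3x and 10x overload, while the peak queue depth never exceeds the bound of 200. A real test would assert the same two properties against measured throughput and memory.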
Variants and Related Tactics
- Rate Limiting — a static admission ceiling at the edge; backpressure adjusts dynamically to live capacity.
- Limit Event Response — bounds a single stage; propagation chains the signal end to end.
- Asynchronous Messaging — supplies the queues; consumer lag and bounded topics become the pressure signal.
Example
A telemetry pipeline moves metrics from thousands of agents through a broker into a stream processor and a time-series store. During store compaction, writes slow: the processor’s bounded buffer fills, so it stops pulling from the broker; consumer lag grows, and the agents — seeing slower acknowledgements — switch to coarser sampling. The pipeline runs at the store’s pace until compaction finishes, then drains the backlog. No queue grew without bound, and nothing crashed.
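The agent-side reaction in this example, coarser sampling when acknowledgements slow, could look roughly like the sketch below. The source does not specify the agents' policy, so everything here is an assumption: the `AdaptiveSampler` class, the 0.5 s slow-ack threshold, and the double-on-slow, halve-on-fast rule are illustrative only.

```python
class AdaptiveSampler:
    """Hypothetical agent-side policy: widen the sampling interval under pressure."""

    def __init__(
        self,
        base_interval_s: float = 1.0,
        max_interval_s: float = 60.0,
        slow_ack_s: float = 0.5,
    ) -> None:
        self._base = base_interval_s
        self._max = max_interval_s
        self._slow_ack_s = slow_ack_s
        self.interval_s = base_interval_s

    def on_ack(self, ack_latency_s: float) -> float:
        """Update the sampling interval from one observed acknowledgement latency."""
        if ack_latency_s > self._slow_ack_s:
            # Slow ack is read as downstream pressure: sample half as often.
            self.interval_s = min(self.interval_s * 2, self._max)
        else:
            # Fast ack: recover toward full fidelity, never below the base interval.
            self.interval_s = max(self.interval_s / 2, self._base)
        return self.interval_s


sampler = AdaptiveSampler()
during_compaction = [sampler.on_ack(1.2) for _ in range(3)]   # intervals widen
after_compaction = [sampler.on_ack(0.05) for _ in range(3)]   # intervals recover
```

Bounding the interval on both sides keeps the degradation explicit: the agent never samples faster than its base rate and never goes silent for longer than the configured maximum.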
References
- Release It! Design and Deploy Production-Ready Software (2nd ed.), Michael T. Nygard, Pragmatic Bookshelf, 2018: Back Pressure stability pattern
- Reactive Streams Specification, 2015: the demand-based backpressure contract for asynchronous stream processing
- RFC 9293, Transmission Control Protocol: flow control via the receive window, the protocol-level original
- Site Reliability Engineering, ch. 21 “Handling Overload”, Beyer, Jones, Petoff, Murphy (eds.), O’Reilly, 2016