A timeout puts an upper bound on how long a caller waits for a response. When a dependency slows, deadlocks, or dies mid-request, the caller would otherwise wait indefinitely, holding a thread and a connection while its own queue of pending requests grows. The timeout converts that unbounded hang into a bounded, detectable failure.

Bounding the wait is also a detection tactic: a call that overruns its timing constraint reveals a late or omitted operation, which higher layers treat as a fault to retry, fail over, or escalate.

How It Works

  • Set a deadline per call from the dependency’s observed latency, commonly a multiple of its p99 rather than a round guess.
  • Start a timer as the call is dispatched, and cancel it when the response arrives.
  • On expiry, abandon the wait, release the thread and connection, and return a timeout error.
  • Propagate the remaining budget to downstream calls so a chain shares one deadline instead of stacking several.
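
The four steps above can be sketched in Python with asyncio. This is a minimal illustration under stated assumptions, not a reference implementation: the Deadline class, timeout_from_p99, call_with_deadline, and the make_leaf / make_middle / run_chain helpers are hypothetical names introduced here, and the default multiplier of 2 is an arbitrary policy choice. Only asyncio.wait_for and time.monotonic are standard-library calls.

```python
import asyncio
import time


class Deadline:
    """Absolute deadline shared by every hop in a call chain (hypothetical helper)."""

    def __init__(self, budget_s: float):
        self._expires_at = time.monotonic() + budget_s

    def remaining(self) -> float:
        # Budget still available, never negative.
        return max(0.0, self._expires_at - time.monotonic())


def timeout_from_p99(p99_s: float, multiplier: float = 2.0) -> float:
    """Derive a per-call timeout from observed latency. The multiplier is an assumption."""
    return p99_s * multiplier


async def call_with_deadline(make_call, deadline: Deadline):
    """Dispatch make_call(deadline) and give up once the shared budget is spent."""
    remaining = deadline.remaining()
    if remaining <= 0:
        raise TimeoutError("budget exhausted before dispatch")
    try:
        # wait_for arms the timer at dispatch, disarms it when the response
        # arrives, and cancels the pending call if the timer expires first.
        return await asyncio.wait_for(make_call(deadline), timeout=remaining)
    except asyncio.TimeoutError:
        raise TimeoutError("dependency exceeded its deadline") from None


def make_leaf(work_s: float):
    async def leaf(deadline: Deadline) -> str:
        await asyncio.sleep(work_s)  # simulated downstream latency
        return "leaf-ok"
    return leaf


def make_middle(work_s: float, leaf):
    async def middle(deadline: Deadline) -> str:
        await asyncio.sleep(work_s)  # this hop's own work consumes budget
        # The downstream call receives only what is left of the same deadline.
        return await call_with_deadline(leaf, deadline)
    return middle


async def run_chain(budget_s: float, middle_work_s: float, leaf_work_s: float) -> str:
    deadline = Deadline(budget_s)
    middle = make_middle(middle_work_s, make_leaf(leaf_work_s))
    return await call_with_deadline(middle, deadline)
```

Running asyncio.run(run_chain(0.5, 0.02, 0.02)) completes inside the budget, while a leaf that sleeps for several seconds under a 0.2 s budget surfaces a TimeoutError to the outermost caller after roughly 0.2 s. Passing one Deadline object down the chain, rather than giving each hop a fresh timeout, is what keeps the hops from stacking.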

Failure Modes

  • A deadline shorter than the dependency’s real p99 aborts healthy-but-slow calls, turning latency into spurious errors and retry storms.
  • A deadline longer than the caller’s own budget lets the caller run out of threads and connections before the timeout fires, so the hang still cascades.
  • A timed-out write whose server-side effect actually committed leaves client and server disagreeing on state.
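
The second failure mode can be checked with back-of-the-envelope arithmetic. The sketch below assumes a simplified model that is not in the original text: every incoming request calls the hung dependency, requests arrive at a steady rate, all workers start idle, and a worker is freed only when its timeout fires. Under those assumptions, Little's law gives the number of calls in flight as arrival rate times timeout. The function names are hypothetical.

```python
def seconds_until_pool_exhausted(pool_size: int, arrival_rate_per_s: float) -> float:
    """Time for a fully hung dependency to pin every worker, if none is freed earlier."""
    return pool_size / arrival_rate_per_s


def pool_survives_hang(pool_size: int, arrival_rate_per_s: float, timeout_s: float) -> bool:
    """True if timeouts start freeing workers before the pool runs dry."""
    # During the hang each call holds a worker for timeout_s, so the calls in
    # flight settle at arrival_rate * timeout_s (Little's law).
    in_flight = arrival_rate_per_s * timeout_s
    return in_flight <= pool_size
```

With 50 workers and 100 requests per second, the pool is fully pinned after 0.5 s. A 2 s timeout therefore fires too late to help (200 calls would need to be in flight), while a 0.25 s timeout keeps at most 25 workers occupied.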

Verification

  • Fault injection: delay a dependency past its deadline and assert the caller returns a timeout error within the bound and releases its thread.
  • Measure the caller’s p99 and maximum wait; the maximum should track the timeout, not the dependency’s tail.
  • Monitor the timeout-error rate in production; a rising rate flags a degrading dependency before it hard-fails.

Related Tactics

  • Deadline propagation carries one budget across a call chain, so per-hop timeouts cannot sum past the caller’s own limit.
  • Cooperative cancellation frees server-side work once the client has given up, reclaiming that capacity.
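
The fault-injection check from the Verification list, combined with cooperative cancellation, might look like the following asyncio sketch. It is an illustration under assumptions: hung_dependency, timed_call, and run_fault_injection are hypothetical names, the dependency is simulated in-process with asyncio.sleep, and the "cancelled" flag stands in for releasing a real thread or server-side worker.

```python
import asyncio
import time


async def hung_dependency(state: dict, delay_s: float) -> str:
    """Fault-injected dependency: sleeps past the deadline and records how it ended."""
    state["started"] = True
    try:
        await asyncio.sleep(delay_s)
        state["finished"] = True
        return "late response"
    except asyncio.CancelledError:
        # Cooperative cancellation: note that work stopped, then re-raise
        # so the cancellation is not swallowed.
        state["cancelled"] = True
        raise


async def timed_call(state: dict, timeout_s: float, delay_s: float):
    """Make one call under a timeout and return (outcome, elapsed_seconds)."""
    start = time.monotonic()
    try:
        outcome = await asyncio.wait_for(
            hung_dependency(state, delay_s), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        outcome = "timeout"
    return outcome, time.monotonic() - start


def run_fault_injection(timeout_s: float = 0.1, delay_s: float = 5.0):
    """Inject a delay longer than the timeout and report what the caller observed."""
    state: dict = {}
    outcome, elapsed = asyncio.run(timed_call(state, timeout_s, delay_s))
    return outcome, elapsed, state
```

A test built on this would assert three things: the outcome is a timeout, the elapsed time tracks the timeout rather than the injected delay, and the dependency observed the cancellation instead of running to completion.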

References