Feature Toggles

Feature toggles (also known as feature flags) allow teams to modify system behavior at runtime without changing or redeploying code. By wrapping new logic in a conditional check, developers can ship “dark” code to production and enable it only when ready.

How It Works

Every toggle is a named boolean (or multi-variant) condition evaluated at runtime. The application checks the toggle store and takes the appropriate code path. Toggle state is managed outside the codebase — enabling or disabling a feature requires no redeployment.

Toggle Check: A request arrives. The application checks the toggle store for the named flag, passing the current user context if targeting rules (e.g., country, user ID) apply.
Branching: If the flag is ON, the new code path executes. If OFF, the existing path runs. The caller sees no difference in the interface.
Runtime Control: An operator can flip the flag in the toggle store at any time to enable a feature, start an experiment, or kill a misbehaving service.
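The check-and-branch flow above can be sketched in a few lines. This is a minimal sketch that assumes an in-memory dictionary stands in for the external toggle store. `ToggleStore`, `UserContext`, the `new-checkout` flag, and the country rule are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class UserContext:
    """Hypothetical request context used by targeting rules."""
    user_id: str
    country: str


# A targeting rule decides, for a given context, whether the flag applies.
Rule = Callable[[UserContext], bool]


@dataclass
class Flag:
    enabled: bool = False          # master switch; defaults to OFF
    rule: Optional[Rule] = None    # optional targeting rule (None = everyone)


class ToggleStore:
    """In-memory stand-in for an external toggle store (hypothetical API)."""

    def __init__(self) -> None:
        self._flags: Dict[str, Flag] = {}

    def set_flag(self, name: str, enabled: bool, rule: Optional[Rule] = None) -> None:
        # Operators flip state here at runtime; application code is unchanged.
        self._flags[name] = Flag(enabled, rule)

    def is_enabled(self, name: str, ctx: UserContext) -> bool:
        flag = self._flags.get(name)
        if flag is None or not flag.enabled:
            return False  # unknown or OFF flags fall back to the existing path
        return flag.rule(ctx) if flag.rule is not None else True


def render_checkout(store: ToggleStore, ctx: UserContext) -> str:
    # Branching: the caller gets a string either way; only the path differs.
    if store.is_enabled("new-checkout", ctx):
        return "new checkout"
    return "old checkout"
```

For example, `store.set_flag("new-checkout", True, rule=lambda c: c.country == "DE")` enables the new path for users in Germany only, and `store.set_flag("new-checkout", False)` acts as the kill switch. Treating an unknown flag as OFF keeps the existing path as the safe default.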

Toggle lifecycle: A toggle moves through four stages.

Create: Wrap new code behind a toggle; default OFF in production.
Validate: Enable for internal users, then a small canary percentage.
Roll out: Increase rollout percentage; monitor error rates and business metrics.
Clean up: Once the feature is fully released and stable, remove the toggle and the dead code path to avoid technical debt. When creating the toggle, set a calendar reminder or configure a “stale flag” alert so cleanup is not forgotten.
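The "Roll out" stage can be expressed as a gate that either advances to the next percentage or rolls back. This is a simplified sketch: the step list, the `tolerance` value, and `next_rollout_percentage` are hypothetical, and only the error rate is compared against a baseline measured with the flag OFF.

```python
from typing import Tuple

# Hypothetical rollout plan in percent of users; the "Validate" stage
# (internal users) is assumed to happen before the first step.
ROLLOUT_STEPS: Tuple[int, ...] = (1, 5, 25, 50, 100)


def next_rollout_percentage(
    current: int,
    error_rate: float,
    baseline_error_rate: float,
    tolerance: float = 0.005,
    steps: Tuple[int, ...] = ROLLOUT_STEPS,
) -> int:
    """Return the next rollout percentage, or 0 to roll back.

    Simplification: only the error rate is checked; a real gate would also
    look at latency and business metrics.
    """
    if error_rate > baseline_error_rate + tolerance:
        return 0  # regression: switch the toggle OFF for everyone
    for step in steps:
        if step > current:
            return step  # advance one increment
    return current  # already at 100%: the next stage is cleanup
```

Rolling back to 0 rather than to the previous step is a deliberately conservative choice; a team could instead step back one increment.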

Toggle categories: Toggles differ by lifespan and ownership.

Category	Lifespan	Who controls?	Example
Release toggle	Days to weeks	Engineering	Enable incomplete feature on main branch
Experiment toggle	Weeks	Product / Data	A/B test a UI variant
Ops toggle	Hours to days	Operations	Kill switch for a misbehaving service
Permission toggle	Long-lived	Product	Beta access for paying customers

Failure Modes

Toggle debt: Toggles that are never cleaned up accumulate over time. With n independent boolean toggles, the system has up to 2^n possible code-path combinations, which quickly becomes impossible to test or reason about.
Stale toggles in tests: Tests that hard-code toggle states become misleading when the new code path is never exercised, or when tests for a retired path linger in the suite.
Evaluation latency: Querying a remote toggle store (e.g., an external SaaS) on every check adds a network round trip to each request; without a local cache with a short TTL, this can significantly increase request latency.
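Inconsistent evaluation: If toggle state is evaluated multiple times within a single request (e.g., once in the UI, once in the API), the results may differ when the store changes mid-request, producing incoherent behavior. Evaluating each flag once per request and passing the result along avoids this.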
Configuration drift: Toggle state in staging diverges from production; a feature passes QA but breaks in production because the default flag values differ.
Toggle collision: Two independent toggles affecting the same code area produce unexpected side effects when enabled together (e.g., Toggle A changes the UI layout while Toggle B changes the data format that layout renders).
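Two of the failure modes above, evaluation latency and inconsistent evaluation, are commonly addressed together: cache the flag set locally with a TTL, and freeze one snapshot per request. This is a minimal sketch; `CachingToggleClient` and the `fetch_all` callable (standing in for a remote store query) are hypothetical, and the injectable `clock` exists only to make the TTL testable.

```python
import time
from typing import Callable, Dict, Mapping


class CachingToggleClient:
    """Caches the full flag set and refreshes it after `ttl_seconds`.

    `fetch_all` is a hypothetical callable that queries the remote store
    and returns {flag_name: bool}.
    """

    def __init__(self, fetch_all: Callable[[], Dict[str, bool]],
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_all = fetch_all
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, bool] = {}
        self._fetched_at = float("-inf")  # forces a fetch on first use

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if now - self._fetched_at >= self._ttl:
            self._cache = dict(self._fetch_all())  # one remote call per TTL window
            self._fetched_at = now

    def snapshot(self) -> Mapping[str, bool]:
        """Freeze flag state for one request so every layer sees the same values."""
        self._refresh_if_stale()
        return dict(self._cache)
```

A request handler would call `snapshot()` once and pass the result to both the UI and API layers. The TTL is a trade-off: it bounds remote calls, but it also bounds how quickly a kill switch takes effect, so it should stay well inside the recovery window checked by a kill-switch drill.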

Verification

Automated Inventory: Keep a registry of all active toggles with their category and expected expiry date. Fail the build or send an alert when a release, experiment, or ops toggle outlives its maximum lifespan (e.g., 90 days); permission toggles are exempt but reviewed on a schedule.
Dual-Path Testing: For critical toggles, run the automated test suite with the toggle both ON and OFF in CI to ensure no regressions in either path.
Canary Monitoring: After each increment of a percentage rollout, monitor error rates, p95 latency, and key business metrics (e.g., conversion) for a defined period (e.g., 30 minutes) before proceeding.
Kill-switch Drill: Periodically verify that an ops toggle can be flipped to OFF and the change propagates across the system within the SLA recovery window (e.g., ≤ 5 minutes) without a code deployment.
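The automated inventory check might look like the following sketch. It assumes the registry records each toggle's creation date and derives expiry from a per-category maximum lifespan; the uniform 90-day limit, `ToggleRecord`, and `expired_toggles` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Dict, Iterable, List, Optional


class Category(Enum):
    RELEASE = "release"
    EXPERIMENT = "experiment"
    OPS = "ops"
    PERMISSION = "permission"


# Assumed policy: short-lived categories expire after 90 days;
# permission toggles are exempt (None) and reviewed separately.
MAX_LIFESPAN: Dict[Category, Optional[timedelta]] = {
    Category.RELEASE: timedelta(days=90),
    Category.EXPERIMENT: timedelta(days=90),
    Category.OPS: timedelta(days=90),
    Category.PERMISSION: None,
}


@dataclass(frozen=True)
class ToggleRecord:
    name: str
    category: Category
    created: date


def expired_toggles(registry: Iterable[ToggleRecord], today: date) -> List[str]:
    """Return names of toggles that have outlived their category's maximum lifespan."""
    stale = []
    for record in registry:
        limit = MAX_LIFESPAN[record.category]
        if limit is not None and today - record.created > limit:
            stale.append(record.name)
    return stale
```

A CI step could call `expired_toggles(registry, date.today())` and exit with a non-zero status whenever the returned list is non-empty, which fails the build as described above.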

Variants and Related Tactics

Dark launching: The new code path executes in production (often reading data or making calls), but its output is discarded. This validates performance and correctness before any user sees the result.
Percentage rollout: The toggle is ON for a configurable fraction of users (e.g., 5%, then 25%, then 100%), allowing gradual exposure and early detection of issues at scale.
User-segment targeting: Toggles scoped to specific user attributes (country, plan, cohort) for localized releases and targeted experiments.
Branch by Abstraction: A technique for making large-scale changes by introducing an abstraction layer that can toggle between old and new implementations.
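A widely used way to implement percentage rollout (not necessarily the one this page has in mind) is deterministic hashing: each user is mapped to a stable bucket from 0 to 99, and the flag is ON for buckets below the current percentage. The function names and the choice of SHA-256 below are assumptions for illustration.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in [0, 100)."""
    # Salting with the flag name keeps different flags' cohorts independent.
    key = f"{flag_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(flag_name: str, user_id: str, percentage: int) -> bool:
    """True if this user falls inside the first `percentage` buckets."""
    return bucket(flag_name, user_id) < percentage
```

Because buckets are stable, a user who sees the feature at 5% keeps seeing it at 25% and 100%, so raising the percentage only adds users and never flips anyone back and forth.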

