How It Works
Every toggle is a named boolean (or multi-variant) condition evaluated at runtime. The application looks the toggle up in a toggle store and takes the corresponding code path. Toggle state is managed outside the codebase, so enabling or disabling a feature requires no redeployment.
- A request arrives. The application checks the toggle store for the named flag, passing the current user context if targeting rules apply.
- If the flag is ON, the new code path executes. If OFF, the existing path runs. The caller sees no difference in the interface.
- An operator can flip the flag in the toggle store at any time — no code change, no deployment required.
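The request flow above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `ToggleStore`, `set_flag`, `is_enabled`, `handle_checkout`, and the flag name `new-checkout` are all hypothetical, and the in-memory dictionary stands in for what would normally be an external store.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical types for illustration only.
UserContext = Dict[str, str]
Rule = Callable[[UserContext], bool]


@dataclass
class ToggleStore:
    """In-memory stand-in for an external toggle store (assumption)."""
    flags: Dict[str, bool] = field(default_factory=dict)
    rules: Dict[str, Rule] = field(default_factory=dict)

    def set_flag(self, name: str, enabled: bool, rule: Optional[Rule] = None) -> None:
        """What an operator does: change state without touching application code."""
        self.flags[name] = enabled
        if rule is None:
            self.rules.pop(name, None)
        else:
            self.rules[name] = rule

    def is_enabled(self, name: str, user: Optional[UserContext] = None) -> bool:
        # Unknown or disabled flags evaluate to OFF.
        if not self.flags.get(name, False):
            return False
        rule = self.rules.get(name)
        if rule is None:
            return True
        # A targeting rule needs a user context; without one, stay on the old path.
        return user is not None and rule(user)


def handle_checkout(store: ToggleStore, user: UserContext) -> str:
    """Same interface for the caller whichever path runs."""
    if store.is_enabled("new-checkout", user):
        return "new checkout flow"
    return "existing checkout flow"
```

Defaulting unknown flags to OFF is a design choice in this sketch: if a flag was never created, requests stay on the existing path.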
Toggle Lifecycle
- Create: Wrap new code behind a toggle; default OFF in production.
- Validate: Enable for internal users, then a small canary percentage.
- Roll out: Increase rollout percentage; monitor error rates and business metrics.
- Clean up: Once the feature is fully released and stable, remove the toggle and the dead code path. Schedule this removal when the toggle is created (e.g. with a calendar reminder) so it is not forgotten.
Toggle Categories
| Category | Lifespan | Who controls? | Example |
|---|---|---|---|
| Release toggle | Days to weeks | Engineering | Enable incomplete feature on main branch |
| Experiment toggle | Weeks | Product / data | A/B test a UI variant |
| Ops toggle | Hours to days | Operations | Kill switch for a misbehaving service |
| Permission toggle | Long-lived | Product | Beta access for paying customers |
Failure Modes
- Toggle debt: Toggles that are never cleaned up multiply over time, creating a combinatorial explosion of code paths that is hard to test and reason about.
- Stale toggles in tests: Tests that hard-code toggle states become misleading: new code paths are never exercised, or tests for retired paths are never removed from the suite.
- Inconsistent evaluation: Toggle state evaluated multiple times in a request (e.g. once in the UI, once in the API) may differ if the store changes mid-request, causing incoherent behaviour.
- Configuration drift: Toggle state in staging diverges from production; a feature passes QA but breaks in production because the toggle defaults differ.
- Cascading toggles: Toggle A depends on Toggle B; enabling A without B causes an unexpected failure that is hard to debug.
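One common way to address the inconsistent-evaluation failure mode is to read toggle state once at the start of a request and use that snapshot everywhere afterwards. The sketch below assumes a hypothetical `LiveToggleStore` and a flag named `new-ui`; the `mid_request_hook` parameter exists only to simulate an operator flipping the flag while the request is in flight.

```python
from types import MappingProxyType
from typing import Callable, Dict, Mapping, Optional, Tuple


class LiveToggleStore:
    """Hypothetical mutable store that an operator can change at any time."""

    def __init__(self, flags: Dict[str, bool]) -> None:
        self._flags = dict(flags)

    def set_flag(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled

    def snapshot(self) -> Mapping[str, bool]:
        # Copy the current state, then wrap it read-only so request code
        # cannot mutate it and later store changes do not leak in.
        return MappingProxyType(dict(self._flags))


def handle_request(
    store: LiveToggleStore,
    mid_request_hook: Optional[Callable[[], None]] = None,
) -> Tuple[str, str]:
    flags = store.snapshot()  # evaluate once, at the start of the request

    ui_variant = "new" if flags.get("new-ui", False) else "old"

    if mid_request_hook is not None:
        mid_request_hook()  # simulates the store changing mid-request

    # The second evaluation reads the same snapshot, so it agrees with the first.
    api_variant = "new" if flags.get("new-ui", False) else "old"
    return ui_variant, api_variant
```

The flip takes effect on the next request, which takes a fresh snapshot. When the UI and the API are separate processes, the snapshot (or the evaluated variant) would have to travel with the request, which this sketch does not cover.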
Verification Ideas
- Toggle inventory: Keep a registry of all active toggles with their expected expiry dates; fail the build if any toggle is older than its maximum allowed lifespan (e.g. 90 days).
- Test matrix: For critical toggles, run the test suite with the toggle both ON and OFF in CI.
- Rollout monitoring: After each increment of a canary rollout, monitor error rate, p95 latency, and key business metrics for at least 30 minutes before proceeding.
- Kill-switch drill: Periodically verify that an ops toggle can be flipped to OFF within the SLA recovery window (e.g. ≤ 5 minutes) without a code deploy.
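The toggle inventory check can be a small script run in CI. The following is a sketch under stated assumptions: the `ToggleRecord` fields, the `overdue_toggles` and `ci_exit_code` helpers, and the idea of loading the registry from a file are all illustrative, and the 90-day limit is the example figure from the list above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List

MAX_LIFESPAN = timedelta(days=90)  # example limit; adjust per toggle category


@dataclass(frozen=True)
class ToggleRecord:
    """One registry entry (hypothetical schema)."""
    name: str
    created: date
    expires: date  # expected expiry, recorded when the toggle is created


def overdue_toggles(
    registry: Iterable[ToggleRecord],
    today: date,
    max_lifespan: timedelta = MAX_LIFESPAN,
) -> List[str]:
    """Return names of toggles past their expiry date or older than max_lifespan."""
    return [
        t.name
        for t in registry
        if today > t.expires or (today - t.created) > max_lifespan
    ]


def ci_exit_code(registry: Iterable[ToggleRecord], today: date) -> int:
    """Non-zero exit code fails the build when any toggle is overdue."""
    overdue = overdue_toggles(registry, today)
    for name in overdue:
        print(f"toggle overdue for cleanup: {name}")
    return 1 if overdue else 0
```

In a pipeline, the script would end with something like `sys.exit(ci_exit_code(registry, date.today()))`, where `registry` is loaded from wherever the inventory is kept.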
Variants
- Dark launching: The new code path executes in production (often reading data or making calls) but its output is discarded — used to validate correctness and performance before exposing results to users.
- Percentage rollout: The toggle is ON for a configurable fraction of users (e.g. 5%, then 25%, then 100%), allowing gradual exposure and early detection of issues at scale.
- User-segment targeting: Toggles can be scoped to specific user attributes (country, plan, cohort), enabling localised releases and targeted experiments.
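Percentage rollout and segment targeting are often implemented by hashing a stable user identifier into a bucket, so the same user gets the same answer on every request and stays included as the percentage grows. The sketch below is one possible approach, not a description of any particular product: `bucket`, `in_rollout`, `is_enabled_for`, the 10,000-bucket granularity, and the `country` attribute are assumptions made for illustration.

```python
import hashlib
from typing import Dict, Optional, Set

BUCKETS = 10_000  # 0.01% granularity (arbitrary choice for this sketch)


def bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0..BUCKETS-1.

    Including the flag name means a user's bucket for one flag is
    independent of their bucket for any other flag.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def in_rollout(flag: str, user_id: str, percent: float) -> bool:
    """True if the user falls inside the first `percent` of buckets."""
    threshold = round(percent * BUCKETS / 100)
    return bucket(flag, user_id) < threshold


def is_enabled_for(
    flag: str,
    user: Dict[str, str],
    percent: float,
    countries: Optional[Set[str]] = None,
) -> bool:
    """Segment filter first, then percentage rollout within the segment."""
    if countries is not None and user.get("country") not in countries:
        return False
    return in_rollout(flag, user["id"], percent)
```

Because the threshold only moves upward as `percent` increases, a user enabled at 5% remains enabled at 25% and at 100%, which avoids users seeing a feature appear and disappear during a rollout.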