Why circuit breakers exist
Backoff makes a single retry polite. But when a downstream is plainly down, every caller in your fleet generously retrying it is the actual problem. A circuit breaker is the small piece that says: stop calling for a while — the answer isn't going to change in the next 50 ms.
Why it exists
You’ve already shipped retries with backoff and jitter. A downstream service gets sick. Each of your callers fails, waits, retries, fails again, waits longer, retries. Polite. Reasonable.
Now multiply by ten thousand processes. The “sick” service is being held underwater by a tide of retries from a fleet that was carefully told to be nice about it. Each individual caller is well-behaved; the aggregate is a sustained denial-of-service against a service that’s trying to recover. The slower the downstream gets, the more requests pile up in the caller’s threadpool or event loop, and the slower the caller gets — which often takes the caller down with it. This failure shape has a name: cascading failure.
Backoff is local. Each call decides independently when to try again. There is no caller in the system whose job it is to notice that the answer is clearly “down” and stop asking on behalf of the rest. That’s the gap a circuit breaker fills. It sits in front of the call site, watches the recent error rate, and when failure crosses a threshold, it stops making the call at all for a window. Requests in that window fail fast — locally, without touching the network — so the downstream gets quiet air and the caller doesn’t pile up work it can’t finish.
The metaphor is the household kind: an electrical circuit breaker doesn’t fix a short circuit. It refuses to keep delivering current into one, which is what stops the wiring from catching fire. Same instinct here: when something downstream is wrong, stop pushing into it.
Why it matters now
Anywhere a service has many callers and at least one downstream that can be slow, breakers show up:
- Service meshes. Envoy splits the role across two features. Circuit breaking enforces cluster-level limits (max connections, pending requests, requests, retries). Outlier detection ejects individual unhealthy hosts. Together they play the breaker role at the proxy. Istio exposes both via connectionPool and outlierDetection; Linkerd has its own endpoint-level failure-accrual model — same goal, not the same knobs.
- RPC frameworks. gRPC’s service config supports either a retryPolicy or a hedgingPolicy per method (one or the other, not both), and in-house RPC stacks frequently add a breaker layer on top of that. The public AWS SDKs ship retries with adaptive token-bucket throttling rather than a generic breaker; AWS’s own Builders Library argues that circuit breakers are tricky to tune and prefers token-bucket limits at the call site for many of the same goals.
- Browser-side fetch wrappers and mobile clients. When the network is flaky, a breaker on the device prevents UI threads from queuing dozens of doomed retries while the user keeps tapping. The pattern is common enough; the name “circuit breaker” is less common in client code than in server code.
- LLM and agent code. An agent that calls a model provider in a loop can, on a bad afternoon, burn through quota retrying 5xx for a model that’s plainly degraded. A breaker around the provider call gives the loop a way to fail fast and take the alternate path (a smaller model, a cached answer, a graceful “try again later”) instead of stalling.
- Database connection pools. Several mature pool libraries (HikariCP is the canonical Java example) include a fast-fail mode that short-circuits acquisitions while the DB is unreachable. Same instinct as a breaker, scoped to the pool: without it, a brief DB blip hangs every request thread and the web tier dies before the DB recovers.
The cost of getting it wrong is the cost of the cascading-failure postmortem: you discover that the primary service didn’t really go down — its dependency did, and your service held the door open for the fire to walk through.
The short answer
circuit breaker = error counter + state machine + cooldown window
A breaker watches the recent results of a call. When errors cross a threshold, it opens — meaning subsequent calls fail immediately without going to the network. After a cooldown, it goes half-open, letting a small probe through. If the probe succeeds, it closes and normal traffic resumes; if it fails, the cooldown starts again. Three states, one counter, one timer. The whole pattern.
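In code, the whole thing fits in one small class. Below is a minimal single-threaded sketch in Python; the class name, the consecutive-failure threshold, and the CircuitOpenError type are illustrative choices, not any particular library's API.

```python
import time


class CircuitOpenError(Exception):
    """Raised locally when the breaker refuses to make the call."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown_seconds=30.0):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.cooldown_seconds = cooldown_seconds    # how long to stay open before probing
        self.state = "closed"
        self.consecutive_failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if time.monotonic() - self.opened_at < self.cooldown_seconds:
                # Fail fast, locally, without touching the network.
                raise CircuitOpenError("circuit open; failing fast")
            # Cooldown elapsed: this call becomes the probe.
            self.state = "half_open"

        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_success(self):
        # Any success, including a successful probe, closes the breaker.
        self.consecutive_failures = 0
        self.state = "closed"

    def _record_failure(self):
        self.consecutive_failures += 1
        failed_probe = self.state == "half_open"
        if failed_probe or self.consecutive_failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = time.monotonic()  # restart the cooldown
```

A call site wraps its outbound call once: breaker.call(payments_client.charge, order) either returns the downstream's answer, re-raises the downstream's own error (which the breaker counts), or raises CircuitOpenError without a network round trip.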
How it works
The three states
CLOSED ──── error rate exceeds threshold ────▶ OPEN
   ▲                                             │
   │                                             │ cooldown elapses
   │                                             ▼
   └──── probe succeeds ──── HALF_OPEN ◀─────────┘
                                 │
                                 └── probe fails ──▶ OPEN
- Closed — the normal state. Calls pass through. The breaker tracks outcomes (success / failure / timeout) over a recent window.
- Open — calls short-circuit immediately with a local error. Nothing hits the downstream. A timer counts down the cooldown.
- Half-open — a single request (or a small handful) is allowed through as a probe. Other concurrent calls keep failing fast. The result of the probe decides whether to close or re-open.
The half-open state is the part most homemade implementations get wrong. Without it, you either flap (open, cooldown, closed, instant flood, re-open) or stay open too long (open, cooldown, closed, but you have no evidence the downstream is actually back). Letting one request through and gating the verdict on it is the cheapest experiment that answers the question.
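One way to avoid that flood is to treat the probe as a permit that only one caller can hold at a time. Here is a sketch of that gate for a thread-based client; the names are illustrative, and a production breaker would fold this into its open-state path rather than keep it as a separate object.

```python
import threading
import time


class HalfOpenGate:
    """Lets exactly one in-flight probe through once the cooldown has elapsed;
    every other caller keeps failing fast until that probe reports back."""

    def __init__(self, cooldown_seconds=30.0):
        self.cooldown_seconds = cooldown_seconds
        self._lock = threading.Lock()
        self._opened_at = time.monotonic()
        self._probe_in_flight = False

    def try_acquire_probe(self):
        with self._lock:
            cooled_down = time.monotonic() - self._opened_at >= self.cooldown_seconds
            if cooled_down and not self._probe_in_flight:
                self._probe_in_flight = True  # this caller becomes the probe
                return True
            return False  # everyone else keeps short-circuiting locally

    def report(self, success):
        with self._lock:
            self._probe_in_flight = False
            if not success:
                self._opened_at = time.monotonic()  # failed probe: restart the cooldown
```

The open-state logic then reads: if try_acquire_probe() returns False, raise the local circuit-open error; if True, make the real call and report(success) so the breaker can close or re-open on actual evidence.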
What “error” means
The breaker only works if it counts the right things. Two calibrations matter:
- Which errors count. A 500 from the downstream counts. A timeout counts. A 404 for a thing the user asked about almost certainly does not — it’s a successful answer to a question. Lumping 4xx in with 5xx is the most common miscalibration; the breaker opens on a hot product page that returns a lot of 404s and takes the whole feature down.
- How “rate” is measured. Two shapes are common: a rolling window (“more than 50% of the last 100 calls failed”) or a consecutive-failure counter (“five failures in a row”). The window form is more robust to bursty workloads; the counter form is simpler and cheaper. Pick one, document the threshold, and make sure there’s a minimum sample size — opening because three out of three calls failed when the service saw only three calls that whole minute is how you produce a self-inflicted outage.
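Both calibrations are a few lines of code each. A sketch of what they might look like follows; the 50%-of-the-last-100 threshold and the minimum sample size of 20 are placeholders to tune, not recommendations.

```python
from collections import deque


def counts_as_failure(status_code=None, timed_out=False):
    # Timeouts and 5xx are evidence the downstream is unhealthy.
    # 4xx, including 404, is the downstream answering a question; it should not trip the breaker.
    if timed_out:
        return True
    return status_code is not None and status_code >= 500


class RollingWindow:
    """Rolling-window form: open when more than half of the last 100 calls failed,
    but never on fewer than 20 samples (the minimum-throughput guard)."""

    def __init__(self, size=100, failure_ratio=0.5, min_samples=20):
        self.outcomes = deque(maxlen=size)  # True means the call counted as a failure
        self.failure_ratio = failure_ratio
        self.min_samples = min_samples

    def record(self, failed):
        self.outcomes.append(failed)

    def should_open(self):
        if len(self.outcomes) < self.min_samples:
            return False  # not enough evidence yet; stay closed
        return sum(self.outcomes) / len(self.outcomes) > self.failure_ratio
```

The consecutive-failure form is the counter already inside the earlier breaker sketch; swap the window in where traffic is bursty enough to make a streak of failures misleading.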
Where it sits
A breaker is a per-destination object, not a per-call one. You need one breaker per downstream you want to protect — typically per (service, endpoint) or per (service, region) tuple. Sharing one breaker across unrelated dependencies opens the door to either over-tripping (one bad backend opens the whole world) or under-tripping (a healthy backend’s traffic disguises a sick one’s failures).
In a service mesh, the breaker lives in the sidecar, keyed by upstream cluster. In application code, it’s typically a singleton per logical client (paymentsClient.breaker, inventoryClient.breaker).
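In application code, “one breaker per downstream” often ends up as a small registry keyed by whatever tuple you decided to protect. A sketch, where CircuitBreaker stands in for whichever implementation you actually use (the earlier sketch, or a library's):

```python
import threading


class BreakerRegistry:
    """One breaker per (service, endpoint) tuple, created lazily and shared
    by every call site that talks to that destination."""

    def __init__(self, breaker_factory):
        self._breaker_factory = breaker_factory
        self._breakers = {}
        self._lock = threading.Lock()

    def for_target(self, service, endpoint):
        key = (service, endpoint)
        with self._lock:
            if key not in self._breakers:
                self._breakers[key] = self._breaker_factory()
            return self._breakers[key]


# Usage (illustrative): one registry per process, one breaker per destination.
# registry = BreakerRegistry(breaker_factory=CircuitBreaker)
# payments_breaker = registry.for_target("payments", "POST /charge")
# inventory_breaker = registry.for_target("inventory", "GET /stock")
```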
Fallbacks: the part the diagram doesn’t show
A breaker that just throws a “circuit open” error is half a feature. The other half is what the caller does instead. Common shapes:
- Cached or stale answer. “Show the last known price; mark it as potentially stale.” Right when freshness is nice-to-have.
- Degraded mode. “Skip the recommendation widget, render the rest of the page.” Right when one feature shouldn’t take down the surface.
- Smaller / cheaper backend. “The big model is unhealthy; fall back to the small one with a flag in the response.” Right for AI calls where partial answers beat no answer.
- Surface the failure honestly. “We can’t process payments right now; please try in a minute.” Sometimes the right answer is an error — but a fast, clear one beats a 30-second timeout.
The fallback is application-specific, which is why most generic libraries make you provide it. The breaker handles the when; you handle the what.
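In code, the split usually looks like a try/except around the breaker-wrapped call, with the except branch holding the application-specific answer. A sketch, reusing CircuitOpenError from the earlier breaker sketch; the cache-backed price lookup is a made-up example, not a real API.

```python
def price_for(product_id, breaker, pricing_client, cache):
    """Fetch a live price if the breaker allows it; otherwise degrade to the cache."""
    try:
        price = breaker.call(pricing_client.get_price, product_id)
        cache[product_id] = price
        return {"price": price, "stale": False}
    except CircuitOpenError:
        # The breaker decided the "when"; this branch is the "what".
        if product_id in cache:
            return {"price": cache[product_id], "stale": True}  # last known price, marked stale
        raise  # no fallback available: surface a fast, clear failure
```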
Show the seams
- Backoff and breakers solve different problems. Backoff is “this one call should wait before trying again.” A breaker is “the system should stop trying for a window.” A correctly built client uses both: jittered backoff for the individual call, a breaker for the aggregate of calls against one downstream (sketched after this list). The two posts in related below are companions, not alternatives.
- Half-open requires concurrency control. Without it, the moment the breaker goes half-open, every concurrent waiter races through and you’ve recreated the thundering herd you opened the breaker to avoid. Production implementations cap half-open to one or a handful of in-flight probes.
- Breakers can mask problems. A breaker that opens on every minor blip will hide real signal from your dashboards if you’re not careful; the dashboard sees “low error rate from this service” and never asks why traffic also dropped. Track open/half-open transitions explicitly, not just downstream error rates.
- Don’t trip on cold starts. A breaker that observes the first three calls of a freshly-deployed pod and opens because the connection pool is still warming up will gate the whole pod’s traffic on a startup artifact. A “minimum throughput in the window” guard fixes this.
- Open isn’t always the safe default. For idempotent reads, failing fast is usually fine. For a write that the user has already paid the latency cost to start — say, a checkout — a circuit-open error during checkout submission may be worse for the business than one extra slow attempt. Tune by call, not by service.
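Composing the two seams above, per-call backoff inside the loop and a per-destination breaker around every attempt, might look like the sketch below; the retry count and sleep bounds are illustrative, and CircuitBreaker / CircuitOpenError are the earlier sketch's names, not a specific library's.

```python
import random
import time


def call_with_backoff_and_breaker(breaker, fn, *args, attempts=3, base_delay=0.1, max_delay=2.0):
    """Per call: up to `attempts` tries with full-jitter exponential backoff.
    Per destination: every try goes through the breaker, so once it opens the
    remaining tries fail fast instead of piling onto the sick downstream."""
    for attempt in range(attempts):
        try:
            return breaker.call(fn, *args)
        except CircuitOpenError:
            raise  # the system has decided to stop asking; don't keep retrying locally
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last downstream error
            cap = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, cap))  # full jitter up to the exponential cap
```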
Famous related terms
- Exponential backoff — retry policy = exponential backoff + jitter + stopping rule — the per-call timing rule. Backoff and breakers compose; they don’t replace each other.
- Idempotency keys — idempotency key = client-generated unique ID + server-side dedupe table — what makes the retries that do slip past the breaker safe to reissue.
- Bulkhead — bulkhead = per-dependency concurrency limit + isolated resource pool — a sibling pattern that bounds how much of your fleet any one slow dependency can swallow. Breakers stop calls; bulkheads quarantine them.
- Hedged requests — hedged request = primary call + duplicate sent after a small delay — the opposite instinct, used for tail-latency-sensitive reads. Hedge healthy services; break tripping ones.
- Load shedding — load shedding = drop excess traffic + protect the rest — the server-side cousin. The downstream sheds when overloaded; the caller’s breaker is what notices and stops feeding it.
- Outlier detection (Envoy) — outlier detection = per-host failure tracking + ejection from the load-balancing set for a growing window — the per-instance sibling of cluster-level circuit breaking, with its own knobs (max ejection percent, base ejection time that grows with repeat ejections).
Going deeper
- Michael Nygard, Release It! — the book that popularized the circuit-breaker pattern for service-to-service calls. The original pattern catalog is still the cleanest description of the failure modes it’s meant to prevent.
- Martin Fowler’s article “CircuitBreaker” (2014) — short, well-diagrammed walk-through of the state machine.
- Envoy’s documentation on circuit breaking and outlier detection — a production implementation worth reading end-to-end to see the knobs (max connections, max pending, max requests, consecutive 5xx) that show up in real configs.
- Google’s SRE Book, “Addressing Cascading Failures” — the failure mode the breaker exists to interrupt, with the postmortem-flavored detail that makes the pattern feel necessary instead of cute.