Why retry with exponential backoff — and why jitter?
Retrying on failure sounds simple until you ship it at scale. Hammer the server and you make outages worse; back off but synchronize, and you accidentally rebuild the herd. Backoff is the timing rule; jitter is the part that keeps it from biting itself.
Why it exists
You call an API. It returns 503, or times out, or your TCP connection resets. The naive thing is to retry immediately. Do that in a loop and you’ve built a denial-of-service attack against a service that was already struggling. You make the outage worse. You arrive at the bad moment with friends.
So engineers learned a softer rule: wait, then retry, and wait longer each time you fail. This is exponential backoff — the wait doubles (or grows by some constant factor) on each failure. The intuition is that you don’t know how long the bad condition will last, and exponential growth covers a huge range of timescales — milliseconds to minutes — without you having to guess in advance.
That’s half the answer. The other half is the part that surprises people the first time they meet it: if every client retries on the same exponential schedule, you’ve replaced a stampede with a metronome. All the clients that failed at second 0 retry at second 1. The ones that fail there retry at second 3. The ones that fail there retry at second 7. The server sees neat periodic spikes that are, if anything, harder to absorb than random arrivals would be — because every spike is a near-simultaneous burst.
The fix is jitter — randomization on the wait. Each client picks its retry time from a window, not a point. The thundering herd flattens into something the server can actually serve.
Why it matters now
If you ship anything that talks to an external service in 2026, you ship backoff. The list of places it shows up has only grown:
- Cloud SDKs. AWS, Google Cloud, and Azure clients commonly retry transient errors with backoff and jitter, but the defaults are SDK-specific (which errors qualify, what the cap is, what shape of jitter) and not always documented in the place you’d expect. Read the one you actually use.
- LLM provider APIs. Rate limits and 5xx are routine, especially during peak hours. Naive retry loops in agent code can trip the provider’s abuse limits and earn you a longer cooldown than the original error caused. Some official client libraries retry with backoff out of the box; many home-grown wrappers don’t.
- Webhook delivery. Stripe automatically retries failed webhook deliveries on an exponential schedule for up to three days. GitHub, by contrast, does not auto-redeliver — it surfaces failed deliveries and leaves the redelivery to you. The pattern shows up across webhook senders, but the policies vary; check the docs of the one you receive from.
- Distributed databases and queues. Leader elections, replica catch-up, and client connection retries use backoff. Without it, one node’s reboot can turn into a cluster-wide reconnect storm.
- TCP itself. TCP retransmits unacknowledged segments after an RTO (retransmission timeout); RFC 6298 specifies that on each retransmission-timer expiry the RTO doubles, with an implementation-defined upper bound. The pattern is so load-bearing the lower layers ship it whether you asked or not.
- Browsers and other clients. Many user-agents back off retries on DNS failures and on certain HTTP errors, though the exact policy varies by browser and by error class.
The cost of getting it wrong is no longer “my script is annoying.” It’s “my agent fleet just synchronized into a wave that took down the upstream service for everyone.”
The short answer
retry policy = exponential backoff + jitter + a stopping rule
Three pieces. Exponential backoff spaces the retries out so a slow or flapping service has time to recover. Jitter breaks the synchronization between independent clients so their retries don’t pile up at the same instant. A stopping rule — max retries, max total wait, or both — keeps the loop from becoming an infinite background apology. Drop any of the three and you’ve built a footgun.
How it works
1. The exponential part
Pick a base delay (say 100 ms) and a factor (usually 2). The wait after the n-th failed attempt is base × factor^(n-1), capped at some ceiling so it doesn’t drift into “retry next Tuesday” territory:
attempt 1: fail, wait 100 ms
attempt 2: fail, wait 200 ms
attempt 3: fail, wait 400 ms
attempt 4: fail, wait 800 ms
...
attempt k: fail, wait min(base × 2^(k-1), cap)
Why exponential and not linear? Linear backoff tends to spend most of the retry budget on small early waits and never reaches an interval long enough to ride out a real multi-second outage. Exponential covers many orders of magnitude with few retries: six attempts at base 100 ms reaches several seconds; ten reaches a minute or two. That’s the right shape for “I don’t know if this is a 50 ms blip or a 30 second deploy.”
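A sketch of that schedule in Python (the function name and the 30-second cap are illustrative choices, not borrowed from any particular SDK):

```python
def backoff_delay(attempt: int, base: float = 0.1, factor: float = 2.0, cap: float = 30.0) -> float:
    """Seconds to wait after the `attempt`-th failed try (1-indexed), capped."""
    return min(base * factor ** (attempt - 1), cap)

# attempt 1 -> 0.1 s, attempt 2 -> 0.2 s, ..., attempt 10 -> 30.0 s (capped from 51.2 s)
print([backoff_delay(n) for n in range(1, 11)])
```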
2. The jitter part — and why “full jitter” is the surprising winner
Plain backoff schedules every client onto the same ladder. Add jitter and each client picks a random delay from a window. The interesting question is which window.
The variants Marc Brooker used in the canonical AWS Architecture Blog post on this (March 4, 2015, “Exponential Backoff And Jitter”):
- No jitter — wait = base × 2^(n-1). Simple, synchronizes clients, thundering-herd-prone. Don’t ship this.
- Equal Jitter — half the wait is deterministic, half is random inside the current exponential window.
- Full Jitter — wait = random(0, base × 2^(n-1)). Each retry lands uniformly somewhere inside the current exponential interval.
- Decorrelated Jitter — the next wait is randomized within a window that grows from the previous wait, not from a fixed exponential schedule.
Brooker’s simulations on a synthetic workload found Full Jitter reduced server load substantially compared to no jitter, and that Decorrelated Jitter finished the work in slightly fewer total calls than Full Jitter on his benchmark. The headline result — “spread the retries out randomly across the exponential window” — is the part that’s stuck. The post nudged the broader ecosystem toward jittered retries; whether any specific AWS SDK uses Full vs Decorrelated today varies by SDK and version, and I won’t claim more than that without checking the source.
The shape worth memorizing: exponential window, uniform random within it. Pick a variant, document it, and move on.
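For concreteness, here is a sketch of the four shapes in Python, following the descriptions above; the constants and function names are illustrative, not lifted from Brooker's post or any SDK:

```python
import random

BASE, FACTOR, CAP = 0.1, 2.0, 30.0  # illustrative values

def no_jitter(attempt: int) -> float:
    # Deterministic ladder: every client that failed together retries together.
    return min(CAP, BASE * FACTOR ** (attempt - 1))

def equal_jitter(attempt: int) -> float:
    # Half the current window is guaranteed, half is random.
    window = min(CAP, BASE * FACTOR ** (attempt - 1))
    return window / 2 + random.uniform(0, window / 2)

def full_jitter(attempt: int) -> float:
    # Uniform anywhere inside the current exponential window.
    window = min(CAP, BASE * FACTOR ** (attempt - 1))
    return random.uniform(0, window)

def decorrelated_jitter(previous: float) -> float:
    # Stateful: seed with previous=BASE, then feed each returned delay back in.
    return min(CAP, random.uniform(BASE, previous * 3))
```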
3. The stopping rule
Backoff without a stopping rule is a quiet disaster. Two common shapes:
- Bounded retries — give up after N attempts, surface the error to the caller. Right for interactive paths where someone is waiting.
- Bounded total time — keep retrying until a deadline (e.g. the caller’s overall timeout) passes. Right for background work where the job has a wall-clock budget anyway.
Both have a subtler companion: the per-attempt timeout has to be shorter than the deadline. A retry loop where each attempt blocks for 30 seconds with a 30-second total budget gives you exactly one try. The classic “we have retries, why didn’t they help?” postmortem ends here more often than it should.
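Putting the pieces together: a minimal sketch of a bounded retry loop, assuming a call(timeout=...) interface and a TransientError that stand in for whatever your client actually exposes.

```python
import random
import time

class TransientError(Exception):
    """Placeholder for whatever retryable error your client raises."""

def retry(call, *, max_attempts=5, deadline_s=10.0, base=0.1, cap=5.0, attempt_timeout=2.0):
    """Retry `call` with full-jitter backoff, bounded by attempts AND wall clock."""
    start = time.monotonic()
    for attempt in range(1, max_attempts + 1):
        remaining = deadline_s - (time.monotonic() - start)
        if remaining <= 0:
            raise TimeoutError("retry deadline exhausted")
        try:
            # Per-attempt timeout must be shorter than the overall deadline,
            # or one slow attempt eats the whole retry budget.
            return call(timeout=min(attempt_timeout, remaining))
        except TransientError:
            if attempt == max_attempts:
                raise
            delay = random.uniform(0, min(cap, base * 2 ** (attempt - 1)))
            if delay >= deadline_s - (time.monotonic() - start):
                raise  # sleeping would blow the deadline; give up now
            time.sleep(delay)
```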
4. The seams nobody puts in the diagram
- Don’t retry non-idempotent writes blindly. A POST /charges that times out may have succeeded; retrying creates a duplicate charge. Use idempotency keys, or restrict retries to GETs and explicitly safe endpoints. The HTTP method is a hint, not a guarantee.
- Honor Retry-After. When a server returns 429 or 503 with a Retry-After header, that value is the server telling you exactly how long to wait. Your local backoff math should defer to it, not fight it (there’s a sketch of this after the list).
- Circuit breakers complement backoff. Backoff handles a single call. A circuit breaker handles the aggregate — when failure rate crosses a threshold, stop trying for a window. Without one, every caller in your fleet keeps generously retrying a service that is plainly down.
- Retry budgets stop runaway amplification. A 3-retry policy turns one failure into four requests. Cascade that across three services and you’ve quietly made each user-facing failure cost 64 backend requests. Some systems cap retries at the fleet level (e.g. “no more than 10% of total traffic may be retries”) to bound this.
- Jitter doesn’t help one client. A single client retrying a single call gains nothing from jitter; the herd-flattening effect only shows up across many independent clients. The cost of adding it is small — one extra random() per retry — and the moment your one client becomes ten, you’re glad you did.
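As promised above, a sketch of deferring to Retry-After. It assumes a requests-style response object with .status_code and a .headers mapping; adjust to whatever your client returns.

```python
import email.utils
import random
import time

def next_delay(response, attempt, base=0.5, cap=60.0):
    """Prefer the server's Retry-After over local backoff math when it's present."""
    retry_after = response.headers.get("Retry-After")
    if response.status_code in (429, 503) and retry_after:
        try:
            return float(retry_after)  # delta-seconds form, e.g. "120"
        except ValueError:
            # HTTP-date form, e.g. "Fri, 31 Dec 2027 23:59:59 GMT"
            when = email.utils.parsedate_to_datetime(retry_after)
            return max(0.0, when.timestamp() - time.time())
    # Otherwise fall back to full-jitter exponential backoff.
    return random.uniform(0, min(cap, base * 2 ** (attempt - 1)))
```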
Famous related terms
- Idempotency — idempotent op = same result whether you run it once or N times — the property that makes retrying a write safe. Without it, retries are guesses about a write you may have already done.
- Circuit breaker — circuit breaker = error counter + cooldown window — short-circuits calls to a downstream that’s clearly down, so you stop wasting capacity on doomed retries.
- Token bucket / leaky bucket — token bucket = bucket of N tokens + refill rate — a rate-limiting shape on the server side that backoff is reacting to. Knowing the bucket helps you size the backoff.
- Thundering herd — thundering herd = many clients waking at once + one shared resource — the failure mode jitter is built to kill.
- TCP congestion control — backoff’s older cousin. TCP halves its sending rate on loss and grows it back slowly; same instinct, different layer.
- Hedged requests — hedged request = send the call + send a duplicate after a small delay if the first hasn't returned — the opposite of backoff for tail-latency-sensitive reads. They live together in the same toolbox.
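The "error counter + cooldown window" shape fits in a couple dozen lines; this is an illustrative sketch, not any particular library's breaker:

```python
import time

class CircuitBreaker:
    """Minimal error-counter + cooldown-window breaker (illustrative only)."""

    def __init__(self, threshold: int = 5, cooldown_s: float = 30.0):
        self.threshold = threshold      # consecutive failures before opening
        self.cooldown_s = cooldown_s    # how long to stop calling once open
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            self.opened_at = None       # cooldown over: let a probe through
            self.failures = 0
            return True
        return False                    # open: callers skip the doomed call

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
```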
Going deeper
- Marc Brooker, “Exponential Backoff And Jitter” (AWS Architecture Blog, 2015). The article that popularized full jitter; the simulation plots are worth the read.
- Google’s SRE Book, chapter on “Handling Overload,” for the failure modes that retry-without-budget produces in real fleets.
- RFC 9110 §10.2.3 (Retry-After header field) and §15.6.4 (503 Service Unavailable), plus RFC 6585 §4 (429 Too Many Requests) — the small but load-bearing protocol pieces backoff loops should respect.
- Your favorite cloud SDK’s retry source. Reading one concrete implementation end-to-end (factor, cap, jitter shape, stopping rule, Retry-After handling) is the fastest way to make the abstract picture stick.