Why do CDNs exist when we already have fast servers?
Your origin server can be the fastest box on Earth and your users in São Paulo will still hate it. CDNs exist because the speed of light, not your CPU, is the bottleneck.
Why it exists
Picture the simplest possible web setup: one server in a Virginia data center, serving the world. Page loads great if you’re in New York. It’s fine in London. It’s noticeably slow in Sydney. It’s painful in Lagos.
You can’t fix this by buying a faster server. The Sydney user’s laptop and your Virginia server are on opposite sides of the planet, and a packet has to physically traverse fiber across an ocean. The physics floor — light through glass over the great-circle distance — is on the order of 150 ms round trip; real-world routes through real cables typically come in higher than that, before any of your code runs. Modern protocols make multiple round trips to set up a connection (TCP plus TLS), and a typical web page kicks off dozens more for its assets. Latency stacks.
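A rough sense of how that stacking plays out, with illustrative (not measured) round-trip times:

```python
# Back-of-the-envelope page-load time, dominated by round trips.
# All numbers are illustrative assumptions, not measurements.

RTT_FAR_ORIGIN = 0.200   # seconds: Sydney to a Virginia origin, roughly
RTT_NEARBY_POP = 0.020   # seconds: Sydney to a cache in or near Sydney

def page_load_floor(rtt: float, setup_round_trips: int = 3, asset_round_trips: int = 10) -> float:
    """Crude lower bound: connection setup (TCP + TLS handshakes) plus
    sequential round trips for the page and its blocking assets."""
    return (setup_round_trips + asset_round_trips) * rtt

print(f"far origin: ~{page_load_floor(RTT_FAR_ORIGIN) * 1000:.0f} ms")   # ~2600 ms
print(f"nearby POP: ~{page_load_floor(RTT_NEARBY_POP) * 1000:.0f} ms")   # ~260 ms
```

The same number of round trips, a tenth of the distance: that ratio is the whole argument.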
A CDN solves a problem your origin physically cannot: it puts a copy of your stuff near the user, so the round trip is short. The CPU was never the bottleneck. Geography was.
Why it matters now
A large fraction of the traffic you experience as “the web” is fronted by a CDN, even when it doesn’t look like one — static assets, video, software updates, API responses, edge-rendered pages, package registry tarballs, container image layers, model weights. The exact share depends on what you measure (HTML requests, third-party requests, total bytes); the HTTP Archive Web Almanac publishes yearly numbers and they vary considerably by category, but the direction is consistent: heavy CDN involvement, especially for third-party and asset traffic.
For engineers in the AI era specifically:
- LLM API providers run their inference fleets across many regions for the same latency reason. The plumbing that gets your request to the nearest region is CDN-shaped.
- Model weights and container images are huge. Pulling a multi-gigabyte image from a single origin every deploy would saturate the link. Major registries (Docker Hub, GHCR, Hugging Face) front their downloads with CDN-style caching for that reason — the exact deployment differs by registry and I don’t have a defensible single description that covers all of them.
- “Edge functions” — running code on the CDN’s POPs instead of your origin — are the natural extension once you’ve already got a global fleet of servers.
The short answer
CDN = global fleet of caches + smart routing to the nearest one
A CDN is many servers, in many cities, each holding a copy of your content, with a system out front that steers each user to a nearby copy. The user talks to a machine 20 ms away instead of 200 ms away, and your origin only sees a trickle of cache misses.
How it works
Three pieces, working together.
1. Points of presence (POPs). A CDN operates servers in dozens to hundreds of locations worldwide — major exchange points, ISP facilities, metro data centers. Each POP runs cache nodes. When a user requests https://example.com/logo.png, the goal is to serve logo.png from the closest POP that has it.
2. Steering the user to the nearest POP. Two mechanisms dominate, often combined:
- GeoDNS. When the user’s resolver looks up example.com, the CDN’s DNS server hands back an IP for a nearby POP. Cheap and broadly compatible; the downside is it routes based on the user’s resolver, which isn’t always close to the user.
- Anycast. Multiple POPs advertise the same IP address into the global routing table. The internet’s own routing (BGP) delivers the packet to whichever POP is “closest” by routing metric. Latency-aware in a structural way, but you don’t get to pick which POP someone lands on.
Many CDNs combine these — for example, anycast at the network edge plus DNS-based steering, and internal routing to push the request to a POP that’s warm and healthy. The exact mix varies by provider; “all CDNs use anycast” is not true.
3. Caching, with rules. When the chosen POP gets the request, it checks its cache:
- Hit — return the cached response immediately. Origin is not contacted.
- Miss — fetch from origin (or a regional parent cache), store the response, and return it. The next user in that region gets a hit.
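A minimal sketch of that hit-or-miss flow, with a plain dictionary standing in for the POP’s cache and a placeholder fetch_from_origin() standing in for the upstream request (names are illustrative, not any particular CDN’s code):

```python
# Minimal edge-cache sketch: serve from the local store on a hit,
# go upstream on a miss and remember the answer for the next user.
# fetch_from_origin() is a stand-in, not a real API.

cache: dict[str, bytes] = {}   # url -> body; a real node also tracks headers and expiry

def fetch_from_origin(url: str) -> bytes:
    # Placeholder for the HTTP request to origin (or a regional parent cache).
    return b"response body for " + url.encode()

def handle_request(url: str) -> bytes:
    if url in cache:                 # hit: origin is never contacted
        return cache[url]
    body = fetch_from_origin(url)    # miss: pay the long round trip once
    cache[url] = body                # the next user in this region gets a hit
    return body
```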
What’s cacheable, and for how long, is controlled by HTTP cache headers plus CDN-specific config. Three different concerns, often muddled together:
- Freshness — Cache-Control (and the older Expires) say how long a response can be served from cache before re-checking with origin.
- Validation — ETag and Last-Modified are the tokens used to ask origin “is what I have still good?” without re-downloading the body.
- Cache-key variation — Vary tells the cache that the right response depends on a request header (e.g. Accept-Encoding), so it must keep separate cached entries.
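One way to picture how those three concerns fit together inside a cache node; the header handling is simplified and the helper names are made up for illustration:

```python
import time

def cache_key(url: str, request_headers: dict, vary: list[str]) -> tuple:
    # Cache-key variation: Vary says the right response depends on these
    # request headers, so each combination gets its own cache entry.
    return (url,) + tuple(request_headers.get(h, "") for h in vary)

def is_fresh(stored_at: float, max_age: float) -> bool:
    # Freshness: within max-age the cached copy can be served without
    # contacting origin at all.
    return time.time() - stored_at < max_age

def revalidation_headers(etag: str | None, last_modified: str | None) -> dict:
    # Validation: once stale, ask origin "is what I have still good?"
    # without re-downloading the body.
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

# A gzip client and a non-gzip client get separate entries for the same URL:
key_gzip = cache_key("/app.js", {"Accept-Encoding": "gzip"}, ["Accept-Encoding"])
key_plain = cache_key("/app.js", {}, ["Accept-Encoding"])
assert key_gzip != key_plain
```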
Static files are easy. Dynamic responses get harder, which is where features like stale-while-revalidate, surrogate keys for targeted purges, and edge-computed personalization come in.
A non-trivial part of running a CDN is the cache hierarchy itself: edge POPs in front of regional shields in front of origin, so a viral asset doesn’t stampede the origin even on cold cache.
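A toy version of the stampede problem that shield tier solves: when many requests miss on the same URL at once, collapse them into a single origin fetch. The locking scheme below is one simple way to express the idea, not how any particular CDN implements it.

```python
import threading

class Shield:
    """Toy shield tier: concurrent misses for one URL trigger a single
    origin fetch, and the other requests wait for that result."""

    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin   # callable(url) -> bytes
        self.cache = {}                    # url -> body
        self.in_flight = {}                # url -> Event signalling "fetch done"
        self.lock = threading.Lock()

    def get(self, url: str) -> bytes:
        with self.lock:
            if url in self.cache:
                return self.cache[url]
            event = self.in_flight.get(url)
            leader = event is None
            if leader:
                event = self.in_flight[url] = threading.Event()
        if leader:
            body = self.fetch_origin(url)  # exactly one origin round trip
            with self.lock:
                self.cache[url] = body
                del self.in_flight[url]
            event.set()
            return body
        event.wait()                       # followers reuse the leader's result
        with self.lock:
            return self.cache[url]

# Usage sketch: shield = Shield(fetch_origin=some_http_fetch); shield.get("/logo.png")
```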
Show the seams
A few things the simple story glosses over:
- CDNs aren’t only about static files. Modern CDNs terminate TLS, run WAFs, do DDoS scrubbing, route APIs, and execute code at the edge. The cache is the historical core, not the whole product.
- Cache invalidation is the hard part. “Push a new version of app.js” sounds trivial; doing it consistently across hundreds of POPs in seconds is an actual distributed-systems problem. A common workaround for static assets is to put a content hash in the filename (app.abc123.js) and treat the URL as immutable — change the URL instead of invalidating the cache (a short sketch of this trick follows after this list).
- GeoDNS sees the resolver, not the user. If a user in Kenya is using a public resolver hosted in Europe, GeoDNS will route them to a European POP. EDNS Client Subnet (RFC 7871) helps by passing a portion of the client’s subnet to the authoritative server, but it isn’t universally deployed and has its own privacy tradeoffs that not every resolver wants to make.
- Anycast is a routing artifact, not a guarantee. A user can land on a POP that is geographically far but topologically near (or just whatever BGP decided this hour). It usually works; when it doesn’t, debugging is hard because the routing isn’t yours.
- Cache hit ratio is the whole game. A CDN with a 50% hit ratio is doing half its job — every miss pays the full origin round trip plus the user-to-POP hop (the back-of-the-envelope numbers after this list show how much that costs). The hit ratio depends on how cacheable your content is and how well the cache key is tuned. I don’t have a defensible single industry number for “typical” hit ratios — it varies wildly by workload — but it’s the metric operators actually watch.
- CDNs are a centralization story. A small number of providers handle the front-door traffic for a sizeable share of popular websites — I’d rather not pin a specific percentage here, the available measurements vary by methodology — but the practical evidence is clear: when one big CDN has a bad config push, a noticeable fraction of the internet appears to break at once. That’s not a bug in the CDN model; it’s a structural consequence of it.
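Here is the content-hash trick from the cache-invalidation point above, sketched in a few lines. The Cache-Control value shown is a common choice for fingerprinted assets, but treat the specifics as an assumption to verify against your own CDN’s docs.

```python
import hashlib
from pathlib import Path

def fingerprint(path: Path) -> Path:
    """Rename app.js to app.<hash>.js so the URL changes whenever the content
    does; the old URL never needs to be invalidated, only forgotten."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:8]
    hashed = path.with_name(f"{path.stem}.{digest}{path.suffix}")
    path.rename(hashed)
    return hashed

# Fingerprinted assets can then be served with an aggressive policy, because a
# new deploy produces a new URL instead of overwriting this one:
IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable"

# Usage sketch: fingerprint(Path("dist/app.js")) -> dist/app.3f2a9c1b.js
```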
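And the back-of-the-envelope arithmetic behind the hit-ratio point: effective latency is a weighted average of the cheap hit path and the expensive miss path. The round-trip numbers below are illustrative, not measurements.

```python
def effective_latency(hit_ratio: float, pop_rtt: float = 0.02, origin_rtt: float = 0.2) -> float:
    # Hits pay only the user-to-POP hop; misses pay that plus the
    # POP-to-origin trip.
    return hit_ratio * pop_rtt + (1 - hit_ratio) * (pop_rtt + origin_rtt)

for ratio in (0.50, 0.90, 0.99):
    print(f"hit ratio {ratio:.0%}: ~{effective_latency(ratio) * 1000:.0f} ms per request")
# 50% -> ~120 ms, 90% -> ~40 ms, 99% -> ~22 ms
```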
Famous related terms
- POP (Point of Presence) — POP = a CDN's local server cluster + the network links into it — the unit a CDN is built out of.
- Anycast — anycast ≈ "same IP advertised from many places, the network picks one" — how packets find a nearby POP without DNS being involved.
- GeoDNS — GeoDNS = DNS + per-resolver answers — older but still common steering mechanism.
- Edge function — edge function ≈ "your code, but running in the POP" — what you get once the CDN already has compute everywhere.
- Origin shield — origin shield = an extra cache tier in front of your origin — absorbs cache misses so origin doesn’t get hammered on cold cache.
Going deeper
- Karger et al., Consistent Hashing and Random Trees (STOC 1997) — the paper on distributed caching that came out of MIT just before Akamai was founded. The academic seed of the modern CDN.
- Cloudflare’s and Fastly’s engineering blogs are unusually open about how their networks actually work, including post-mortems when things go wrong.
- RFC 9111 (HTTP Caching) — the normative rules for what Cache-Control, Age, and friends actually mean. Drier than the blog posts but the source of truth when caches behave surprisingly.