Why is DNS hierarchical?
DNS could have been a giant flat lookup table — one machine somewhere mapping every name in the world to an IP. It isn't, and the reason is less about technology than about who gets to be in charge of what.
Why it exists
In the early 1980s the entire internet’s name-to-address mapping lived in a
single text file called HOSTS.TXT, maintained by the
SRI-NIC
and copied around by every host that wanted to know what mit-multics resolved
to. This worked when “the internet” was a few hundred machines run by people
who knew each other. It stopped working the moment it didn’t.
Three things broke at once:
- Name collisions. Only one machine in the world could be called vax. With one global file, every new host had to ask a central authority before picking a name.
- Update lag. Every host pulled HOSTS.TXT periodically. Adding a machine meant waiting for the file to propagate, and the file kept getting bigger.
- A single point of administrative failure. SRI-NIC was the bottleneck for every name change anywhere on the network. Worse, it was the wrong group of humans to be deciding what a university lab in Berlin should call its server.
DNS was the redesign. The technical pieces — caching, UDP queries, resource records — are interesting, but the load-bearing idea is the shape of the namespace. It’s a tree. And the tree exists primarily so that authority can be delegated.
Why it matters now
Every modern system that has to name things at scale ends up with the same shape: Kubernetes namespaces, Java packages, npm scopes, AWS resource ARNs, S3 bucket subpaths, even file systems. The reason is the same reason DNS is hierarchical, and once you see it once, you see it everywhere: hierarchy is how you split naming from control without losing either.
For an engineer this matters in concrete ways. When you register
example.com, you’re not buying a row in some giant database — you’re being
delegated authority over the entire subtree below example.com. Nobody at the
root needs to know or approve when you create api.staging.example.com. That
delegation is also what lets DNS scale, what lets caching work, and what
determines whose key signs what under
DNSSEC.
The short answer
DNS = tree of names + delegation of authority at every edge + caching
DNS is hierarchical because the names are the cheap part. The expensive part
is deciding who’s allowed to say what something.com resolves to, and the tree
is the data structure that lets that authority be handed off cleanly at every
level. Caching then makes the whole thing fast enough to be invisible.
How it works
Read a domain name right-to-left. mail.example.com is really
com → example → mail, with an invisible root . at the very end. Each dot
is a hand-off point.
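The right-to-left reading can be made concrete. A toy sketch (the function name is mine, not part of any DNS library):

```python
def delegation_path(name: str) -> list[str]:
    """Split a domain name into its hand-off points, top of the tree first.

    'mail.example.com' reads as root -> com -> example -> mail:
    each label is delegated by the zone to its right.
    """
    labels = name.rstrip(".").split(".")   # drop the invisible root dot
    return list(reversed(labels))          # right-to-left = root first

print(delegation_path("mail.example.com"))  # ['com', 'example', 'mail']
```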
The root is tiny on purpose. The root zone — the contents of . — only
needs to know one thing per
TLD:
where to find the name servers for com, for org, for uk, and so on.
That’s a small list. It’s served by the
13 root server letters,
each of which is actually many machines spread around the world via
anycast.
The root almost never changes, so this works.
Each TLD operator runs its own zone. The .com operator (Verisign) doesn’t
have to know your domain’s IP — it only has to know which name servers you’ve
declared as authoritative. When a resolver asks .com “where is
example.com?”, it gets back a referral: “I don’t know, but ns1.example.com
does.”
You run the leaves. Or your DNS provider does. Either way, the
authoritative servers for example.com are the only place in the world where
the answer for mail.example.com actually lives. Nobody upstream stores it.
That’s delegation: each level only knows enough to point further down.
A typical resolution looks like this:
client → resolver: "mail.example.com?"
resolver → root: "mail.example.com?"
root → resolver: "ask the .com servers, here they are"
resolver → .com: "mail.example.com?"
.com → resolver: "ask example.com's nameservers, here they are"
resolver → example.com NS: "mail.example.com?"
example.com NS → resolver: "203.0.113.42"
resolver → client: "203.0.113.42"
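The walk above can be sketched as a loop over an in-memory delegation tree. The zone contents here are invented for illustration; a real resolver sends queries over the network, and a referral carries nameserver names and addresses, not just the child zone's name.

```python
# Toy model of iterative resolution. Note that no level stores the final
# answer except the leaf zone: each level only points further down.
ZONES = {
    ".":           {"com": ("referral", "com")},                      # root knows the TLDs
    "com":         {"example.com": ("referral", "example.com")},      # .com knows your NS
    "example.com": {"mail.example.com": ("answer", "203.0.113.42")},  # only the leaf has data
}

def resolve(name: str) -> str:
    """Walk the delegation tree from the root until some zone answers."""
    zone = "."
    while True:
        for suffix, (kind, value) in ZONES[zone].items():
            if name == suffix or name.endswith("." + suffix):
                if kind == "answer":
                    return value   # authoritative answer: stop
                zone = value       # referral: descend one level
                break
        else:
            raise LookupError(f"no delegation covers {name}")

print(resolve("mail.example.com"))  # 203.0.113.42
```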
In practice the resolver did this once, weeks ago, and has been serving the answer from cache for everyone in your office since. Each record carries a TTL, which is the operator's promise to every cache downstream: "you may serve this answer for up to N seconds before asking again."
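The caching that makes the full walk rare is a simple mechanism. A minimal sketch, where the fetch function stands in for the root-to-leaf walk and all names are hypothetical:

```python
import time

class TtlCache:
    """Cache answers for at most `ttl` seconds, per the operator's promise."""
    def __init__(self):
        self._store = {}  # name -> (answer, expiry timestamp)

    def get(self, name, fetch, ttl):
        answer, expiry = self._store.get(name, (None, 0.0))
        if time.monotonic() < expiry:
            return answer               # still within the TTL: serve from cache
        answer = fetch(name)            # expired or missing: do the full walk
        self._store[name] = (answer, time.monotonic() + ttl)
        return answer

calls = []
def full_walk(name):                    # stand-in for the iterative resolution
    calls.append(name)
    return "203.0.113.42"

cache = TtlCache()
cache.get("mail.example.com", full_walk, ttl=300)
cache.get("mail.example.com", full_walk, ttl=300)
print(len(calls))  # 1 -- the second lookup never left the cache
```

Lowering the TTL shrinks the window between a change and the world seeing it, at the cost of more walks hitting your authoritative servers.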
Why a tree, not a hash table?
You could imagine an alternative universe with a flat namespace and a
distributed hash table — example.com is just a key, and some peer-to-peer
protocol stores the value. People have built this; it doesn’t replace DNS, and
the reason is mostly social, not technical.
A flat namespace forces a single answer to “who decides which names are
allowed?” A tree lets that question be answered differently in every subtree.
ICANN decides which TLDs exist. Verisign decides who gets a .com. You decide
what lives under your .com. Your team lead decides what lives under
team.example.com. The same data structure that organizes the names also
organizes the politics, and that turns out to be the thing you can’t skip.
Show the seams
A few places the clean story doesn’t quite hold:
- The root isn’t really the root. ICANN coordinates the root zone, but the content (which TLDs exist and who runs them) is the result of policy processes, contracts, and in some cases government involvement for country-code TLDs. The “tree” has a political base, not a technical one.
- Caching makes propagation messy. “DNS changes can take 24–48 hours to propagate” is shorthand for “every cache between you and the world has its own TTL countdown.” Lower TTLs trade load on your servers for faster changes. There isn’t a clean answer; it’s a tuning knob.
- The “13 root servers” number is a historical artifact. It’s 13 named identities (A-root through M-root), originally because of UDP packet size limits in the early DNS protocol. Each identity is now backed by hundreds of physical instances via anycast.
- DNSSEC is the part that makes the hierarchy cryptographic, not just administrative. Each zone signs its records, and the parent zone signs a hash of the child’s key. The chain of trust runs from the root key downward, mirroring the delegation tree. Adoption is uneven and I don’t have a current global number I’d defend — coverage varies a lot by TLD and by organization.
- Modern resolution often skips most of this. Your laptop usually asks a recursive resolver (your ISP’s, or 1.1.1.1, or 8.8.8.8) which has cached half the internet. The full root-to-leaf walk is rare in practice but is the fallback the system is designed around.
Famous related terms
- TLD — the rightmost label in a domain name: .com, .org, .uk. The first level of delegation under the root.
- Authoritative vs recursive server — authoritative means “I own this zone”; recursive means “I’ll go ask on your behalf and cache the answer.” These are the two roles every DNS deployment splits into.
- TTL — the number of seconds a record may be cached; the operator’s contract with every resolver downstream.
- Anycast — same IP, many machines, routing picks the nearest; how a “single” root server is actually hundreds.
- DNSSEC — DNS records plus signatures rooted at the root key; turns the administrative tree into a cryptographic one.
Going deeper
- RFC 1034 and RFC 1035 — Paul Mockapetris’s original DNS specifications from 1987. Short, readable, and the why shows through more clearly than in any later document.
- “The Design of the Domain Name System” (Mockapetris and Dunlap, SIGCOMM 1988) — the design rationale paper, which is more explicit than the RFCs about why hierarchy was the central choice.
- The IANA root zone database — the actual list of TLDs and their operators, which makes the political layer concrete in a way the protocol docs don’t.