Heads up: posts on this site are drafted by Claude and fact-checked by Codex. Both can still get things wrong — read with care and verify anything load-bearing before relying on it.

Why JSON beat XML

XML had a standards body, schemas, namespaces, transformations, and a head start of several years. JSON had curly braces and a JavaScript parser. Curly braces won.

Computer Science intro Apr 29, 2026

Why it exists

It’s the early 2000s. You want to send structured data between two programs over HTTP. The official answer, blessed by the W3C, is XML. There are entire books about it. There are degrees in it. Enterprise vendors have built tooling empires around it: XSD for schemas, XSLT for transformations, XPath for queries, SOAP for RPC. Microsoft, IBM, Oracle, Sun all agree XML is the way.

So why, twenty years later, does basically every web API you touch return {"id": 42, "name": "phi"} instead of <user><id>42</id><name>phi</name></user>?

The short version: XML was designed to mark up documents, but engineers needed to ship data structures. JSON happened to be a literal serialization of the data structure JavaScript already used, so the browser could parse it for free. Everything else followed from that one fact.

Why it matters now

Almost every API a software engineer touches today speaks JSON: REST endpoints, GraphQL responses, log lines, config files for tools that swore they’d stay YAML-only, the request and response bodies of every LLM API, the tool-call schemas inside those LLM requests. When you debug a production incident in 2026, you’re reading JSON.

Understanding why it won — and what it gave up to win — explains a lot of weird corners: why JSON has no comments, why dates are strings, why every API has its own ad-hoc convention for null vs. missing, and why tool-call payloads still wrestle with the same problems SOAP wrestled with in 2003.

The short answer

JSON = JavaScript object literal syntax + "that's the whole spec"

JSON is essentially the subset of JavaScript syntax you’d use to write a nested object literal (strings, numbers, booleans, null, arrays, objects), frozen and called a data format. It won because it was small enough to fit in your head, parseable by eval() in essentially any browser from the late 1990s onward, and shaped exactly like the data structures programmers already had in memory.

How it works (and how XML works differently)

Take the same payload in both:

<user id="42">
  <name>phi</name>
  <admin>true</admin>
  <tags>
    <tag>ops</tag>
    <tag>writer</tag>
  </tags>
</user>

{
  "id": 42,
  "name": "phi",
  "admin": true,
  "tags": ["ops", "writer"]
}

A few things jump out. The XML version is longer. But more importantly, the XML version doesn’t actually tell you whether id is a number or a string: everything in XML is text, and you need an external schema to know. It also splits data between attributes (id="42") and child elements (<name>phi</name>), which are syntactically different but semantically often the same; every team has to pick a convention. And the list-of-tags case has no native syntax: you wrap repeating elements in a parent and hope the consumer still treats it as a list when there is only one item.
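To make the "everything is text" point concrete, here is a minimal sketch that reads the XML payload above with Python's standard xml.etree.ElementTree. The variable names and the choice of Python are mine, not part of the original example, and the conversions at the end stand in for whatever convention a real consumer would agree on.

```python
import xml.etree.ElementTree as ET

XML_DOC = """
<user id="42">
  <name>phi</name>
  <admin>true</admin>
  <tags>
    <tag>ops</tag>
    <tag>writer</tag>
  </tags>
</user>
""".strip()

user = ET.fromstring(XML_DOC)

# Attributes and element text both come back as plain strings.
user_id = user.get("id")           # the string "42", not the number 42
is_admin = user.findtext("admin")  # the string "true", not a boolean

# The consumer has to know, out of band, which conversions to apply.
typed_id = int(user_id)
typed_admin = is_admin == "true"

# Lists are a convention: collect every <tag> child under <tags>.
tags = [t.text for t in user.findall("./tags/tag")]

# With a single <tag>, nothing in the markup itself says "this is a list".
one = ET.fromstring("<user><tags><tag>ops</tag></tags></user>")
one_tags = [t.text for t in one.findall("./tags/tag")]
```

The parser never fails here; it just hands back strings and leaves the typing decisions to the caller, which is exactly the work a schema was supposed to do.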

JSON sidesteps every one of these. Numbers are numbers. Booleans are booleans. Lists have brackets. There’s exactly one way to express “an object with these fields.” A parser written from the spec fits in maybe 200 lines of C.
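For comparison, a small sketch of the same payload going through Python's standard json module. Again, the variable names are illustrative; the point is only that the types fall out of the syntax with no extra step.

```python
import json

JSON_DOC = """
{
  "id": 42,
  "name": "phi",
  "admin": true,
  "tags": ["ops", "writer"]
}
"""

user = json.loads(JSON_DOC)

# Types come from the syntax itself; no schema or conversion step is needed.
id_type = type(user["id"]).__name__        # "int"
admin_type = type(user["admin"]).__name__  # "bool"
tags_type = type(user["tags"]).__name__    # "list"

# A one-element list is still unambiguously a list.
single = json.loads('{"tags": ["ops"]}')
```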

This is the deep reason XML lost: XML was a markup language pretending to be a data format. Markup is the right tool when you have prose with annotations sprinkled in (<p>This is <em>important</em>.</p>). It’s the wrong tool when you have a record with five typed fields, because it makes you encode the type information out-of-band and the structure verbosely.
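The flip side is worth seeing too. In the sketch below, the mixed-content paragraph from the example above is parsed with Python's xml.etree.ElementTree, which tracks the text before, inside, and after the inline element. This is the job markup was built for, and JSON has no native equivalent.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>This is <em>important</em>.</p>")
em = p.find("em")

before = p.text       # text before the <em> child
emphasized = em.text  # text inside <em>
after = em.tail       # text following </em>, still inside <p>

# Stitch the prose back together, ignoring the annotations.
plain = "".join(p.itertext())
```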

The seams JSON left exposed

JSON winning didn’t make the underlying problems disappear. It just pushed them into convention.
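Three of the corners mentioned earlier (null vs. missing, dates as strings, no comments) can be poked at directly. This is a rough sketch using Python's standard json module; the field names are made up for illustration, and ISO 8601 is just one common convention for dates, not something the JSON spec requires.

```python
import json
from datetime import datetime, timezone

# null vs. missing: both are valid JSON, and the format does not say
# whether they mean the same thing. (Field names are hypothetical.)
explicit_null = json.loads('{"id": 42, "nickname": null}')
missing = json.loads('{"id": 42}')

has_key_null = "nickname" in explicit_null  # key present, value None
has_key_missing = "nickname" in missing     # key absent
same_via_get = explicit_null.get("nickname") == missing.get("nickname")

# Dates: there is no date type, so a datetime must become a string first.
created = datetime(2026, 4, 29, tzinfo=timezone.utc)
try:
    json.dumps({"created": created})
    date_error = None
except TypeError as exc:
    date_error = str(exc)

# Assumption: ISO 8601 strings, one convention among several.
encoded = json.dumps({"created": created.isoformat()})

# Comments: not part of the grammar, so a strict parser rejects them.
try:
    json.loads('{"id": 42} // user id')
    comment_rejected = False
except json.JSONDecodeError:
    comment_rejected = True
```

Note that a lookup with .get() makes the null and missing cases indistinguishable, which is one way the ambiguity leaks into application code without anyone deciding it should.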

The standard account is that Douglas Crockford specified JSON in the early 2000s (he registered the application/json media type and ran json.org), but he’s been clear he discovered it rather than invented it. The syntax was already implicit in JavaScript. He just wrote it down and gave it a name. RFC 8259 is the current spec; it’s about ten pages of actual content. XML 1.0 plus the Namespaces, Schema, and XPath specs run into the hundreds of pages.

I don’t have a clean source for exactly when JSON volume overtook XML volume on the public web, and I’d be skeptical of any single number on it. Adoption was gradual, API by API, between roughly 2006 (when major sites started offering JSON endpoints alongside XML) and the early 2010s (when REST-plus-JSON became the default new-API choice). The shift from SOAP to REST, and from desktop apps to single-page JavaScript apps, dragged the data format with it.

Going deeper