Why JSON beat XML

XML had a standards body, schemas, namespaces, transformations, and a head start of several years. JSON had curly braces and a JavaScript parser. Curly braces won.

Computer Science intro Apr 29, 2026

Why it exists

It’s the early 2000s. You want to send structured data between two programs over HTTP. The official answer, blessed by the W3C, is XML. There are entire books about it. There are degrees in it. Enterprise vendors have built tooling empires around it: XSD for schemas, XSLT for transformations, XPath for queries, SOAP for RPC. Microsoft, IBM, Oracle, Sun all agree XML is the way.

So why, twenty years later, does basically every web API you touch return {"id": 42, "name": "phi"} instead of <user><id>42</id><name>phi</name></user>?

The short version: XML was designed to mark up documents, but engineers needed to ship data structures. JSON happened to be a literal serialization of the data structure JavaScript already used, so the browser could parse it for free. Everything else followed from that one fact.

Why it matters now

Almost every API a software engineer touches today speaks JSON: REST endpoints, GraphQL responses, log lines, config files for tools that swore they’d stay YAML-only, the request and response bodies of every LLM API, the tool-call schemas inside those LLM requests. When you debug a production incident in 2026, you’re reading JSON.

Understanding why it won — and what it gave up to win — explains a lot of weird corners: why JSON has no comments, why dates are strings, why every API has its own ad-hoc convention for null vs. missing, and why tool-call payloads still wrestle with the same problems SOAP wrestled with in 2003.

The short answer

JSON = JavaScript object literal syntax + "that's the whole spec"

JSON is the subset of JavaScript syntax you’d use to write a nested object literal — strings, numbers, booleans, null, arrays, objects — frozen and called a data format. It won because it was small enough to fit in your head, parseable by eval() in any browser shipped after 1996, and shaped exactly like the data structures programmers already had in memory.

How it works (and how XML works differently)

Take the same payload in both:

<user id="42">
  <name>phi</name>
  <admin>true</admin>
  <tags>
    <tag>ops</tag>
    <tag>writer</tag>
  </tags>
</user>

{
  "id": 42,
  "name": "phi",
  "admin": true,
  "tags": ["ops", "writer"]
}

A few things jump out. The XML version is longer. More importantly, it doesn’t tell you whether id is a number or a string: everything in XML is text, and you need an external schema to know. It also splits data between attributes (id="42") and child elements (<name>phi</name>), which are syntactically different but often semantically interchangeable, so every team has to pick a convention. And lists have no native syntax: you wrap repeating elements in a parent, and when only one <tag> is present the consumer has to know from somewhere else that it is a list of one rather than a single value.

JSON sidesteps every one of these. Numbers are numbers. Booleans are booleans. Lists have brackets. There’s exactly one way to express “an object with these fields.” A parser written from the spec fits in maybe 200 lines of C.
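One rough way to see the difference is to parse both payloads with Python’s standard library (json and xml.etree.ElementTree). This is a sketch for illustration, not a claim about how any particular API consumes these formats; the variable names are made up.

```python
import json
import xml.etree.ElementTree as ET

XML_DOC = """
<user id="42">
  <name>phi</name>
  <admin>true</admin>
  <tags>
    <tag>ops</tag>
    <tag>writer</tag>
  </tags>
</user>
""".strip()

JSON_DOC = """
{
  "id": 42,
  "name": "phi",
  "admin": true,
  "tags": ["ops", "writer"]
}
"""

# XML: every value comes back as text; the caller decides what it means.
root = ET.fromstring(XML_DOC)
xml_id = root.get("id")              # attribute value, always a str
xml_admin = root.find("admin").text  # element text, always a str
xml_tags = [t.text for t in root.iter("tag")]  # caller must know to collect

# JSON: the parser hands back typed values directly.
user = json.loads(JSON_DOC)

print(type(xml_id).__name__, repr(xml_id))          # str '42'
print(type(xml_admin).__name__, repr(xml_admin))    # str 'true'
print(type(user["id"]).__name__, user["id"])        # int 42
print(type(user["admin"]).__name__, user["admin"])  # bool True
print(user["tags"])                                 # ['ops', 'writer']
```

On the XML side, turning '42' into a number or 'true' into a boolean is the caller’s job, which is exactly the out-of-band knowledge a schema would supply.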

This is the deep reason XML lost: XML was a markup language pretending to be a data format. Markup is the right tool when you have prose with annotations sprinkled in (<p>This is <em>important</em>.</p>). It’s the wrong tool when you have a record with five typed fields, because it makes you encode the type information out-of-band and the structure verbosely.
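To make the markup case concrete, here is a small sketch using Python’s xml.etree.ElementTree on the sentence above. It shows how mixed content (text interleaved with elements) is represented; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>This is <em>important</em>.</p>")
em = p.find("em")

# Text and elements interleave: ElementTree exposes the pieces as
# .text (before the first child) and .tail (after a child's end tag).
pieces = [p.text, em.text, em.tail]
plain = "".join(p.itertext())

print(pieces)  # ['This is ', 'important', '.']
print(plain)   # This is important.
```

Prose with inline annotations falls out naturally here, whereas a record with five typed fields never needs .tail at all.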

The seams JSON left exposed

JSON winning didn’t make the underlying problems disappear. It just pushed them into convention.

No schema. JSON itself has no way to declare the shape of a document; it only has its handful of built-in value types. Every API reinvents the wheel: JSON Schema, OpenAPI, Protobuf-as-JSON, ad-hoc TypeScript types. XML had DTDs in the core spec and XSD as a W3C standard. We arguably have more schema fragmentation now than we did then.
No dates. JSON has no date type, so dates are strings — usually ISO 8601, but not always. Time zones are a footgun.
No comments. Douglas Crockford, who specified JSON, removed comments deliberately so people wouldn’t smuggle parsing directives into them. Config files have suffered for this ever since, which is a large part of why JSON5 exists and why YAML is so common for config.
Number weirdness. The JSON grammar puts no limit on the size or precision of a number, though RFC 8259 lets implementations set their own limits. JavaScript numbers are IEEE 754 doubles. So a 64-bit integer ID round-tripped through a browser silently loses precision past 2^53. Many APIs that use big IDs ship them as strings to dodge this.
No binary. Want to send bytes? Base64 them into a string and pay the 33% overhead, or use a different format entirely.
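Four of these seams can be poked at with Python’s standard library. This is a minimal sketch: the sample payloads and variable names are invented for illustration, and float() stands in for a consumer that stores every number as an IEEE 754 double, the way JavaScript does.

```python
import base64
import json
from datetime import datetime, timezone

# No comments: a strict parser rejects them outright.
try:
    json.loads('{"port": 8080 /* dev only */}')
    comment_accepted = True
except json.JSONDecodeError:
    comment_accepted = False

# No dates: a timestamp travels as a string and is parsed by convention.
payload = json.loads('{"created": "2026-04-29T12:00:00+00:00"}')
created = datetime.fromisoformat(payload["created"])

# Number weirdness: Python ints are arbitrary precision, so the loss only
# appears once the value is forced through a double.
big_id = 2**53 + 1
as_double = float(json.loads(str(big_id)))
lost_precision = int(as_double) != big_id
# The common workaround: ship the ID as a string.
as_string = json.loads(json.dumps({"id": str(big_id)}))["id"]

# No binary: Base64 emits 4 output characters for every 3 input bytes.
raw = bytes(range(256)) * 3  # 768 bytes
encoded = base64.b64encode(raw).decode("ascii")
overhead = len(encoded) / len(raw) - 1

print(comment_accepted)                  # False
print(created.astimezone(timezone.utc))  # 2026-04-29 12:00:00+00:00
print(big_id, int(as_double))            # 9007199254740993 9007199254740992
print(as_string)                         # 9007199254740993
print(round(overhead, 3))                # 0.333
```

None of this is specific to Python; the same behavior shows up in any strict parser paired with double-precision numbers.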

The standard account is that Douglas Crockford specified JSON in the early 2000s — he registered the application/json media type and ran json.org — but he’s been clear he discovered it rather than invented it. The syntax was already implicit in JavaScript. He just wrote it down and gave it a name. RFC 8259 is the current spec; it’s about ten pages of actual content. XML 1.0 plus the Namespaces, Schema, and XPath specs run into the hundreds.

I don’t have a clean source for exactly when JSON volume overtook XML volume on the public web, and I’d be skeptical of any single number on it — adoption was gradual, format-by-format, between roughly 2006 (when major sites started offering JSON endpoints alongside XML) and the early 2010s (when REST-plus-JSON became the default new-API choice). The shift from SOAP to REST, and from desktop apps to single-page JavaScript apps, dragged the data format with it.

XML: XML = SGML simplified for the web + namespaces + schema layer. Still dominant in document-shaped domains: SVG, Office files, RSS, legacy enterprise integration. Not dead, just specialized.
YAML: YAML = JSON superset + significant whitespace + comments + anchors. What JSON should have been for config, with all the indentation pain that implies.
Protobuf: protobuf = schema-first binary format + generated code per language. What you reach for when JSON’s text overhead and lack of typing finally hurt enough.
JSON Schema: JSON Schema = JSON document that validates other JSON documents. The schema layer JSON didn’t ship with, retrofitted later. The thing OpenAPI and LLM tool-call specs both lean on.
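To show what “a JSON document that validates other JSON documents” looks like, here is a sketch of a schema for the user payload from earlier. The keywords type, required, properties, and items are real JSON Schema keywords; the toy_validate helper is hypothetical and covers only those four, so treat it as an illustration rather than a JSON Schema implementation (for one thing, it treats "integer" more strictly than the real spec does).

```python
import json

# A small JSON Schema document describing the user payload.
USER_SCHEMA = json.loads("""
{
  "type": "object",
  "required": ["id", "name"],
  "properties": {
    "id": {"type": "integer"},
    "name": {"type": "string"},
    "admin": {"type": "boolean"},
    "tags": {"type": "array", "items": {"type": "string"}}
  }
}
""")


def _is_type(value, name):
    # bool is a subclass of int in Python, so exclude it from "integer".
    if name == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if name == "string":
        return isinstance(value, str)
    if name == "boolean":
        return isinstance(value, bool)
    if name == "array":
        return isinstance(value, list)
    if name == "object":
        return isinstance(value, dict)
    raise ValueError(f"unsupported type keyword: {name}")


def toy_validate(value, schema, path="$"):
    """Hypothetical helper: return a list of error strings (empty if valid)."""
    if "type" in schema and not _is_type(value, schema["type"]):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required field {key!r}")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(toy_validate(value[key], sub, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(toy_validate(item, schema["items"], f"{path}[{i}]"))
    return errors


good = json.loads('{"id": 42, "name": "phi", "admin": true, "tags": ["ops", "writer"]}')
bad = json.loads('{"id": "42", "tags": ["ops", 7]}')

print(toy_validate(good, USER_SCHEMA))  # []
for err in toy_validate(bad, USER_SCHEMA):
    print(err)
```

The schema itself is ordinary JSON, which is why it can be embedded directly in an OpenAPI document or a tool-call definition.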

Going deeper

RFC 8259 — the current JSON specification. Short and worth reading once.
Douglas Crockford’s JSON: The Fat-Free Alternative to XML (2006) — the contemporary argument for why this trade-off was worth it.
Tim Bray’s blog posts from the mid-2000s — Bray co-edited the XML 1.0 spec and later wrote candidly about where XML was misused. I’m relying on the general shape of his commentary here rather than a specific quote.