Heads up: posts on this site are drafted by Claude and fact-checked by Codex. Both can still get things wrong — read with care and verify anything load-bearing before relying on it.

Why CRDTs exist

Two people typing into the same document at the same time, possibly offline, possibly across the world. The merge has to come out the same on both screens with no central referee. CRDTs are the data structures that make that arithmetic instead of a fight.

Data intermediate Apr 29, 2026

Why it exists

Open a Google Doc with a colleague. You’re both typing. Sometimes one of you is on a flaky train. The cursors move, the text rearranges, and the final document looks the same on both laptops. Nobody fights, nobody loses a sentence, nobody has to click “resolve conflict.”

Now try to write the code that does this.

The naive plan — “send each keystroke to a server, the server picks an order, everyone replays” — falls over the moment one of you is offline, or the server is far away, or two edits land in the same millisecond. The slightly-less-naive plan — “diff the two documents, three-way merge them like Git does” — works fine for source control where humans review the result, and works terribly for a live cursor where the merge has to be instant, automatic, and silent.

What you actually want is something stronger: a way to represent the document such that any two replicas that have seen the same set of edits end up identical, no matter what order the edits arrived in. No central authority, no locking, no asking permission before typing. Edits commute.

That property — concurrent edits always merge to the same answer — is the whole reason CRDTs were invented. They are the data structures that make collaborative editing arithmetic rather than negotiation.

Why it matters now

Real-time multiplayer is no longer a Google-only superpower. The collab features you bump into every week all rest on some form of automatic merge. The Yjs and Automerge libraries are the most visible CRDT implementations, and they underpin a long tail of local-first apps. Figma is often cited in this space, but Figma’s own engineers describe their multiplayer system as CRDT-inspired rather than true CRDTs: it’s a public, well-documented design influenced by the same ideas, not a textbook example. Other products (Google Docs being the canonical one) still use the older OT (operational transformation) approach. Either way, the problem they’re solving is the one CRDTs are designed for, and my read is that newer systems mostly start from a CRDT library because the integration story is simpler. That’s a vibe, not a benchmarked claim.

For an engineer in the AI era, there’s a second reason to care, by analogy rather than direct adoption: agent state, shared scratchpads, and replicated context across regions look structurally a lot like collaborative documents. The “two writers, no referee, must converge” shape recurs in agent memory and multi-region vector stores. Whether teams reach for CRDTs there or roll something simpler is mostly anecdote at this point.

The short answer

CRDT = data structure + merge function that is commutative, associative, and idempotent

In English: every operation can be applied in any order, applied more than once, or applied alongside someone else’s operation, and the final state is the same. The merge is forced to be safe by the shape of the data, not by a coordinator deciding who wins.

How it works

There are two main flavors. They look different but are equivalent in what they can express, since each can emulate the other; the practical difference is what gets sent over the network.

State-based (CvRDT) — ship the whole value, take the join

Each replica holds a value drawn from a join-semilattice: the values can be combined by a merge operation that is commutative (merge(a, b) = merge(b, a)), associative (merge(a, merge(b, c)) = merge(merge(a, b), c)), and idempotent (merge(a, a) = a).

Those three properties are doing all the work. Together they mean: it doesn’t matter which replicas you merge, in what order, or how many times — the result is the same. You can ship full state around with a gossip protocol, drop messages, deliver them out of order, redeliver them, and the system still converges.
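To make the three laws concrete, here is a minimal sketch in Python that uses set union as the merge. The sample states and the `deliver` helper are made up for illustration; the point is only that reordering and redelivering messages does not change the result.

```python
import itertools
from functools import reduce


def merge(a: frozenset, b: frozenset) -> frozenset:
    """Join for a grow-only-set lattice: plain set union."""
    return a | b


# Three hypothetical replica states.
states = [frozenset({"a"}), frozenset({"b", "c"}), frozenset({"a", "d"})]

# Check the three laws on every combination of the sample states.
for a, b, c in itertools.product(states, repeat=3):
    assert merge(a, b) == merge(b, a)                      # commutative
    assert merge(a, merge(b, c)) == merge(merge(a, b), c)  # associative
    assert merge(a, a) == a                                # idempotent


def deliver(messages):
    """Fold a stream of received states into one value, starting from empty."""
    return reduce(merge, messages, frozenset())


# Out-of-order and duplicated delivery ends in the same place.
in_order = deliver(states)
shuffled_with_dupes = deliver(
    [states[2], states[0], states[2], states[1], states[0]]
)
assert in_order == shuffled_with_dupes == frozenset({"a", "b", "c", "d"})
```

A dropped message is also harmless here, as long as some later state from the same replica eventually arrives, because later states contain everything the earlier ones did.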

The textbook example is a G-Counter (grow-only counter): each replica keeps its own slot in a vector of counts, only ever increments its own slot, and merge takes the elementwise maximum. Two replicas that have seen different increments will, after exchanging state, agree on every slot — and therefore agree on the sum.

Replica A: [3, 0, 1]   (A=3, B=0, C=1)
Replica B: [0, 2, 1]
merge:     [3, 2, 1]   (max in each slot)
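The same example can be sketched as a small Python class. The class name, the fixed three-replica layout, and the integer replica indices are assumptions made for this illustration, not any particular library's API.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter for a fixed set of three replicas (an assumption)."""
    replica_id: int  # index of this replica's own slot
    counts: list = field(default_factory=lambda: [0, 0, 0])

    def increment(self, n: int = 1) -> None:
        # A replica only ever touches its own slot.
        self.counts[self.replica_id] += n

    def merge(self, other: "GCounter") -> None:
        # Elementwise maximum: commutative, associative, idempotent.
        self.counts = [max(x, y) for x, y in zip(self.counts, other.counts)]

    def value(self) -> int:
        return sum(self.counts)


# The states from the example above.
a = GCounter(replica_id=0, counts=[3, 0, 1])
b = GCounter(replica_id=1, counts=[0, 2, 1])

a.merge(b)
b.merge(a)
assert a.counts == b.counts == [3, 2, 1]
assert a.value() == b.value() == 6

a.merge(b)  # merging again changes nothing
assert a.counts == [3, 2, 1]
```

Taking the maximum rather than adding slots is what makes redelivery safe: a replica's slot only grows, so the larger number is always the more recent one.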

A PN-Counter (supports decrements) is two G-Counters glued together. A G-Set (grow-only set) just unions. An OR-Set (observed-remove set) tags each insert with a unique ID so you can tell “I’m removing the banana I saw” from “I’m removing all bananas, including ones added later.” Once you start composing these, you can build registers, maps, and sequences.
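A rough sketch of the observed-remove idea, in Python. This is the simple tombstone-keeping variant, written for illustration; the class and method names are made up, and production implementations usually work harder to avoid keeping tombstones forever.

```python
import uuid


class ORSet:
    """Observed-remove set, simple state-based variant with tombstones."""

    def __init__(self):
        self.adds = set()     # every (element, tag) pair ever added
        self.removed = set()  # (element, tag) pairs that were seen and removed

    def add(self, element):
        # Each insert gets a fresh unique tag.
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Tombstone only the tags this replica has actually observed.
        self.removed |= {(e, t) for (e, t) in self.adds if e == element}

    def merge(self, other):
        # Both components are grow-only sets, so merge is two unions.
        self.adds |= other.adds
        self.removed |= other.removed

    def elements(self):
        return {e for (e, t) in self.adds - self.removed}


a, b = ORSet(), ORSet()
a.add("banana")
b.merge(a)           # b has now seen a's banana

b.remove("banana")   # removes the banana b saw...
a.add("banana")      # ...while a concurrently adds a new one

a.merge(b)
b.merge(a)
assert a.elements() == b.elements() == {"banana"}
```

The concurrent add survives because its tag was never observed by the replica that issued the remove.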

Operation-based (CmRDT) — ship the operations

Instead of broadcasting full state, replicas broadcast each operation (“insert ‘x’ at position p with ID k”). For this to converge without a total order, every pair of concurrent operations must commute. The data structure is designed so they do.
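A minimal sketch of the operation-based style, using a counter because its operations commute trivially. The `Op` and `OpCounter` names and the `"replica:sequence"` ID format are hypothetical. Operation-based designs generally lean on the delivery layer not to duplicate operations; this sketch deduplicates by ID instead, which is an assumption of the example.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    op_id: str  # hypothetical unique ID, e.g. "replica:sequence-number"
    delta: int  # +n for increment, -n for decrement


class OpCounter:
    def __init__(self):
        self.value = 0
        self.seen = set()  # IDs of operations already applied

    def apply(self, op: Op) -> None:
        # Skip redelivered operations so they are not counted twice.
        if op.op_id in self.seen:
            return
        self.seen.add(op.op_id)
        self.value += op.delta  # addition commutes, so order is irrelevant


ops = [Op("A:1", +3), Op("B:1", -1), Op("C:1", +5)]

# Every delivery order yields the same final value.
results = set()
for order in itertools.permutations(ops):
    replica = OpCounter()
    for op in order:
        replica.apply(op)
    results.add(replica.value)

assert results == {7}
```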

This is where collaborative text editing gets hard. The operations are “insert character” and “delete character” against an ever-shifting sequence of positions, and they have to commute even when two people are typing into the same spot. The trick is to give every character a position that doesn’t change when other characters are inserted.
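To see why index-based positions are the problem, here is a small Python demonstration of two concurrent inserts, expressed as plain "insert at index" operations, applied in different orders. The names are made up for the example.

```python
def insert_at(text: str, index: int, ch: str) -> str:
    """Naive insert: the position is an index into the current string."""
    return text[:index] + ch + text[index:]


base = "ac"
op_alice = (1, "b")  # Alice, looking at "ac", wants "abc"
op_bob = (0, "X")    # Bob, looking at "ac", wants "Xac"

# Replica 1 applies Alice's op, then Bob's.
r1 = insert_at(insert_at(base, *op_alice), *op_bob)
# Replica 2 applies Bob's op, then Alice's.
r2 = insert_at(insert_at(base, *op_bob), *op_alice)

assert r1 == "Xabc"
assert r2 == "Xbac"  # Alice's "b" landed in the wrong place
assert r1 != r2      # the replicas have diverged
```

Index 1 meant "between a and c" when Alice typed, but after Bob's insert it means something else. The operations do not commute because the positions they name are not stable.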

The general move: each character gets a globally unique identifier that places it in some well-defined order, and concurrent inserts at “the same spot” produce different identifiers that the data structure knows how to interleave deterministically. Different CRDTs do this differently: Logoot and Treedoc use dense fractional identifiers (a path in a tree, or a list of (replica-id, counter) pairs) so a new ID can always be found between two neighbors; RGA takes a different approach, treating the sequence as a growable linked structure where each insert names the element it sits after. Yjs adapts an algorithm called YATA; Automerge’s list type is RGA-based, and its rich-text type uses a newer scheme called Peritext. The family is less uniform than introductory writing usually implies.
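As a rough illustration of stable, dense identifiers, the sketch below gives each character an ID of the form (fraction, replica name) and sorts by it. This is a deliberately simplified scheme, not Logoot, Treedoc, RGA, or YATA, and every name in it is made up. In particular, it cannot insert between two characters that share the same fraction, and it raises an error instead; that is the gap the real algorithms close with longer identifiers. Deletion is left out.

```python
from fractions import Fraction


class Sequence:
    """Toy sequence CRDT: characters keyed by (position, replica_id)."""

    LOW, HIGH = Fraction(0), Fraction(1)

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.chars = {}  # (position, replica_id) -> character

    def make_insert(self, index: int, ch: str):
        """Create an insert op at visible index `index` and apply it locally."""
        ids = sorted(self.chars)
        left = ids[index - 1][0] if index > 0 else self.LOW
        right = ids[index][0] if index < len(ids) else self.HIGH
        if left == right:
            # Known limit of this toy scheme; see the lead-in.
            raise NotImplementedError("no room between equal positions")
        ident = ((left + right) / 2, self.replica_id)
        op = (ident, ch)
        self.apply(op)
        return op

    def apply(self, op) -> None:
        ident, ch = op
        self.chars[ident] = ch  # same ID, same character: redelivery is a no-op

    def text(self) -> str:
        return "".join(self.chars[i] for i in sorted(self.chars))


alice, bob = Sequence("alice"), Sequence("bob")

# Shared starting text "ac", typed by alice and replicated to bob.
for op in [alice.make_insert(0, "a"), alice.make_insert(1, "c")]:
    bob.apply(op)

# Concurrent inserts into the same gap between "a" and "c".
op_a = alice.make_insert(1, "b")
op_b = bob.make_insert(1, "X")

# Cross-deliver in different orders, with one duplicate.
alice.apply(op_b)
bob.apply(op_a)
bob.apply(op_a)

assert alice.text() == bob.text() == "abXc"
```

Both inserts computed the same fraction, and the replica name broke the tie, so both replicas interleave the two characters the same way regardless of arrival order.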

What CRDTs cost

This is the seams part. The math is beautiful; the engineering is real.

The key intuition to leave with: a CRDT isn’t magic, and it isn’t just “eventual consistency for documents.” It’s a deliberate restriction of what the data structure is allowed to be — only shapes where merge is forced to be safe — in exchange for the freedom to write to any replica, at any time, without asking anyone.

Going deeper