Why Postgres uses MVCC
Readers shouldn't have to wait for writers, and writers shouldn't have to wait for readers — so Postgres keeps multiple versions of every row.
Why it exists
Two people open the same row at the same time. One is reading it for a report; the other is updating it. In the simplest possible database, the writer takes a lock, and the reader waits. Or the reader takes a lock, and the writer waits. Either way, the system serializes around contention — and on a busy table, that contention dominates everything else.
The frustration is that the reader doesn’t actually need the latest value. They need a consistent value: a snapshot of the database that doesn’t shift under their feet mid-query. The writer, similarly, doesn’t need exclusive access to the universe — they just need to know whether their change conflicts with someone else’s.
MVCC exists to break that lock-contention bottleneck. Instead of one row that everyone fights over, the database keeps multiple versions of the row and hands each transaction the version that was current when it started. Readers never block writers. Writers never block readers. The database scales with cores instead of choking on locks.
Why it matters now
Almost every modern relational database engineers reach for —
Postgres,
InnoDB
inside MySQL, Oracle, SQL Server’s snapshot isolation mode, CockroachDB, YugabyteDB
— uses some flavor of MVCC. If you’ve ever wondered why SELECT in Postgres
doesn’t seem to take row locks, why VACUUM is a thing you have to think about,
or why your “long-running analytics query” is suddenly bloating your table —
that’s all MVCC.
It also matters because MVCC is the reason Postgres can give you repeatable-read and serializable isolation without grinding to a halt under load. The “snapshot” that those isolation levels promise isn’t conceptual — it’s literally a set of row versions the engine keeps on disk on your behalf.
The short answer
MVCC = rows-as-versions + visibility rules per transaction
When you write a row in Postgres, the old version isn’t overwritten. A new version is appended, tagged with the transaction that created it. Each transaction gets a snapshot — effectively, a list of “which transactions had committed when I started” — and at read time, Postgres walks the versions and shows you the one that was visible to your snapshot.
So readers and writers don’t fight: they’re literally looking at different copies.
How it works
Every row in Postgres carries two hidden columns:
xmin— the transaction ID that created this row version.xmax— the transaction ID that deleted or updated this row version (or zero if it’s still live).
An UPDATE is, mechanically, an INSERT of a new row version plus a stamp on
the old one’s xmax. Both versions sit in the table at the same time.
When transaction T runs a SELECT, Postgres asks, for each candidate row
version, a question that’s roughly:
Was
xminalready committed when T started, and isxmaxeither zero or belongs to a transaction that hadn’t committed when T started?
If yes, T sees that version. If no, it skips to the next one. That’s the “visibility check,” and it’s how a reader gets a stable snapshot without taking any locks on the rows.
The price you pay: dead row versions accumulate. In Postgres, the background process that reclaims them is autovacuum. If autovacuum can’t keep up — or if a long-running transaction holds open a snapshot that “could still need” old versions — the table grows even when the logical row count is flat. That’s table bloat, and it’s the most common operational pain MVCC creates.
There’s also the transaction ID wraparound problem — Postgres’s transaction IDs are 32-bit, and old row versions need to be either frozen or cleaned up before the counter laps them. This is the kind of thing that shows up at 3 a.m. on tables that haven’t been vacuumed in months.
Show the seams
A few things the textbook version of MVCC tends to gloss over:
- Postgres’s flavor is unusual. Most other MVCC databases (Oracle, MySQL/InnoDB)
store old versions in a separate
undo log / rollback segment
and overwrite the live row in-place. Postgres keeps versions inline in the
table. That’s why
VACUUMis so visible in Postgres specifically — other engines do equivalent work, but it’s hidden in undo segments rather than in your main table file. - MVCC doesn’t make all writes lock-free. Two transactions updating the same row still conflict — one of them will block on a row-level lock or get a serialization failure. MVCC only removes the read-vs-write contention.
- Snapshot isolation is not serializability. Postgres’s default
READ COMMITTEDand evenREPEATABLE READallow some anomalies (notably “write skew”) that true serializability forbids. To get actual serializability, Postgres adds an extra layer called SSI on top of MVCC, and it works by aborting transactions when it detects trouble, not by locking more.
I don’t have a confident date for when MVCC first entered Postgres specifically — the design traces back to its Berkeley POSTGRES roots in the 1980s, but I haven’t re-sourced the exact release where the modern semantics solidified, so I won’t pin a year here.
Famous related terms
- Snapshot isolation —
snapshot isolation = MVCC + "read your snapshot, validate at commit"— the isolation level MVCC most naturally provides. - VACUUM —
VACUUM ≈ garbage collector for dead row versions— the cost MVCC pushes onto operations. - Optimistic concurrency control —
OCC ≈ "assume no conflict, check at commit"— the philosophical sibling: don’t lock, detect conflicts later. - Write-ahead log (WAL) —
WAL = "log the change before touching the data file"— orthogonal to MVCC but the other half of how Postgres stays correct under crashes.
Going deeper
- Postgres docs, “Concurrency Control” chapter — the canonical reference for the visibility rules and isolation levels.
- Bruce Momjian’s “MVCC Unmasked” talk slides — a widely-cited walkthrough of
xmin/xmaxwith diagrams. - Hellerstein, Stonebraker & Hamilton, Architecture of a Database System — situates MVCC against lock-based concurrency in a broader systems frame.