Heads up: posts on this site are drafted by Claude and fact-checked by Codex. Both can still get things wrong — read with care and verify anything load-bearing before relying on it.

Why Git stores snapshots, not diffs

Git's reputation says 'version control = diffs.' Git's actual model says 'version control = snapshots, hashed.' That swap is the whole reason Git feels different.

Computer Science intro Apr 29, 2026

Why it exists

If you came to Git after using Subversion or CVS, the mental model you arrived with almost certainly does not fit Git: “a repo is a sequence of diffs from the previous version.” That is roughly how the older systems stored history. CVS, for example, kept the newest revision of a file in full and rebuilt older revisions by walking backward and applying reverse patches.

Git does not do this. A Git commit is a snapshot of the entire tree as it existed at that moment — every file, in full. When a file doesn’t change between two commits, Git doesn’t store a diff of zero bytes; it just points the new commit at the same file blob the old commit was already pointing at.

That sounds wasteful at first. It isn’t, and the reason it isn’t is the whole point of this post — and most of why Git feels qualitatively different from what came before.

Why it matters now

Almost every codebase you’ll touch as a software engineer in 2026 lives in Git. The mental model leaks into everything: why git log is fast, why branches are free, why git rebase can reorder history without corrupting it, why a detached HEAD is a normal state and not a disaster, why GitHub can show you any historical version of a file instantly without replaying patches.

It also matters because the snapshot model is what makes Git a content-addressed store — the same idea now showing up in IPFS, container image layers, Nix, and large-model weight stores. Git was an early mainstream example of “address things by what they are, not where they live,” and that idea keeps being rediscovered.

The short answer

git commit ≈ snapshot of the tree, addressed by the SHA-1 hash of its contents

A commit is a tiny object that points at a tree (a directory listing). The tree points at blobs (file contents) and at sub-trees. Every one of those objects is named by the hash of its own bytes. If two commits contain the same file, they end up pointing at the same blob — automatically, with no deduplication step.

There are no diffs in the storage model. Diffs are something Git computes on demand when you run git diff or git log -p.
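To make "diffs are computed on demand" concrete, here is a minimal Python sketch. It is not how Git implements git diff. The two snapshot dictionaries and the diff_snapshots helper are hypothetical, and Python's difflib stands in for Git's own diff machinery. The point is only that a patch can be derived from two full snapshots whenever someone asks for it.

```python
import difflib

# Two hypothetical snapshots: each maps a path to that file's full contents.
snapshot_1 = {
    "README.md": "# Demo\nFirst draft.\n",
    "src/main.py": "print('hi')\n",
}
snapshot_2 = {
    "README.md": "# Demo\nSecond draft.\n",
    "src/main.py": "print('hi')\n",
}


def diff_snapshots(old, new):
    """Derive a unified diff from two full snapshots, at read time."""
    out = []
    for path in sorted(set(old) | set(new)):
        before = old.get(path, "")
        after = new.get(path, "")
        if before == after:
            continue  # unchanged file: nothing to report
        out.extend(
            difflib.unified_diff(
                before.splitlines(keepends=True),
                after.splitlines(keepends=True),
                fromfile=f"a/{path}",
                tofile=f"b/{path}",
            )
        )
    return "".join(out)


patch = diff_snapshots(snapshot_1, snapshot_2)
print(patch)
```

Nothing resembling a patch was stored anywhere. The only inputs are the two snapshots, and the unchanged src/main.py never appears in the output.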

How it works

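Four object types, all hashed, all immutable: blobs (file contents), trees (directory listings that map names to blobs and sub-trees), commits (a pointer to one root tree, plus parent commits, author, and message), and annotated tags (a named pointer to another object, usually a commit).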

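Each object’s name is the SHA-1 of its contents (strictly, of a short type-and-length header followed by the contents), so identical content gets an identical name automatically. Add the same file to two repos in two countries; as long as both use Git’s default SHA-1 object format, both repos call its blob the same 40-character hex string.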
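The naming rule is small enough to sketch in a few lines of Python. This assumes the default SHA-1 object format; the helper names git_object_id and blob_id are made up for illustration and are not part of any Git API.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Name an object the way Git's SHA-1 format does: hash a small
    "<type> <size>\\0" header followed by the raw content."""
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


def blob_id(content: bytes) -> str:
    return git_object_id("blob", content)


# The empty blob and the empty tree have well-known ids in every SHA-1 repo.
print(blob_id(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

# Same bytes in, same name out, no matter where or when you compute it.
print(blob_id(b"hello\n") == blob_id(b"hello\n"))  # True
```

To check the sketch against a real installation, pipe the same bytes into git hash-object --stdin and compare the output with blob_id.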

A walked-through example. Imagine a repo with two files, README.md and src/main.py. Commit 1 has both. You then change only README.md and make commit 2.

Two full snapshots are stored, but the blob for src/main.py (blobX) and the entire src/ tree object exist exactly once on disk. The “diff” between commit 1 and commit 2 is something Git derives at read time by walking the two trees and noticing that README.md changed.
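The walk-through can be replayed with a toy object store. This is a sketch under stated assumptions, not Git's implementation: the store dictionary stands in for .git/objects, the put_* helpers are hypothetical, the file contents and author line are invented, and tree entries are sorted by plain name (real Git has an extra rule for directories that does not matter for these two names).

```python
import hashlib

store = {}  # object id -> raw object bytes; a toy stand-in for .git/objects


def put(obj_type, content):
    """Store an object under the SHA-1 of its header plus content."""
    raw = f"{obj_type} {len(content)}\0".encode("ascii") + content
    oid = hashlib.sha1(raw).hexdigest()
    store[oid] = raw  # writing identical content twice is a no-op
    return oid


def put_blob(data):
    return put("blob", data)


def put_tree(entries):
    """entries maps name -> (mode, object id); simplified sort order."""
    body = b""
    for name in sorted(entries):
        mode, oid = entries[name]
        body += f"{mode} {name}\0".encode("utf-8") + bytes.fromhex(oid)
    return put("tree", body)


def put_commit(tree, parents, message):
    who = "A U Thor <author@example.com> 0 +0000"  # made-up identity
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    return put("commit", "\n".join(lines).encode("utf-8"))


def snapshot(readme_text):
    """Write the whole tree from scratch, every file in full."""
    main_py = put_blob(b"print('hi')\n")
    src = put_tree({"main.py": ("100644", main_py)})
    readme = put_blob(readme_text)
    root = put_tree({"README.md": ("100644", readme), "src": ("40000", src)})
    return root, src


root_1, src_1 = snapshot(b"first draft\n")
commit_1 = put_commit(root_1, [], "commit 1")
print(len(store))  # 5: two blobs, two trees, one commit

root_2, src_2 = snapshot(b"second draft\n")
commit_2 = put_commit(root_2, [commit_1], "commit 2")
print(len(store))  # 8: only a new README blob, root tree, and commit

print(src_1 == src_2)  # True: the src/ tree is shared, not copied
```

The second snapshot call rewrites every file, yet the store grows by only three objects. Sharing falls out of the naming scheme, with no separate deduplication pass.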

This is why:

“But surely it can’t store full copies forever”

It doesn’t, exactly. The four object types above describe the logical model, the one every Git command reasons in. Underneath, Git has a second storage layer called packfiles. As a repo grows, git gc rolls many loose objects into a packfile, and inside the packfile Git does delta-compress similar objects against each other to save space.

The crucial detail: the deltas in a packfile are an internal storage trick, not the model. They aren’t tied to commit history. Git chooses delta bases heuristically, by how similar two objects look (same type, similar path and size), regardless of which commits they belong to. The logical layer is still snapshots-by-hash; the deltas are just zip-like compression underneath.

So the slogan “Git stores snapshots, not diffs” is true at the layer that matters for reasoning about Git, even though at the bytes-on-disk layer Git absolutely uses deltas to save space. The trick is that the deltas don’t define identity. The hash of the snapshot does.
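A small Python sketch of that separation between storage and identity. It does not use Git's real pack delta encoding; the copy/insert delta here is a made-up format built on difflib.SequenceMatcher, and make_delta, apply_delta, and blob_id are hypothetical helpers. What it shows is that an object can sit on disk as "base plus delta" while its name is still the hash of the full content.

```python
import hashlib
from difflib import SequenceMatcher


def blob_id(content: bytes) -> str:
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


def make_delta(base: bytes, target: bytes):
    """Describe target as copy-from-base and insert-literal operations."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no op: those base bytes are simply never copied
    return ops


def apply_delta(base: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += base[start:start + length]
        else:
            out += op[1]
    return bytes(out)


v1 = b"line one\nline two\nline three\n" * 20
v2 = v1 + b"line four\n"

name_of_v2 = blob_id(v2)     # identity comes from the full content
delta = make_delta(v1, v2)   # storage may keep only this, plus v1
rebuilt = apply_delta(v1, delta)

literal_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
print(literal_bytes < len(v2))         # True: the delta is smaller than v2
print(blob_id(rebuilt) == name_of_v2)  # True: same name after rebuilding
```

Swapping in a different base or a different delta format would change what sits on disk, but not name_of_v2.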

Where the model shows its seams

A few places it gets weird:

Going deeper