Heads up: posts on this site are drafted by Claude and fact-checked by Codex. Both can still get things wrong — read with care and verify anything load-bearing before relying on it.

Why compression works at all

Zip a photo and it shrinks; zip the zip and it doesn't. Compression isn't magic — it only ever exploits the patterns that were already there.

Computer Science · intro · May 14, 2026

Why it exists

You take a folder of plain text files or an uncompressed bitmap, right-click, “Compress to .zip”, and it comes out a fraction of the size. So you try the obvious trick: zip the zip. It comes out the same size again — maybe a few bytes bigger. Run it a third time and nothing happens at all. If compression is just “make files smaller,” why does it refuse to keep going?

The answer is the whole point of this post: a compressor doesn’t make data smaller, it makes redundancy smaller. The original file was full of predictable structure — repeated words, repeated byte sequences, long runs of the same value. The first pass spent that structure. What’s left looks, statistically, like random noise, and noise has nothing left to exploit. The surprising claim underneath the everyday tool is: random data can’t be losslessly compressed — not on average, and not for almost any individual file. Compression is not a property of “shrinking” — it’s a property of prediction.

Why it matters now

This isn’t a museum fact; it sets hard limits on systems people use every day. It’s why a video codec leans on interframe prediction — consecutive frames are nearly identical, and that near-identity is the compressible part. It’s why already-compressed formats (JPEG, MP4, .zip) barely shrink when you zip them again, so backup tools skip the wasted second pass. It’s why encrypted traffic is roughly incompressible — good encryption is designed to look random, which by this same argument means there’s nothing to squeeze. And it’s why “infinite compression” pitches are a reliable tell for a scam: the counting argument below rules them out.

The short answer

compression = a model of what's likely + spend fewer bits on the likely cases

A compressor works by predicting. It builds (explicitly or implicitly) a model of which patterns are common, then assigns short codes to common things and long codes to rare things. Total size goes down only because the data really was predictable. Feed it data with no pattern — where every continuation is equally likely — and there are no “common cases” to give short codes to. The model has nothing to say, so on average the output is no smaller than the input, and the rare input that does shrink only does so by luck.
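You can watch this bet play out with Python's standard-library zlib (a DEFLATE-based compressor). The exact sizes below will vary a little by zlib version, but the shape of the result won't: patterned input collapses, random input doesn't.

```python
import os
import zlib

repetitive = b"the quick brown fox " * 500  # 10,000 bytes of obvious pattern
random_ish = os.urandom(10_000)             # 10,000 bytes of cryptographic noise

# The patterned input shrinks dramatically; the random input comes out
# essentially unchanged (often a few bytes larger, from format overhead).
print(len(zlib.compress(repetitive)))   # well under 100 bytes
print(len(zlib.compress(random_ish)))   # roughly 10,000 bytes
```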

How it works

A worked example: run-length encoding

Take the simplest real compressor. Suppose a scanline of a black-and-white image is:

WWWWWWWWWWWWBWWWWWWWWWWWWBBB

28 characters. Run-length encoding rewrites it as “12 W, 1 B, 12 W, 3 B”:

12W1B12W3B

10 characters. It worked — but look at why. The input had long runs, i.e. each pixel strongly predicted the next. Now feed the same encoder a scanline with no runs:

WBWBWBWBWBWBWBWBWBWBWBWBWBW

Run-length encoding produces 1W1B1W1B... — 54 characters, twice as long as the 27-character original. Note this string isn’t random — “repeat WB” is an obvious pattern — it just has no run-length redundancy, the only kind this encoder knows how to spend. Every general-purpose compressor is this same bet at a larger scale: it wins on inputs that match its model and loses on inputs that don’t.
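A minimal run-length encoder makes both outcomes concrete. This is an illustrative sketch of the idea, not the encoder any particular image format uses:

```python
def rle_encode(s: str) -> str:
    """Run-length encode a string: 'WWWB' -> '3W1B'."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:  # extend the current run
            j += 1
        out.append(f"{j - i}{s[i]}")        # emit (count, symbol)
        i = j
    return "".join(out)

runs = "W" * 12 + "B" + "W" * 12 + "BBB"  # the runny scanline: 28 chars
alternating = "WB" * 13 + "W"             # the run-free scanline: 27 chars

print(rle_encode(runs))                   # 12W1B12W3B  (10 chars: a win)
print(len(rle_encode(alternating)))       # 54 (2 chars per pixel: a loss)
```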

Why “lose on some inputs” is unavoidable

Here’s the precise version of the surprising claim — and it’s a counting argument, not hand-waving.

A lossless compressor must be invertible: distinct inputs map to distinct outputs, or you couldn’t decompress unambiguously. Now count. There are exactly 2¹⁰⁰ possible 100-bit files. There are only 2⁰ + 2¹ + … + 2⁹⁹ = 2¹⁰⁰ − 1 files shorter than 100 bits — fewer than the number of inputs. So no lossless compressor can map every 100-bit input to a shorter output: there isn’t enough room. By the pigeonhole principle, at least one 100-bit input must come out the same length or longer.
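The arithmetic is small enough to check directly. This snippet counts every bit-string strictly shorter than 100 bits and compares it to the number of 100-bit inputs; the shortfall is exactly one:

```python
# Pigeonhole check: how many distinct outputs shorter than n bits exist,
# versus how many n-bit inputs need a home?
n = 100
inputs = 2**n                                   # all n-bit files
shorter_outputs = sum(2**k for k in range(n))   # lengths 0 .. n-1

print(inputs - shorter_outputs)  # 1 — at least one input can't shrink
```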

The honest statement is therefore not “compressors always make some file bigger by a lot” — it’s: no lossless compressor shrinks every input, and only a small fraction of inputs can shrink by much. (If many inputs shrank a lot, there wouldn’t be enough short outputs to hold them — same counting again.) Real compressors are useful anyway because the files people actually have — text, images, audio, code — are a tiny, highly structured corner of the space of all possible files. Compressors are tuned to win big on that corner and usually pay only a small overhead on inputs that don’t fit their model — though a mismatch can cost more, as the alternating-WB case above shows.

Where entropy comes in

Information entropy makes “how predictable” into a number. The entropy of a data source — the process generating the symbols — is the average number of bits per symbol you’d need to encode it optimally, and it is a hard floor. No lossless scheme can average fewer bits per symbol than the source’s entropy; this is Shannon’s source-coding theorem. (It’s a statement about the source’s distribution, not any single finite file — but it’s what pins down what’s possible.)
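The formula itself is short: for symbol probabilities p, the entropy is H = −Σ p·log₂(p) bits per symbol. A quick sketch, using a coin flip as the source:

```python
import math

def entropy_bits(probs) -> float:
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin is maximally unpredictable: a full bit per flip, no savings.
print(entropy_bits([0.5, 0.5]))   # 1.0

# A 90/10 biased coin is mostly predictable: under half a bit per flip
# on average, if the encoder groups flips cleverly.
print(entropy_bits([0.9, 0.1]))   # ~0.469
```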

That reframes everything above cleanly. Run-length encoding wins on the runny scanline because each pixel is highly predictable from the last — the per-pixel entropy is low. It loses on the alternating scanline because that string’s redundancy lies outside the encoder’s model. And truly random data sits at maximum entropy, so the floor equals the raw size: there is nothing to save.

So “you can’t compress random data” isn’t a quirk of .zip. It’s the source-coding theorem stated in plain words: you can’t represent a source in fewer bits per symbol than its own unpredictability.

One honest seam

“Random” here means no exploitable structure for the model in use — not “produced by a random process” in some cosmic sense. Data can look random to one compressor and be wildly compressible to another that knows the right pattern. The digits of π look like noise to gzip, but “the first million digits of π” is a short description if your decompressor knows how to compute π. Entropy is always relative to a model; there is no model-free notion of “the” compressibility of a specific finite string that’s also computable in practice. The counting argument still holds regardless — no model wins on everything.
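The alternating scanline from earlier is a neat demonstration of this relativity: it defeats run-length encoding, yet zlib’s LZ77-style model spots the “repeat WB” pattern immediately. A quick sketch:

```python
import zlib

# 2,000 bytes with zero run-length redundancy — every run has length 1 —
# but an obvious repeat that a dictionary-based model exploits trivially.
alternating = b"WB" * 1000

print(len(zlib.compress(alternating)))  # a few dozen bytes at most
```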

Going deeper