
Why CPU clock speeds stopped climbing

Around the mid-2000s, GHz numbers on CPUs flatlined while core counts started growing. The reason isn't engineering laziness — it's that switching a transistor costs energy, and energy turns into heat you can't get rid of fast enough.

Science intro Apr 29, 2026

Why it exists

If you remember computers from the 1990s, you remember the megahertz race. Every new CPU bragged louder about clock speed. 100 MHz, 500 MHz, 1 GHz, 2 GHz, 3 GHz — the number on the box went up roughly in step with calendar time, and faster clock meant a faster machine in a way you could feel.

Then the curve broke. Somewhere around the mid-2000s, mainstream desktop CPUs stalled in the 3–4 GHz range and basically stayed there. Two decades later, a top-end consumer chip in 2026 still boosts to single-digit gigahertz. Meanwhile core counts went from one, to two, to four, to dozens. The industry quietly switched from “go faster” to “go wider.”

The interesting question isn’t what happened — it’s why physics forced it. Intel didn’t run out of ideas. Transistors kept getting smaller and cheaper. Something else hit a wall, and that wall has a name: power density, also called the power wall.

Why it matters now

Every modern conversation about compute scaling — datacenter siting, GPU thermal design, training-run economics, even why your laptop fan kicks on during an inference call — is downstream of this same physics. GPUs are wide, not fast, for the same reason multicore CPUs are wide, not fast. TPUs are wide. Apple’s M-series chips are wide. The reason an AI training cluster needs a substation, not a wall outlet, is that we ran out of single-thread headroom and started buying performance by the kilowatt instead of by the gigahertz.

Software engineers feel this every time they write code that doesn’t parallelize. Single-thread performance still improves — branch prediction, cache, wider issue, better compilers — but the days of waiting two years for your serial loop to magically run twice as fast are over.

The short answer

power wall: (capacitance × voltage² × frequency) + leakage must stay under what your heatsink can remove

A digital CPU spends energy mostly in two ways. Switching energy — every time a transistor flips, it charges or discharges a tiny capacitor, which costs on the order of C·V² joules per flip; at f flips per second, that's roughly C·V²·f watts per transistor. Leakage — even when nothing is switching, modern transistors are leaky enough that current trickles through them all the time. Multiply by a few billion transistors and you get watts. Watts turn into heat. Heat has to leave the chip, or the chip melts. The clock speed you can sustain is whatever lets the heat budget balance.
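To put magnitudes on that, here's a back-of-the-envelope sketch in Python. Every number in it (capacitance per gate, how many gates flip each cycle, the leakage figure) is an illustrative assumption, not a measurement from any real chip:

```python
# Back-of-the-envelope power budget. Every number here is an illustrative
# assumption, not a measurement from a real chip.

cap_per_gate = 1e-15   # farads switched per gate per flip (assumed ~1 fF)
voltage      = 1.0     # supply volts (assumed)
frequency    = 4e9     # clock: 4 GHz
gates_active = 5e7     # gates actually flipping each cycle (assumed)
leakage      = 30.0    # watts trickling through idle transistors (assumed)

# Dynamic power: C * V^2 * f per switching gate, summed over the chip.
dynamic = cap_per_gate * voltage**2 * frequency * gates_active
print(f"dynamic {dynamic:.0f} W + leakage {leakage:.0f} W "
      f"= {dynamic + leakage:.0f} W to remove")   # ~230 W

# Same chip at double the clock, same voltage: switching cost doubles.
print(f"at 8 GHz: {2 * dynamic + leakage:.0f} W")  # ~430 W
```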

How it works

Three threads tangle together here, and the order matters.

1. Switching energy scales with frequency.

Take the basic dynamic-power equation that every chip designer carries in their head: P ≈ C · V² · f. C is the capacitance you have to drive (set by the wires and gates). V is the supply voltage. f is the clock frequency. Crucially, doubling the clock doubles the power at the same voltage — because you’re now paying that switching cost twice as often.

For a long time, this was fine, because of Dennard scaling: as transistors shrank, you could drop the voltage in step. P drops with V², so even though you packed more transistors and ran them faster, total power per square millimeter stayed roughly flat. This was the deal that powered the megahertz race.
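A toy model makes the deal visible. The sketch below shrinks every linear dimension by an assumed 0.7× per node and drops the voltage in step, Dennard-style; the shrink factor is an assumption, but the flat power density it produces is the point:

```python
# Toy model of Dennard scaling: shrink every linear dimension by s each
# node and drop the voltage by the same factor. Values are illustrative.

s = 0.7  # linear shrink per process node (assumed, roughly the classic cadence)

C, V, f, density = 1.0, 1.0, 1.0, 1.0  # normalized starting values

for node in range(5):
    power_density = C * V**2 * f * density  # heat per unit area, normalized
    print(f"node {node}: f={f:.2f}x, density={density:.1f}x, "
          f"power density={power_density:.2f}x")
    C *= s           # smaller transistor -> less capacitance to drive
    V *= s           # Dennard's deal: voltage shrinks with the geometry
    f /= s           # shorter distances -> the clock can rise
    density /= s*s   # more transistors per square millimeter
```

The power density column reads 1.00x at every node: more transistors, higher clocks, same heat per square millimeter.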

2. Dennard scaling broke before Moore’s Law did.

Moore’s Law is a count: transistors per chip, doubling on a regular cadence. Dennard scaling is a power claim: those transistors fit in the same thermal envelope. The two rode together for decades, but they’re separate promises, and Dennard’s broke first — somewhere around the 90 nm / 65 nm process nodes in the early-to-mid 2000s.

The standard account is that voltage couldn’t keep dropping without making transistors unreliable: thermal noise becomes comparable to the signal, leakage explodes, and threshold voltages have a physical floor. Once V stops dropping but transistor counts keep doubling, the C·V²·f budget for the whole chip blows up. You’re suddenly trying to dissipate hundreds of watts from a thumbnail-sized die.

I’m being deliberately hand-wavy about the exact node where this “broke” because the answer is a gradient, not a cliff edge — but the consensus is that by the mid-2000s the old game was over.
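To see the break in numbers, run the same toy model with the voltage pinned at its floor (again, illustrative values, not historical ones):

```python
# Same toy model, but with the voltage stuck at its floor: the post-Dennard
# regime. Frequency is held flat too; density keeps climbing. Illustrative only.

s = 0.7
C, V, f, density = 1.0, 1.0, 1.0, 1.0

for node in range(5):
    power_density = C * V**2 * f * density
    print(f"node {node}: power density = {power_density:.2f}x")
    C *= s           # capacitance still shrinks with the transistor
    density /= s*s   # transistors per square millimeter still double
    # V stays put: dropping it further makes transistors unreliable
    # f stays put too: raising it would only make the blowup worse
```

With these assumptions, power density compounds by roughly 1.4x per node, which is how the budget blows up within a few generations.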

3. Heat removal is the actual ceiling.

You could, in principle, just clock the chip faster anyway. The transistors will switch. The problem is that the heat has to go somewhere. A modern CPU die is roughly the size of a postage stamp. The path the heat takes to leave is: silicon → integrated heat spreader → thermal paste → heatsink baseplate → fins → moving air (or water). Each interface has a thermal resistance, and resistance times power equals temperature drop.

Above ~100 °C junction temperature, transistors get unreliable and eventually break. So your real budget is: how many watts can I shove through that stack of materials before the chip cooks itself? For a desktop air cooler, that’s something like 100–250 W sustained. For a datacenter GPU on a custom liquid loop, it’s more, but still bounded.
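You can sketch that heat path as resistances in series. The per-stage values below are assumed ballpark figures, not datasheet numbers, but the arithmetic is the real arithmetic:

```python
# Sketch of the thermal stack as a series resistance. Each stage's
# resistance (degrees C per watt) is an assumed ballpark figure.

stack = {
    "die -> heat spreader":   0.05,
    "spreader -> paste":      0.10,
    "paste -> heatsink base": 0.05,
    "heatsink fins -> air":   0.20,
}

ambient = 25.0    # deg C, room air
t_max   = 100.0   # deg C, roughly where transistors stop being trustworthy

r_total = sum(stack.values())  # resistances in series just add

# Temperature drop = power * resistance, so the sustainable budget is:
max_watts = (t_max - ambient) / r_total
print(f"total resistance: {r_total:.2f} C/W -> budget: {max_watts:.0f} W")
# ~190 W with these numbers; better cooling (lower C/W) raises the ceiling
```

All a better cooler does is lower one of those resistances. It moves the ceiling; it doesn't remove it.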

This is why you can’t fix the power wall by being clever with the chip alone. You can build a faster engine, but you can’t build a heatsink that breaks the second law of thermodynamics.

Why “go wide” works around it

A core running at 3 GHz instead of 6 GHz uses much less than half the power, because you can also drop the voltage when the clock is lower, and power scales with V². Two cores at 3 GHz can do about as much arithmetic per second as one (hypothetical) core at 6 GHz, but use far less power to do it — if the workload parallelizes. So the industry pivoted: more cores, wider SIMD, more specialized accelerators, all at modest clocks. GPUs are the limit case — thousands of slow lanes instead of a few fast ones.
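A toy comparison shows the size of the win. It leans on one big assumption: that voltage scales roughly in proportion to frequency near the operating point, so power goes like f³. Real voltage-frequency curves are messier, but the shape holds:

```python
# Why "wide" beats "fast": a toy comparison assuming voltage scales roughly
# in proportion to frequency near the operating point. That assumption gives
# P ~ V^2 * f with V ~ f, hence P ~ f^3. Base figures are illustrative.

def core_power(freq_ghz, base_freq=3.0, base_power=30.0):
    """Power of one core, normalized to an assumed 30 W at 3 GHz."""
    return base_power * (freq_ghz / base_freq) ** 3

fast = core_power(6.0)        # one core at 6 GHz
wide = 2 * core_power(3.0)    # two cores at 3 GHz: same total throughput,
                              # but only if the workload splits across them

print(f"one 6 GHz core:  {fast:.0f} W")   # ~240 W
print(f"two 3 GHz cores: {wide:.0f} W")   # ~60 W
```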

The whole “AI needs absurd amounts of electricity” story is a downstream consequence. Once parallelism is your only lever, you scale by adding more silicon and more cooling, and the bill is paid in megawatts.

Going deeper