Why virtual memory exists
Every process thinks it owns the whole machine. That lie is the foundation almost every modern OS feature is built on.
Why it exists
Imagine writing a program in 1965. You compile it, and the linker burns the absolute physical addresses of every variable and function straight into the binary. The program only runs if those exact bytes of RAM are free. Run two programs at once and they fight over the same addresses. Crash one and it can scribble all over the other. Want to use more memory than the box has? Tough — there is no “more.”
That’s the world before virtual memory. The pain points stack up fast:
- Programs can’t coexist without coordinating addresses, which they can’t.
- No isolation — any pointer bug in any program can corrupt any other.
- No way to use a disk as overflow — addresses are physical, and disks aren’t memory.
- No way to load a program bigger than RAM at all.
Virtual memory solves all four with one move: lie to every process. Each process gets its own private, contiguous-looking address space, starting from zero, as if it owned the entire machine. Underneath, the MMU and the kernel translate those fake addresses to real physical RAM — or to disk, or to “doesn’t exist yet, allocate on first touch,” or to “shared with another process.”
Once you have the indirection, you get everything: process isolation, memory
overcommit, copy-on-write fork, memory-mapped files, shared libraries loaded
once and mapped into many processes, swap, ASLR, lazy allocation. None of that
is possible without the lie.
Why it matters now
It’s invisible until it isn’t. Every time you:
- run more processes than fit in RAM,
- `mmap` a 200 GB file and read it like an array,
- load a 70 GB LLM’s weights from disk and let the OS page them in,
- watch your dev server’s memory go up but your machine stay snappy,
- get killed by the OOM killer for touching pages the kernel had only promised you,
…you are leaning on virtual memory. Container memory limits, cgroup
accounting, Postgres backends sharing the same buffer-cache pages
between processes, GPU drivers mapping VRAM into your address space — all of
it sits on top of the same trick.
The AI-era version: model weights and KV caches are huge, and a lot of the
“how do I fit this on this box” art is really “how do I cooperate with virtual
memory” — using `mmap` so the OS pages weights in lazily, sharing read-only
pages between worker processes, or going the other way and pinning pages so
the GPU can DMA them.
The short answer
`virtual memory = per-process fake address space + page table that maps fake → real (or to disk, or to nothing yet)`
Every process sees its own address space. The CPU’s MMU walks a per-process page table on every memory access to translate virtual addresses to physical ones. If a page isn’t there, the kernel gets a chance to fix it up — load from disk, allocate a new page, or kill the process for touching memory it doesn’t own.
How it works
Memory is divided into fixed-size pages — typically 4 KB on x86-64, though modern systems also support “huge pages” of 2 MB or 1 GB to cut down on translation overhead. The unit of mapping is the page, not the byte.
Each process has a page table: a tree-shaped data structure where the leaves are PTEs, each saying “virtual page N maps to physical page M, with these permissions (read/write/execute), and here’s a ‘present’ bit.” On x86-64 it’s a 4-level tree (5 levels on newer chips with larger address spaces).
On every memory access, the MMU does a page walk to translate the address. To avoid doing this for every load and store, there’s a hardware cache of recent translations called the TLB. TLB hits are essentially free; TLB misses cost dozens of cycles. This is why huge pages are interesting — fewer entries cover more memory.
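To make the walk concrete, here’s how a 4-level walk slices a 48-bit virtual address with 4 KB pages: 9 index bits per level plus a 12-bit page offset. A small C sketch of the bit arithmetic the MMU performs (the indices are real; the program is purely illustrative):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Slice a virtual address the way the x86-64 MMU does during a 4-level
 * page walk with 4 KB pages. Illustrative only: the hardware does this,
 * not your code. */
int main(void) {
    int x;                               /* any stack variable will do */
    uint64_t va = (uintptr_t)&x;

    unsigned pml4 = (va >> 39) & 0x1FF;  /* level-4 index: bits 47..39 */
    unsigned pdpt = (va >> 30) & 0x1FF;  /* level-3 index: bits 38..30 */
    unsigned pd   = (va >> 21) & 0x1FF;  /* level-2 index: bits 29..21 */
    unsigned pt   = (va >> 12) & 0x1FF;  /* level-1 index: bits 20..12 */
    unsigned off  =  va        & 0xFFF;  /* byte offset inside the page */

    printf("va=%#" PRIx64 " -> PML4[%u] PDPT[%u] PD[%u] PT[%u] + offset %#x\n",
           va, pml4, pdpt, pd, pt, off);
    return 0;
}
```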
When the MMU encounters a PTE with the present bit clear — the page isn’t currently in RAM — it raises a page fault. The kernel’s fault handler then decides what that fault means:
- Lazy allocation. First touch on a freshly `malloc`’d page that has no physical backing yet. Allocate a real page, zero it, fix up the PTE, restart the instruction. The program never knows.
- File-backed page-in. The page belongs to a memory-mapped file. Read it from disk, install it, restart.
- Swap-in. The page was evicted to swap. Read it back, install it, restart.
- Copy-on-write. Two processes share a page read-only after `fork`. One tries to write. The kernel allocates a fresh copy, points the writer’s PTE at it, restarts. Until that moment, `fork` was nearly free.
- Segfault. The address isn’t mapped at all, or the access violates permissions (write to read-only, execute on no-execute). The kernel sends `SIGSEGV`. The famous segfault is just “the page table said no.”
Cases 1–4 are invisible to the program. Case 5 is the only one users see, and even then only as “the program crashed.”
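Case 1 is easy to catch in the act from userspace. A minimal Linux sketch (the size is arbitrary) that counts minor page faults with `getrusage` while touching freshly mapped anonymous memory:

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Watch lazy allocation happen: mmap hands out virtual pages, but the
 * kernel only backs them with RAM on first touch, roughly one minor
 * fault per page. Exact counts vary with kernel fault-around. */
static long minor_faults(void) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

int main(void) {
    size_t len = 64 * 1024 * 1024;  /* 64 MB of anonymous memory */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    long before = minor_faults();
    memset(p, 0xAB, len);           /* first touch: faults in every page */
    long after = minor_faults();

    printf("minor faults during first touch: %ld (~one per 4 KB page)\n",
           after - before);
    munmap(p, len);
    return 0;
}
```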
What you actually get from the indirection
Once translation exists, layering features onto it is almost free:
- Isolation — process A simply has no PTE pointing at process B’s pages.
- Shared libraries — `libc`’s code pages exist once in RAM, mapped read-only into every process that uses it.
- `mmap` — treat a file as an array; the kernel pages it in on demand (see the sketch after this list).
- Overcommit — hand out virtual pages without backing them with physical ones until they’re touched. This is where the OOM killer comes from.
- ASLR — pick different virtual addresses each run so attackers can’t predict where things land. Cheap because the addresses are fake anyway.
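Here is what the `mmap`-as-array item looks like in practice: a minimal C sketch, assuming a non-empty regular file, with error handling kept to the basics:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Treat a file as an array: no read() loop, no buffer management.
 * Pages come in from disk on first access. */
int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    const char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);  /* the mapping keeps the file contents reachable */

    /* Index it like an array; each page faults in on demand. */
    long nl = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n') nl++;
    printf("%ld lines\n", nl);

    munmap((void *)data, st.st_size);
    return 0;
}
```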
Show the seams
- It is not free. Every memory access pays for the page walk if the TLB misses. Workloads with terrible locality (some hash tables, some graph traversals) can spend a real fraction of their time in TLB misses. Huge pages exist mostly to amortize this.
- Page tables themselves consume memory. A 4 KB page on x86-64 needs an 8-byte PTE somewhere; fully mapping a terabyte of address space costs roughly 2 GB for the leaf-level PTEs alone (2^28 pages × 8 bytes). This is part of why huge pages matter for big-memory workloads.
- Fork + exec is doing less copying than it looks. A `fork()` of a 10 GB process doesn’t copy 10 GB; it duplicates page tables and marks pages copy-on-write (see the sketch after this list). A typical `fork`-then-`exec` pattern barely touches anything before tossing the child’s address space away. This is why UNIX’s “fork is cheap” claim survives.
- Swap is not “more RAM.” It’s “RAM with disk latency on miss.” Once the working set exceeds RAM, the system thrashes — page-fault, page-in, evict, page-fault again — and effective throughput collapses. The invisibility of paging is also its trap.
- The exact PTE format is architecture-specific. I’ve described x86-64; ARM64 differs in details (and uses two separate page-table bases for user/kernel). The shape — multi-level tree, TLB, faults — is universal, but the bits aren’t.
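A rough way to see the copy-on-write `fork` point for yourself. A Linux sketch; the timing is illustrative, not a benchmark:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* fork() of a process with a big heap doesn't copy the heap; it copies
 * page tables and marks pages copy-on-write. Numbers vary by machine. */
static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    size_t len = 1UL << 30;            /* 1 GB */
    char *p = malloc(len);
    if (!p) { perror("malloc"); return 1; }
    memset(p, 1, len);                 /* actually back it with RAM */

    double t0 = now_sec();
    pid_t pid = fork();                /* duplicates PTEs, not the 1 GB */
    if (pid == 0) {
        p[0] = 2;                      /* first write: CoW copies ONE page */
        _exit(0);
    }
    double t1 = now_sec();

    waitpid(pid, NULL, 0);
    printf("fork of a ~1 GB process took %.1f ms\n", (t1 - t0) * 1e3);
    free(p);
    return 0;
}
```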
The deepest seam: virtual memory is the OS pretending it has more control than the hardware really gives it. The MMU is hardware; the kernel only gets a turn when the MMU faults. Almost every “OS feature” in this space is really “trick the MMU into faulting at a useful moment, then do work in the handler.” Page-on-demand, copy-on-write, swap, mmap, even some JIT and GC write barriers — all the same shape.
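That shape is reachable from userspace too. Here is a toy write barrier, a sketch assuming Linux semantics, where returning from a `SIGSEGV` handler retries the faulting instruction:

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* "Trick the MMU into faulting at a useful moment": map a page read-only;
 * the first write raises SIGSEGV; the handler notes the hit, re-enables
 * writes, and lets the store retry. A toy only: real GCs and JITs do
 * this with far more care (and mprotect in a handler is tolerated on
 * Linux but not formally async-signal-safe). */
static char *page;
static size_t page_size;

static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    /* Only handle faults on our page; anything else is a real bug. */
    char *addr = (char *)info->si_addr;
    if (addr < page || addr >= page + page_size) _exit(1);

    write(2, "write barrier hit\n", 18);          /* async-signal-safe */
    mprotect(page, page_size, PROT_READ | PROT_WRITE);
    /* Returning restarts the faulting store, which now succeeds. */
}

int main(void) {
    page_size = sysconf(_SC_PAGESIZE);
    page = mmap(NULL, page_size, PROT_READ,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    page[0] = 42;   /* faults once, handler flips permissions, store retries */
    printf("page[0] = %d\n", page[0]);
    return 0;
}
```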
Famous related terms
- TLB — `TLB ≈ CPU cache for virtual→physical translations` — the reason page-table walks aren’t crippling on every load.
- Page fault — `page fault = MMU saying "this PTE isn't usable" + kernel handler that decides what to do` — the universal hook for paging features.
- `mmap` — `mmap = map a file (or anonymous pages) into your address space + let the kernel page it in on demand` — virtual memory exposed as an API.
- Copy-on-write — `CoW = share pages read-only + copy lazily on first write` — what makes `fork` cheap.
- Swap / paging — `swap ≈ disk used as overflow RAM, one page at a time` — the original “more memory than the box has” trick.
- OOM killer — see OOM killer — what happens when overcommit’s bet finally loses.
- Huge pages — `huge pages = 2 MB or 1 GB pages instead of 4 KB` — fewer TLB entries cover more memory; helps big-RAM workloads.
Going deeper
- Operating Systems: Three Easy Pieces (Remzi & Andrea Arpaci-Dusseau) — the virtualization-of-memory chapters are the best free intro I know.
- Ulrich Drepper, What Every Programmer Should Know About Memory — old but still excellent on TLBs, caches, and how the abstraction leaks into performance.
- Linux kernel: `mm/memory.c` and `arch/x86/mm/fault.c` — the fault handler is where the magic lives.