Heads up: posts on this site are drafted by Claude and fact-checked by Codex. Both can still get things wrong — read with care and verify anything load-bearing before relying on it.

Why memory-mapped files exist

Why hand a file to the OS as memory instead of reading it byte by byte? Because the OS was already caching it that way — and pretending otherwise costs you a copy you don't need.

Systems · intermediate · Apr 29, 2026

Why it exists

The naive way to read a file is open, then read(fd, buf, n) in a loop. That feels like the obvious shape: you ask the OS for some bytes, it puts them in your buffer, you do something with them. Done.

There is a quiet inefficiency in that picture, and once you see it you can’t unsee it. When the kernel handles your read, the bytes it gives you are almost never coming directly from disk. They’re coming from the page cache — an in-memory cache of file pages the kernel maintains for everyone. The kernel already had those bytes sitting in RAM. Your read call is asking the kernel to copy them out of its page cache into your buffer. Two copies of the same bytes now exist in physical memory, and you spent a syscall plus a memcpy to make that happen.

For a few KB this doesn’t matter. For a 30 GB language-model weights file that you’re going to read large random chunks of, it matters a lot.

mmap is what you reach for when you decide the copy is the problem. Instead of “give me the bytes,” you tell the kernel: map this file into my address space. From then on, the file is just a pointer. Reading byte 1,000,000 of the file is addr[1000000]. The page cache pages and your “buffer” are literally the same physical pages — the kernel has just made them appear in your virtual address space too.

That’s the deep idea: the page cache is already a memory-mapped view of the file from the kernel’s side. mmap just lets you skip the pretense that your process is somewhere else.

Why it matters now

Software engineers run into this more often, not less, in the AI era.

If your job involves loading or scanning files bigger than a few megabytes, the version of you with mmap in your toolkit makes meaningfully different design choices than the one without.

The short answer

mmap = the kernel exposes the page cache as part of your virtual address space + pages get loaded on demand via page faults

You stop calling read. The file just is a region of your virtual memory. Touching a byte that hasn’t been loaded yet triggers a page fault, the kernel pulls the relevant page in from disk into the page cache, and your access continues. Pages you never touch never get loaded. Pages other processes are using are shared. The copy goes away because there was never a separate buffer to copy into.

How it works

The mechanism

mmap(addr, len, prot, flags, fd, offset) returns a pointer. The kernel sets up page table entries that say: “these virtual addresses correspond to that range of that file.” Crucially, it does not read the file yet (unless you pass MAP_POPULATE, which prefaults the range). The page table entries are marked not-present.

The first time your code dereferences one of those addresses, the MMU sees the not-present entry and raises a page fault. The kernel’s page-fault handler:

  1. Looks up which file and offset this virtual address corresponds to.
  2. Checks the page cache. If the page is already there (because someone else read this file recently), great — no disk read at all.
  3. Otherwise, issues a disk read for that page (typically 4 KB, possibly more if readahead kicks in).
  4. Updates your page table entry to point at the page-cache page.
  5. Returns from the fault. Your instruction retries and now succeeds.

The same physical 4 KB page in RAM is now reachable from the kernel’s page cache, from your virtual address space, and from any other process that has mapped the same file. For an ordinary file mapping on Linux, that’s one copy of the data instead of two.

For MAP_SHARED mappings, writes go back to the file lazily, via the kernel’s normal dirty-page writeback — when they’re actually durable on disk is a separate question (msync / fsync). For MAP_PRIVATE, writes are copy-on-write — your dirty pages get a private copy, the rest stay shared.

What you gain

One copy of the data instead of two. Pages load lazily, so you pay only for what you touch. And because the backing pages are the page cache itself, they're shared with every other process mapping the same file — a dozen processes can map the same 30 GB weights file and consume roughly one file's worth of RAM between them.

Where it gets subtle

This is where most people stub their toes.

The thing to take away: when mmap wins over read, it’s not because it’s clever — it’s because it removes a step (read’s copy) by letting the user-space view of the file be exactly what the kernel was already storing. When it loses, it’s usually because you’ve welded your program’s control flow to the kernel’s page-fault and writeback behavior in a way that hides the I/O from your code. Pick the one whose failure modes you’d rather debug.

Going deeper