Why memory-mapped files exist
Why ask the OS to hand you a file as memory instead of reading it byte by byte? Because the OS was already caching it that way — and pretending otherwise costs you a copy you don't need.
Why it exists
The naive way to read a file is open, then read(fd, buf, n) in a loop.
That feels like the obvious shape: you ask the OS for some bytes, it puts
them in your buffer, you do something with them. Done.
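A minimal sketch of that loop in C (error handling trimmed; the filename is a placeholder):

```c
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int fd = open("weights.bin", O_RDONLY);  /* placeholder filename */
    if (fd < 0) return 1;

    char buf[65536];
    ssize_t n;
    /* Each iteration is a syscall, and each syscall copies bytes
     * out of the kernel's page cache into buf. */
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* ... do something with buf[0..n) ... */
    }
    close(fd);
    return 0;
}
```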
There is a quiet inefficiency in that picture, and once you see it you can’t
unsee it. When the kernel handles your read, the bytes it gives you are
almost never coming directly from disk. They’re coming from the
page cache —
an in-memory cache of file pages the kernel maintains for everyone. The
kernel already had those bytes sitting in RAM. Your read call is asking
the kernel to copy them out of its page cache into your buffer. Two
copies of the same bytes now exist in physical memory, and you spent a
syscall
plus a memcpy to make that happen.
For a few KB this doesn’t matter. For a 30 GB language-model weights file that you’re going to read large random chunks of, it matters a lot.
mmap
is what you reach for when you decide the copy is the problem. Instead of
“give me the bytes,” you tell the kernel: map this file into my address
space. From then on, the file is just a pointer. Reading byte 1,000,000 of
the file is addr[1000000]. The page cache pages and your “buffer” are
literally the same physical pages — the kernel has just made them appear in
your virtual address space too.
That’s the deep idea: the page cache is already a memory-mapped view of the
file from the kernel’s side. mmap just lets you skip the pretense that
your process is somewhere else.
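In code, the shift looks something like this: a read-only mapping of the whole file, with byte 1,000,000 read through a plain pointer. A minimal sketch, assuming the file is bigger than a megabyte; the filename is a placeholder and error handling is thin.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("weights.bin", O_RDONLY);  /* placeholder filename */
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) != 0) return 1;

    /* Map the whole file read-only. No bytes are read from disk yet. */
    const unsigned char *addr =
        mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) return 1;
    close(fd);  /* the mapping keeps the file alive */

    /* "Reading the file" is now just pointer arithmetic. The first
     * touch of this page triggers a page fault. */
    printf("byte 1000000 = 0x%02x\n", addr[1000000]);

    munmap((void *)addr, st.st_size);
    return 0;
}
```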
Why it matters now
Software engineers run into this more often, not less, in the AI era:
- Loading model weights. llama.cpp reads GGUF files via mmap by default. The 30 GB file isn't loaded into RAM up front; the OS pulls in pages on demand as the inference engine touches tensor data. Multiple processes loading the same model can share the same physical pages because the page cache is shared — the first process pays the disk read; the rest hit warm cache, while those pages stay resident. safetensors offers a memory-mapped loading mode for the same reason — zero-copy reads, lazy faulting, no doubled RAM.
- Databases. LMDB is built around mmap. SQLite supports it as an opt-in mode (off by default). MongoDB's old MMAPv1 engine was named after it. Postgres famously does not use it — it manages its own buffer pool. There are good arguments both ways.
- Executables. The OS loader maps ELF binaries and shared libraries into memory rather than reading them; that's why two processes running the same binary share its read-only pages. Arrow IPC and other random-access binary formats are also commonly consumed via mmap.
- Inter-process communication via shared memory is a special case of the same primitive: mmap something, multiple processes see the same bytes.
If your job involves loading or scanning files bigger than a few megabytes,
the version of you with mmap in their toolkit makes meaningfully
different design choices than the one without.
The short answer
mmap = the kernel exposes the page cache as part of your virtual address space + pages get loaded on demand via page faults
You stop calling read. The file just is a region of your virtual memory.
Touching a byte that hasn’t been loaded yet triggers a
page fault,
the kernel pulls the relevant page in from disk into the page cache, and
your access continues. Pages you never touch never get loaded. Pages other
processes are using are shared. The copy goes away because there was never
a separate buffer to copy into.
How it works
The mechanism
mmap(addr, len, prot, flags, fd, offset) returns a pointer. The kernel
sets up page table
entries that say: “these virtual addresses correspond to that range of
that file.” Crucially, it does not read the file yet (unless you pass
MAP_POPULATE, which prefaults the range). The page table entries are
marked not-present.
The first time your code dereferences one of those addresses, the MMU sees the not-present entry and raises a page fault. The kernel’s page-fault handler:
- Looks up which file and offset this virtual address corresponds to.
- Checks the page cache. If the page is already there (because someone else read this file recently), great — no disk read at all.
- Otherwise, issues a disk read for that page (typically 4 KB, possibly more if readahead kicks in).
- Updates your page table entry to point at the page-cache page.
- Returns from the fault. Your instruction retries and now succeeds.
The same physical 4 KB page in RAM is now reachable from the kernel’s page cache, from your virtual address space, and from any other process that has mapped the same file. For an ordinary file mapping on Linux, that’s one copy of the data instead of two.
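You can watch demand paging happen with mincore(2), which reports whether the pages backing a mapped range are resident. A sketch, with error handling omitted and a placeholder filename; note the first probe may already print 1 if the file is warm in the page cache:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("weights.bin", O_RDONLY);  /* placeholder filename */
    if (fd < 0) return 1;
    struct stat st;
    fstat(fd, &st);

    /* MAP_POPULATE here would prefault the whole range up front. */
    unsigned char *addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) return 1;
    close(fd);

    unsigned char vec[1];
    mincore(addr, 1, vec);  /* length rounds up to one whole page */
    printf("page 0 resident before touch: %d\n", vec[0] & 1);

    volatile unsigned char b = addr[0];  /* first touch: page fault */
    (void)b;

    mincore(addr, 1, vec);
    printf("page 0 resident after touch:  %d\n", vec[0] & 1);

    munmap(addr, st.st_size);
    return 0;
}
```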
For MAP_SHARED mappings, writes go back to the file lazily, via the
kernel’s normal dirty-page writeback — when they’re actually durable on
disk is a separate question (msync / fsync). For MAP_PRIVATE, writes are
copy-on-write —
your dirty pages get a private copy, the rest stay shared.
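The write path, as a minimal sketch (MAP_SHARED; assumes the file already exists, is opened read-write, and is at least one page long; the filename is a placeholder):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("state.bin", O_RDWR);  /* placeholder filename */
    if (fd < 0) return 1;
    size_t len = 4096;

    char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) return 1;
    close(fd);

    /* An ordinary store. The page is now dirty in the page cache;
     * the kernel will write it back to the file eventually. */
    memcpy(addr, "hello", 5);

    /* Force writeback now if you need durability at this point.
     * (With MAP_PRIVATE, the store above would instead COW a
     * private page and never reach the file.) */
    msync(addr, len, MS_SYNC);

    munmap(addr, len);
    return 0;
}
```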
What you gain
- No copy. read does kernel-page-cache → user-buffer. mmap skips the destination buffer entirely.
- Lazy loading. A 30 GB file mapped in is essentially free until you touch it. The kernel pulls in only what you access.
- Sharing across processes. Two processes that map the same file see the same physical pages. Loading the same model into two inference workers doesn't double your RAM bill.
- The OS handles eviction for you. Under memory pressure, the kernel can drop clean mapped pages without consulting you; they'll fault back in if needed. You get something close to "the file is in RAM when there's RAM, on disk when there isn't" without writing any caching code.
Where it gets subtle
This is where most people stub their toes.
- I/O errors become signals, not return values. read returns -1 with errno set. A fault inside an mmap region — accessing past the end of a file that got truncated under you is the canonical case — can surface as SIGBUS from inside an ordinary memory access. Code that wasn't expecting "loading this byte might fail" handles this badly. This is one of the real reasons some serious databases avoid mmap. (A defensive pattern for this is sketched at the end of this section.)
- You don't control when I/O happens. With read you decide. With mmap, every memory access is a possible disk read. A latency-sensitive thread can stall on a page fault you didn't see coming.
- Random access patterns are TLB-hostile. Each 4 KB page you touch may consume a TLB entry. Sweeping a multi-GB file across a small TLB causes a lot of TLB misses and the page-walk cost that comes with them. Huge pages help; madvise hints help.
- Address-space exhaustion was a real thing on 32-bit. A 4 GB virtual address space couldn't map a 5 GB file. This is one of the reasons mmap-based storage engines were rough on 32-bit systems; on 64-bit it's a non-issue for any realistic file.
- The kernel may make worse caching decisions than your app. Postgres' case for managing its own buffer pool is exactly this: a database knows which pages are hot in a way the kernel's generic page-replacement policy does not. mmap is great when "treat the OS's page cache as your cache" is right; less great when it isn't.
- madvise is the steering wheel. MADV_SEQUENTIAL tells the kernel to read ahead aggressively and drop pages quickly; MADV_RANDOM suppresses readahead; MADV_DONTNEED asks the kernel to drop the pages (semantics differ for shared vs. anonymous mappings). If your mmap workload feels slow, a missing madvise hint is one of the first things to check (see the sketch just below).
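Those hints in use, as a sketch. The helper name and its parameters are hypothetical; it assumes a mapping you're about to stream through once, except for one region you'll probe at random:

```c
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: tune a mapping for one sequential scan
 * plus a randomly probed sub-region. Offsets passed to madvise
 * must be page-aligned. */
void tune_mapping(unsigned char *addr, size_t len,
                  size_t hot_off, size_t hot_len) {
    /* Streaming pass: crank up readahead; the kernel may also
     * recycle pages behind us sooner. */
    madvise(addr, len, MADV_SEQUENTIAL);

    /* Randomly probed region: readahead would pull in pages we
     * never use, so suppress it for that range. */
    madvise(addr + hot_off, hot_len, MADV_RANDOM);
}
```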
The thing to take away: when mmap wins over read, it’s not because
it’s clever — it’s because it removes a step (read’s copy) by letting
the user-space view of the file be exactly what the kernel was already
storing. When it loses, it’s usually because you’ve welded your program’s
control flow to the kernel’s page-fault and writeback behavior in a way
that hides the I/O from your code. Pick the one whose failure modes you’d
rather debug.
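On the SIGBUS point above: if you must read a mapping whose backing file might be truncated under you, one defensive pattern is a signal handler plus sigsetjmp. A sketch only — the function name is hypothetical, it's not thread-safe as written, and real code would keep per-thread state or avoid the situation entirely:

```c
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf fault_jmp;

static void on_sigbus(int sig) {
    (void)sig;
    siglongjmp(fault_jmp, 1);  /* unwind out of the faulting access */
}

/* Read one byte from a mapped region, surviving the SIGBUS raised
 * when the backing page is gone. Returns 0 on success, -1 on fault. */
int guarded_read_byte(const volatile unsigned char *p, unsigned char *out) {
    struct sigaction sa, old;
    sa.sa_handler = on_sigbus;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int rc = -1;
    if (sigsetjmp(fault_jmp, 1) == 0) {
        *out = *p;  /* may fault mid-instruction */
        rc = 0;
    }
    sigaction(SIGBUS, &old, NULL);  /* restore the previous handler */
    return rc;
}
```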
Famous related terms
- Page cache — page cache = kernel-managed RAM cache of file pages, keyed by (file, offset) — the thing mmap exposes; without it, mmap would have nothing to hand you.
- Demand paging — demand paging = pages get loaded only when first accessed, via page fault — the load-on-touch behavior that makes mapping huge files cheap.
- Copy-on-write — COW = shared page until someone writes; then duplicate lazily — what MAP_PRIVATE and fork both rely on.
- Huge pages — huge pages = 2 MB or 1 GB page-table entries instead of 4 KB — reduce TLB pressure when mapping large regions; relevant for big model files.
- madvise — madvise = hints to the kernel about your access pattern — SEQUENTIAL, RANDOM, WILLNEED, DONTNEED. The way you tune an mmap workload.
- Zero-copy I/O — zero-copy = avoid the kernel↔user memcpy on the I/O path — mmap is the page-cache-sharing flavor; sendfile, splice, and io_uring registered buffers attack the same overhead from different angles.
- Virtual memory — the substrate that makes any of this possible.
- Syscalls are expensive — part of mmap's win is one syscall up front instead of one per read.
Going deeper
- The Linux mmap(2) and madvise(2) man pages are short and worth a full read; most of the gotchas above are quietly documented there.
- "Are You Sure You Want to Use MMAP in Your Database Management System?" (Crotty, Leis, Pavlo, CIDR 2022) — the most-cited case against mmap for database storage. Even if you disagree, it's a clean enumeration of the failure modes.
- The llama.cpp source: search for mmap in llama.cpp/ggml.c to see one of the more important real-world consumers of this primitive in modern AI tooling.
- LWN articles on the page cache and on transparent huge pages — the Linux-side mechanics that decide what mmap actually feels like in production.