Why memory-mapped files exist
Why ask the OS to hand you a file as memory instead of reading it byte by byte? Because the OS was already caching it that way — and pretending otherwise costs you a copy you don't need.
Why it exists
The naive way to read a file is open, then read(fd, buf, n) in a loop.
That feels like the obvious shape: you ask the OS for some bytes, it puts
them in your buffer, you do something with them. Done.
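A minimal sketch of that loop in C (error handling trimmed; the filename is a placeholder):

```c
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int fd = open("weights.bin", O_RDONLY);  /* placeholder filename */
    if (fd < 0) return 1;

    char buf[65536];
    ssize_t n;
    /* Each iteration is a syscall, and each syscall copies bytes
     * out of the kernel's page cache into buf. */
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* ... do something with buf[0..n) ... */
    }
    close(fd);
    return 0;
}
```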
There is a quiet inefficiency in that picture, and once you see it you can’t
unsee it. When the kernel handles your read, the bytes it gives you are
almost never coming directly from disk. They’re coming from the
page cache —
an in-memory cache of file pages the kernel maintains for everyone. The
kernel already had those bytes sitting in RAM. Your read call is asking
the kernel to copy them out of its page cache into your buffer. Two
copies of the same bytes now exist in physical memory, and you spent a
syscall
plus a memcpy to make that happen.
For a few KB this doesn’t matter. For a 30 GB language-model weights file that you’re going to read large random chunks of, it matters a lot.
mmap
is what you reach for when you decide the copy is the problem. Instead of
“give me the bytes,” you tell the kernel: map this file into my address
space. From then on, the file is just a pointer. Reading byte 1,000,000 of
the file is addr[1000000]. The page cache pages and your “buffer” are
literally the same physical pages — the kernel has just made them appear in
your virtual address space too.
That’s the deep idea: the page cache is already a memory-mapped view of the
file from the kernel’s side. mmap just lets you skip the pretense that
your process is somewhere else.
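In code, the shift looks something like this: a read-only mapping of the whole file, with byte 1,000,000 read through a plain pointer. A minimal sketch, assuming the file is bigger than a megabyte; the filename is a placeholder and error handling is thin.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("weights.bin", O_RDONLY);  /* placeholder filename */
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) != 0) return 1;

    /* Map the whole file read-only. No bytes are read from disk yet. */
    const unsigned char *addr =
        mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) return 1;
    close(fd);  /* the mapping keeps the file alive */

    /* "Reading the file" is now just pointer arithmetic. The first
     * touch of this page triggers a page fault. */
    printf("byte 1000000 = 0x%02x\n", addr[1000000]);

    munmap((void *)addr, st.st_size);
    return 0;
}
```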
Why it matters now
Software engineers run into this more often, not less, in the AI era:
- Loading model weights. llama.cpp reads GGUF files via mmap by default. The 30 GB file isn't loaded into RAM up front; the OS pulls in pages on demand as the inference engine touches tensor data. Multiple processes loading the same model can share the same physical pages because the page cache is shared — the first process pays the disk read; the rest hit warm cache, while those pages stay resident. safetensors offers a memory-mapped loading mode for the same reason — zero-copy reads, lazy faulting, no doubled RAM.
- Databases. LMDB is built around mmap. SQLite supports it as an opt-in mode (off by default). MongoDB's old MMAPv1 engine was named after it. Postgres famously does not use it — it manages its own buffer pool. There are good arguments both ways.
- Executables. The OS loader maps ELF binaries and shared libraries into memory rather than reading them; that's why two processes running the same binary share its read-only pages. Arrow IPC and other random-access binary formats are also commonly consumed via mmap.
- Inter-process communication via shared memory is a special case of the same primitive: mmap something, multiple processes see the same bytes.
If your job involves loading or scanning files bigger than a few megabytes,
the version of you with mmap in their toolkit makes meaningfully
different design choices than the one without.
The short answer
mmap = the kernel exposes the page cache as part of your virtual address space + pages get loaded on demand via page faults
You stop calling read. The file just is a region of your virtual memory.
Touching a byte that hasn’t been loaded yet triggers a
page fault,
the kernel pulls the relevant page in from disk into the page cache, and
your access continues. Pages you never touch never get loaded. Pages other
processes are using are shared. The copy goes away because there was never
a separate buffer to copy into.
How it works
The mechanism
mmap(addr, len, prot, flags, fd, offset) returns a pointer. The kernel
sets up page table
entries that say: “these virtual addresses correspond to that range of
that file.” Crucially, it does not read the file yet (unless you pass
MAP_POPULATE, which prefaults the range). The page table entries are
marked not-present.
The first time your code dereferences one of those addresses, the MMU sees the not-present entry and raises a page fault. The kernel’s page-fault handler:
- Looks up which file and offset this virtual address corresponds to.
- Checks the page cache. If the page is already there (because someone else read this file recently), great — no disk read at all.
- Otherwise, issues a disk read for that page (typically 4 KB, possibly more if readahead kicks in).
- Updates your page table entry to point at the page-cache page.
- Returns from the fault. Your instruction retries and now succeeds.
The same physical 4 KB page in RAM is now reachable from the kernel’s page cache, from your virtual address space, and from any other process that has mapped the same file. For an ordinary file mapping on Linux, that’s one copy of the data instead of two.
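You can watch demand paging happen with mincore(2), which reports whether the pages backing a mapped range are resident. A sketch, with error handling omitted and a placeholder filename; note the first probe may already print 1 if the file is warm in the page cache:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("weights.bin", O_RDONLY);  /* placeholder filename */
    if (fd < 0) return 1;
    struct stat st;
    fstat(fd, &st);

    /* MAP_POPULATE here would prefault the whole range up front. */
    unsigned char *addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) return 1;
    close(fd);

    unsigned char vec[1];
    mincore(addr, 1, vec);  /* length rounds up to one whole page */
    printf("page 0 resident before touch: %d\n", vec[0] & 1);

    volatile unsigned char b = addr[0];  /* first touch: page fault */
    (void)b;

    mincore(addr, 1, vec);
    printf("page 0 resident after touch:  %d\n", vec[0] & 1);

    munmap(addr, st.st_size);
    return 0;
}
```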
For MAP_SHARED mappings, writes go back to the file lazily, via the
kernel’s normal dirty-page writeback — when they’re actually durable on
disk is a separate question (msync / fsync). For MAP_PRIVATE, writes are
copy-on-write —
your dirty pages get a private copy, the rest stay shared.
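The write path, as a minimal sketch (MAP_SHARED; assumes the file already exists, is opened read-write, and is at least one page long; the filename is a placeholder):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("state.bin", O_RDWR);  /* placeholder filename */
    if (fd < 0) return 1;
    size_t len = 4096;

    char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) return 1;
    close(fd);

    /* An ordinary store. The page is now dirty in the page cache;
     * the kernel will write it back to the file eventually. */
    memcpy(addr, "hello", 5);

    /* Force writeback now if you need durability at this point.
     * (With MAP_PRIVATE, the store above would instead COW a
     * private page and never reach the file.) */
    msync(addr, len, MS_SYNC);

    munmap(addr, len);
    return 0;
}
```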
What you gain
- No copy. read does kernel-page-cache → user-buffer. mmap skips the destination buffer entirely.
- Lazy loading. A 30 GB file mapped in is essentially free until you touch it. The kernel pulls in only what you access.
- Sharing across processes. Two processes that map the same file see the same physical pages. Loading the same model into two inference workers doesn't double your RAM bill.
- The OS handles eviction for you. Under memory pressure, the kernel can drop clean mapped pages without consulting you; they'll fault back in if needed. You get something close to "the file is in RAM when there's RAM, on disk when there isn't" without writing any caching code.
Where it gets subtle
This is where most people stub their toes.
- I/O errors become signals, not return values. read returns -1 with errno set. A fault inside an mmap region — accessing past the end of a file that got truncated under you is the canonical case — can surface as SIGBUS from inside an ordinary memory access. Code that wasn't expecting "loading this byte might fail" handles this badly. This is one of the real reasons some serious databases avoid mmap. (A defensive pattern for this is sketched at the end of this section.)
- You don't control when I/O happens. With read you decide. With mmap, every memory access is a possible disk read. A latency-sensitive thread can stall on a page fault you didn't see coming.
- Random access patterns are TLB-hostile. Each 4 KB page you touch may consume a TLB entry. Sweeping a multi-GB file across a small TLB causes a lot of TLB misses and the page-walk cost that comes with them. Huge pages help; madvise hints help.
- Address-space exhaustion was a real thing on 32-bit. A 4 GB virtual address space couldn't map a 5 GB file. This is one of the reasons mmap-based storage engines were rough on 32-bit systems; on 64-bit it's a non-issue for any realistic file.
- The kernel may make worse caching decisions than your app. Postgres' case for managing its own buffer pool is exactly this: a database knows which pages are hot in a way the kernel's generic page-replacement policy does not. mmap is great when "treat the OS's page cache as your cache" is right; less great when it isn't.
- madvise is the steering wheel. MADV_SEQUENTIAL tells the kernel to read ahead aggressively and drop pages quickly; MADV_RANDOM suppresses readahead; MADV_DONTNEED asks the kernel to drop the pages (semantics differ for shared vs. anonymous mappings). If your mmap workload feels slow, a missing madvise hint is one of the first things to check (see the sketch just below).
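Those hints in use, as a sketch. The helper name and its parameters are hypothetical; it assumes a mapping you're about to stream through once, except for one region you'll probe at random:

```c
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: tune a mapping for one sequential scan
 * plus a randomly probed sub-region. Offsets passed to madvise
 * must be page-aligned. */
void tune_mapping(unsigned char *addr, size_t len,
                  size_t hot_off, size_t hot_len) {
    /* Streaming pass: crank up readahead; the kernel may also
     * recycle pages behind us sooner. */
    madvise(addr, len, MADV_SEQUENTIAL);

    /* Randomly probed region: readahead would pull in pages we
     * never use, so suppress it for that range. */
    madvise(addr + hot_off, hot_len, MADV_RANDOM);
}
```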
The thing to take away: when mmap wins over read, it’s not because
it’s clever — it’s because it removes a step (read’s copy) by letting
the user-space view of the file be exactly what the kernel was already
storing. When it loses, it’s usually because you’ve welded your program’s
control flow to the kernel’s page-fault and writeback behavior in a way
that hides the I/O from your code. Pick the one whose failure modes you’d
rather debug.
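On the SIGBUS point above: if you must read a mapping whose backing file might be truncated under you, one defensive pattern is a signal handler plus sigsetjmp. A sketch only — the function name is hypothetical, it's not thread-safe as written, and real code would keep per-thread state or avoid the situation entirely:

```c
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf fault_jmp;

static void on_sigbus(int sig) {
    (void)sig;
    siglongjmp(fault_jmp, 1);  /* unwind out of the faulting access */
}

/* Read one byte from a mapped region, surviving the SIGBUS raised
 * when the backing page is gone. Returns 0 on success, -1 on fault. */
int guarded_read_byte(const volatile unsigned char *p, unsigned char *out) {
    struct sigaction sa, old;
    sa.sa_handler = on_sigbus;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int rc = -1;
    if (sigsetjmp(fault_jmp, 1) == 0) {
        *out = *p;  /* may fault mid-instruction */
        rc = 0;
    }
    sigaction(SIGBUS, &old, NULL);  /* restore the previous handler */
    return rc;
}
```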
Famous related terms
- Page cache — page cache = kernel-managed RAM cache of file pages, keyed by (file, offset) — the thing mmap exposes; without it, mmap would have nothing to hand you.
- Demand paging — demand paging = pages get loaded only when first accessed, via page fault — the load-on-touch behavior that makes mapping huge files cheap.
- Copy-on-write — COW = shared page until someone writes; then duplicate lazily — what MAP_PRIVATE and fork both rely on.
- Huge pages — huge pages = 2 MB or 1 GB page-table entries instead of 4 KB — reduce TLB pressure when mapping large regions; relevant for big model files.
- madvise — madvise = hints to the kernel about your access pattern — SEQUENTIAL, RANDOM, WILLNEED, DONTNEED. The way you tune an mmap workload.
- Zero-copy I/O — zero-copy = avoid the kernel↔user memcpy on the I/O path — mmap is the page-cache-sharing flavor; sendfile, splice, and io_uring registered buffers attack the same overhead from different angles.
- Virtual memory — the substrate that makes any of this possible.
- Syscalls are expensive — part of mmap's win is one syscall up front instead of one per read.
Going deeper
- The Linux mmap(2) and madvise(2) man pages are short and worth a full read; most of the gotchas above are quietly documented there.
- "Are You Sure You Want to Use MMAP in Your Database Management System?" (Crotty, Leis, Pavlo, CIDR 2022) — the most-cited case against mmap for database storage. Even if you disagree, it's a clean enumeration of the failure modes.
- The llama.cpp source: search for mmap in llama.cpp/ggml.c to see one of the more important real-world consumers of this primitive in modern AI tooling.
- LWN articles on the page cache and on transparent huge pages — the Linux-side mechanics that decide what mmap actually feels like in production.