Heads up: posts on this site are drafted by Claude and fact-checked by Codex. Both can still get things wrong — read with care and verify anything load-bearing before relying on it.

Why syscalls are expensive

A function call costs a few cycles. A system call costs hundreds — sometimes thousands. The gap isn't sloppy engineering; it's the price of the user/kernel boundary.

Systems intermediate Apr 29, 2026

Why it exists

A normal function call is like asking the friend sitting next to you to pass the salt: quick, no fuss, a couple of words and you’re done. A syscall is like asking airport security to fetch something from your checked bag. They have to check your ID, scan you, decide whether you’re allowed, get the item, escort you back, and re-lock the door. The actual task might take a moment; the security ritual takes the rest of the time. Every time your program reads a file, opens a network socket, or even asks for the current time, it’s doing the airport-security version of a function call. Do it ten times a second and nobody cares. Do it a million times a second and your program spends most of its life waiting at the checkpoint.

You write read(fd, buf, n) and it looks like any other function call. It isn’t. A normal function call is a handful of cycles: push some registers, jump, pop, return. A syscall on a modern x86-64 box costs roughly a hundred to a few hundred nanoseconds even when the kernel does almost no work, and often more once you count the indirect costs that hit after the call returns. Against a function call that takes a nanosecond or two, that’s a 50–200× gap, and it’s not because kernel programmers are bad at their jobs.

The gap exists because a syscall is not really a function call at all. It’s a controlled crossing of a hardware-enforced security boundary. The CPU is in one of two modes — user mode or kernel mode — and a huge amount of machinery exists to stop user code from forging its way into kernel mode. A syscall is the one sanctioned door, and going through it costs what the door costs.

The pain points the boundary is solving:

All of that costs cycles. The “expense” is the integrity tax.

Why it matters now

It shows up the moment your workload is bottlenecked on lots of small I/O:

The AI-era version: when people batch tokenization, batch GPU launches, or use shared-memory IPC instead of sockets, “amortize the syscall” is usually one of the unstated reasons. It’s also why the cost of a single syscall is a useful yardstick: if an operation does less work than the crossing itself costs, it is probably worth batching with its neighbors.
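To make “amortize the syscall” concrete, here is a minimal sketch that reads the same file twice through unbuffered os.read calls, once in 16-byte pieces and once in 64 KiB pieces. The helper names (read_in_chunks, demo) and the scratch file are made up for this illustration, and the timings include Python interpreter overhead, so treat the output as a rough picture of the call-count difference rather than a measurement of kernel cost.

```python
import os
import tempfile
import time


def read_in_chunks(path, chunk_size):
    """Read a file with unbuffered os.read calls.

    Returns (bytes_read, number_of_read_calls, elapsed_seconds).
    """
    fd = os.open(path, os.O_RDONLY)
    total = 0
    calls = 0
    start = time.perf_counter()
    try:
        while True:
            chunk = os.read(fd, chunk_size)  # each call is one read() syscall
            calls += 1
            if not chunk:  # an empty result signals end of file
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total, calls, time.perf_counter() - start


def demo(size=256 * 1024):
    """Compare many tiny reads against a few large reads on a scratch file."""
    # Hypothetical scratch file, created only for this comparison.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * size)
        path = f.name
    try:
        small = read_in_chunks(path, 16)         # thousands of crossings
        large = read_in_chunks(path, 64 * 1024)  # a handful of crossings
    finally:
        os.unlink(path)
    return small, large


if __name__ == "__main__":
    small, large = demo()
    for label, (nbytes, calls, secs) in (("16 B reads", small), ("64 KiB reads", large)):
        print(f"{label}: {nbytes} bytes in {calls} read calls, {secs * 1e3:.2f} ms")
```

Both runs move the same bytes; the only thing that changes is how many times the program crosses the boundary to do it. Buffered I/O layers (such as Python’s own open()) exist largely to make this trade for you.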

The short answer

syscall cost = mode switch + register save/restore + KPTI page-table swap + cache & TLB & branch-predictor pollution

A function call stays in user mode and shares everything with the caller. A syscall flips the CPU into kernel mode, switches page tables (on kernels running with KPTI, the Meltdown mitigation), saves a pile of state, and disturbs CPU caches the user code was depending on. You pay much of that twice: once going in, once coming out.
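A back-of-the-envelope version of that bill, as a sketch: multiply the per-call cost by the call rate and see what fraction of one core it eats. The function name and the specific figures (200 ns direct, 300 ns indirect) are illustrative assumptions, not measurements; plug in numbers from your own machine.

```python
def syscall_overhead_fraction(calls_per_second, direct_ns, indirect_ns=0.0):
    """Fraction of one core's time spent on syscall overhead, capped at 1.0.

    direct_ns:   assumed cost of the crossing itself (entry, exit, state save)
    indirect_ns: assumed after-the-fact cost (cold caches, TLB, predictors)
    """
    overhead_ns_per_second = calls_per_second * (direct_ns + indirect_ns)
    return min(overhead_ns_per_second / 1e9, 1.0)


if __name__ == "__main__":
    # Illustrative rates: "ten times a second" up to "a million times a second".
    for rate in (10, 10_000, 1_000_000):
        frac = syscall_overhead_fraction(rate, direct_ns=200, indirect_ns=300)
        print(f"{rate:>9} calls/s -> {frac:.4%} of one core")
```

Under these assumed costs, ten calls a second is noise, while a million calls a second consumes half a core before the kernel has done any useful work.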

How it works

The hardware crossing

On x86-64, the user-space syscall path is the syscall instruction. It does roughly this in microcode:

  1. Save the user RIP (the address of the next instruction) into RCX and the user RFLAGS into R11.
  2. Load the kernel entry point into RIP from a CPU model-specific register (MSR_LSTAR).
  3. Switch the CPU’s CPL from ring 3 (user) to ring 0 (kernel).
  4. Clear the RFLAGS bits selected by MSR_SFMASK. Linux includes the interrupt flag in that mask, so interrupts are masked on entry.

That much is cheap-ish — tens of cycles. The expensive part is what software has to do next.
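The four steps above can be sketched as a toy model of the register state. This is not an emulator: the Cpu class, the addresses, and the mask value are invented for illustration, and segment-register loading is left out. It only shows which registers the instruction overwrites and, just as importantly, which ones it leaves alone.

```python
from dataclasses import dataclass

IF_FLAG = 1 << 9  # RFLAGS.IF, the interrupt-enable flag


@dataclass
class Cpu:
    """Toy register file holding only the state this sketch cares about."""
    rip: int        # address of the syscall instruction about to execute
    rflags: int
    rsp: int = 0
    cpl: int = 3    # current privilege level: 3 = user, 0 = kernel
    rcx: int = 0
    r11: int = 0


def syscall(cpu, lstar, fmask, insn_len=2):
    """Apply the architectural effects of the syscall instruction.

    lstar and fmask stand in for the MSR contents; insn_len is 2 because
    the instruction encodes as the two bytes 0F 05.
    """
    cpu.rcx = cpu.rip + insn_len  # step 1: return address goes to RCX
    cpu.r11 = cpu.rflags          # step 1: user flags go to R11
    cpu.rip = lstar               # step 2: jump to the kernel entry point
    cpu.cpl = 0                   # step 3: ring 3 -> ring 0
    cpu.rflags &= ~fmask          # step 4: clear the masked flag bits
    # Note: RSP is untouched. The kernel starts out on the user's stack
    # pointer and has to switch stacks itself in software.
    return cpu


if __name__ == "__main__":
    # Made-up addresses, chosen only to look plausible.
    cpu = Cpu(rip=0x401000, rflags=0x202, rsp=0x7FFF0000)
    syscall(cpu, lstar=0xFFFFFFFF81000000, fmask=IF_FLAG)
    print(f"rip={cpu.rip:#x} rcx={cpu.rcx:#x} r11={cpu.r11:#x} "
          f"cpl={cpu.cpl} rflags={cpu.rflags:#x} rsp={cpu.rsp:#x}")
```

The untouched stack pointer is a good hint at why the software side of entry is the expensive part: the hardware does the bare minimum and leaves the rest to the entry stub.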

What the kernel has to do on entry

The kernel can’t trust a single thing about the CPU state it just inherited. So the entry stub:

Why it’s still expensive even after the cycles

The direct cycle count is only part of it. The real bill is the microarchitectural collateral damage:

So a syscall that benchmarks at 200 ns in a tight loop really costs those 200 ns of direct work plus a further stretch of cold-cache pain, potentially another few hundred nanoseconds, that you’ll feel as your user code runs slowly for a little while afterward. The second number doesn’t show up in benchmarks that measure the syscall in isolation, which is part of why syscall cost is consistently underestimated.
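For reference, this is roughly what an “in isolation” measurement looks like, sketched in Python. The helper names are made up, and os.getppid is used on the assumption that it performs a real kernel crossing on each call (true on Linux; other platforms may differ). Interpreter overhead is baked into both numbers, so only the difference between them says anything about the syscall, and even that captures just the direct cost.

```python
import os
import time


def ns_per_call(fn, iterations=200_000):
    """Average wall-clock nanoseconds per call of fn in a tight loop."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    return (time.perf_counter_ns() - start) / iterations


def noop():
    """Plain user-space function, used as the baseline."""
    return 0


if __name__ == "__main__":
    baseline = ns_per_call(noop)
    crossing = ns_per_call(os.getppid)
    print(f"plain call:  {baseline:.0f} ns")
    print(f"os.getppid:  {crossing:.0f} ns")
    print(f"difference:  {crossing - baseline:.0f} ns (rough direct cost only)")
```

Because the loop does nothing else, caches and predictors stay warm for the loop itself, and the slowdown a real program would see in the code that runs after each call never appears in the result.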

Show the seams

The deep idea is the same as virtual memory: the OS doesn’t actually have power over user code except at moments when the hardware hands control over. The syscall is one of those moments, deliberately constructed. Everything expensive about it is the cost of making that handover safe.

Going deeper