Why monotonic time is different from wall-clock time
Wall-clock time tells you what to put on a calendar. Monotonic time tells you how long something took. Confusing them is how you get bugs that look like physics violations.
Why it exists
You write the most natural piece of code in the world:
start = time.time()
do_the_thing()
elapsed = time.time() - start
Most days this works. Then one night, do_the_thing() “takes” negative
twenty-three minutes. Or it appears to take three hours when it actually took
four seconds. Nothing about the function changed. What changed is that the
machine’s clock got adjusted underneath you: by NTP stepping the wall clock
backwards to correct drift, by a virtual machine pausing and resuming, by an
admin running date -s to fix a wrong clock, or by a leap second being stepped
or smeared. (A daylight-saving transition does the same thing to code that
subtracts local datetimes, although it leaves the epoch seconds returned by
time.time() alone.)
The fix isn’t “use a better clock.” It’s recognising that there are two different questions you might be asking, and they need two different clocks:
- What time is it right now, in the human world? That’s wall-clock time: it must agree with calendars, time zones, NTP, and ultimately the rotation of the Earth. It is allowed — required, even — to jump.
- How long has it been since X? That’s monotonic time: it must never go backwards, never jump, never be “corrected.” It doesn’t even need to be a meaningful date — it just needs to count seconds since some arbitrary fixed point.
Modern operating systems expose both because trying to make one clock do both jobs is a contradiction. You can’t simultaneously promise “this matches civil time” and “this never goes backwards” — civil time itself goes backwards sometimes.
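The two questions can sit side by side in the same function. The sketch below is a minimal illustration, not a prescribed pattern: timed is a hypothetical helper name, and it keeps a wall-clock reading only as a label for humans while the duration comes from time.monotonic().

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, started_at, elapsed_seconds).

    timed is a hypothetical helper for illustration only.
    """
    started_at = time.time()       # wall clock: a label for logs, may jump
    start = time.monotonic()       # monotonic: arbitrary origin, never goes backwards
    result = fn(*args, **kwargs)
    elapsed = time.monotonic() - start
    return result, started_at, elapsed


if __name__ == "__main__":
    result, started_at, elapsed = timed(sum, range(1_000_000))
    print(f"started_at={started_at:.0f} elapsed={elapsed:.6f}s result={result}")
```

The wall-clock value is never subtracted from anything here, so a clock step during the call changes the label and nothing else.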
Why it matters now
Engineers in the AI era are running more code on machines they don’t own:
ephemeral cloud VMs, autoscaling pods, GPU boxes spun up for one training
run. Those machines come up with bad clocks and have NTP “yank” them into
shape minutes later. Anything that timed itself with time.time() during
that window measured nonsense.
It also matters for everything where “elapsed time” is load-bearing:
- Timeouts (HTTP, database, RPC). A negative elapsed time can make a timeout fire instantly or never fire at all.
- Rate limiters and token buckets. If “now” goes backwards, you can grant the same budget twice or refuse legitimate traffic for hours.
- Retries with exponential backoff.
- Distributed traces and benchmarks where you subtract two timestamps to get a duration.
- Cache TTLs measured locally on a node.
The bugs are nasty because they’re rare, system-dependent, and they look like the laws of physics broke.
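A token bucket shows why the clock choice matters. The sketch below is one simple way to write it, assuming a refill rate in tokens per second; TokenBucket and its parameter names are made up for this example. The clock is injectable so tests can drive it by hand, and it defaults to time.monotonic so that "now" cannot move backwards and refill arithmetic never sees a negative elapsed time.

```python
import time


class TokenBucket:
    """Illustrative token bucket; names and defaults are assumptions."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens the bucket can hold
        self.clock = clock          # callable returning seconds, must not go backwards
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        elapsed = now - self.last   # non-negative as long as the clock is monotonic
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Swapping the default for time.time would make the same code vulnerable to every adjustment listed earlier: a backwards step drains the bucket, a forwards step refills it for free.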
The short answer
monotonic time = "seconds since some fixed point" + "never goes backwards, never jumps"
Wall-clock time answers “what time is it?”. Monotonic time answers “how much time has passed?”. They are different problems, so the OS gives you different clocks. Use wall-clock for things humans see (logs, scheduled jobs, “created_at”). Use monotonic for every duration, deadline, and timeout your code computes.
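For deadlines, the usual trick is to compute the deadline once on the monotonic clock and then only ever ask how much time remains. The following is a rough sketch under that assumption; call_with_deadline, attempt, and the default delays are hypothetical names and values chosen for the example.

```python
import time


def call_with_deadline(attempt, timeout_s, base_delay_s=0.01,
                       sleep=time.sleep, clock=time.monotonic):
    """Retry attempt() with exponential backoff until it returns True
    or timeout_s seconds have passed on the monotonic clock."""
    deadline = clock() + timeout_s      # fixed once, on the monotonic timeline
    delay = base_delay_s
    while True:
        if attempt():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False                # out of time
        sleep(min(delay, remaining))    # never sleep past the deadline
        delay *= 2                      # exponential backoff
```

Because the deadline lives on the monotonic timeline, a wall-clock step in the middle of the loop can neither fire the timeout early nor postpone it.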
How it works
Underneath, both clocks are usually derived from the same hardware source — on modern x86, typically the TSC or an equivalent counter exposed through vDSO. The difference is what the OS does with that counter before handing you a number.
For wall-clock time, the kernel keeps an offset from the counter to civil
time, and ntpd/chrony/systemd-timesyncd continuously adjust that
offset. There are two adjustment modes:
- Step: overwrite the clock instantly. Big jumps. This is what clock_settime and date -s do, and what NTP does on a fresh boot when it’s wildly wrong.
- Slew: speed the clock up or slow it down by a tiny percentage so it converges on the correct time over minutes. This avoids backward jumps but means a “second” of wall-clock time is not always a real physical second.
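A little arithmetic shows why slewing takes minutes. The sketch below assumes a maximum slew rate of 500 parts per million, a figure commonly quoted for ntpd and the Linux kernel; that number is an assumption here, not something your system is guaranteed to use, and slew_duration_s is a name invented for the example.

```python
def slew_duration_s(offset_s, slew_ppm=500.0):
    """Real seconds needed to absorb offset_s when the clock can only be
    sped up or slowed down by slew_ppm parts per million.

    The 500 ppm default is an assumed typical limit, not a universal one.
    """
    return abs(offset_s) / (slew_ppm * 1e-6)


if __name__ == "__main__":
    for offset in (0.010, 0.128, 1.0):
        minutes = slew_duration_s(offset) / 60
        print(f"offset {offset * 1000:6.0f} ms -> about {minutes:.1f} minutes of slewing")
```

Under that assumption a 128 ms error takes roughly 256 seconds to absorb and a full second takes about 2000 seconds, which is why large offsets tend to be stepped instead.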
For monotonic time, the kernel exposes the counter directly with no offset
games. On Linux that’s clock_gettime(CLOCK_MONOTONIC, …). There are
several flavours:
- CLOCK_MONOTONIC: never jumps, but can be slewed (counts a little slower or faster while NTP corrects).
- CLOCK_MONOTONIC_RAW: also never jumps, and is not slewed. Closer to raw hardware ticks. Useful when you care about the actual physical duration, not “seconds as the kernel currently defines them.”
- CLOCK_BOOTTIME: like CLOCK_MONOTONIC but also includes time the machine spent suspended. Most “wall-clock-but-safe” choices end up reaching for this.
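Python exposes these clock IDs through time.clock_gettime on platforms that have them, so you can look at the flavours side by side. This is a small exploratory sketch: read_linux_clocks is a hypothetical helper, and it skips any clock the current platform or Python build does not provide.

```python
import time


def read_linux_clocks():
    """Return {clock_name: seconds} for each clock available on this platform."""
    names = ("CLOCK_REALTIME", "CLOCK_MONOTONIC",
             "CLOCK_MONOTONIC_RAW", "CLOCK_BOOTTIME")
    readings = {}
    if not hasattr(time, "clock_gettime"):
        return readings                     # e.g. Windows: no clock_gettime
    for name in names:
        clock_id = getattr(time, name, None)
        if clock_id is None:
            continue                        # constant not defined on this platform
        try:
            readings[name] = time.clock_gettime(clock_id)
        except OSError:
            continue                        # defined but not supported by the kernel
    return readings


if __name__ == "__main__":
    for name, value in read_linux_clocks().items():
        print(f"{name:20s} {value:.6f}")
```

On a machine that has been suspended since boot, CLOCK_BOOTTIME should read ahead of CLOCK_MONOTONIC by roughly the time spent asleep; the realtime value is the only one that looks like a date.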
In application languages:
- Go’s time.Now() carries both a wall and a monotonic reading in the same value, and t2.Sub(t1) uses the monotonic part automatically. This is one of the cleaner designs in the wild.
- Python has time.monotonic() and time.perf_counter() separate from time.time(). Use the former two for durations.
- Java has System.nanoTime() (monotonic-ish) and System.currentTimeMillis() (wall-clock). The Javadoc for nanoTime() says it is meant for measuring elapsed time and is unrelated to wall-clock time, which is the hint not to subtract two currentTimeMillis() values.
- JavaScript has performance.now() (monotonic, ms-resolution-ish) and Date.now() (wall-clock).
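In Python you can ask the runtime which clock backs each function and what it promises. The sketch below uses time.get_clock_info from the standard library; describe_clocks is just a hypothetical wrapper name, and the implementation strings it prints vary by platform.

```python
import time


def describe_clocks():
    """Summarise what the runtime reports about each timing function."""
    rows = {}
    for name in ("time", "monotonic", "perf_counter"):
        info = time.get_clock_info(name)
        rows[name] = {
            "implementation": info.implementation,  # underlying OS call, platform-specific
            "monotonic": info.monotonic,            # True if it cannot go backwards
            "adjustable": info.adjustable,          # True if an admin or NTP can change it
            "resolution": info.resolution,          # resolution in seconds
        }
    return rows


if __name__ == "__main__":
    for name, row in describe_clocks().items():
        print(name, row)
```

The monotonic and adjustable flags are the split described above, reported by the runtime itself.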
The pattern across all of them is the same: there are two functions because there are two questions.
Show the seams
A few things the textbook account skips:
- Monotonic clocks are only monotonic per process (or per machine). You cannot meaningfully compare a monotonic timestamp from one box to one from another — the “fixed point” is arbitrary and different on each machine. Cross-machine timing has to use wall-clock plus careful sync, or a logical clock.
- Suspend/resume is the trap. CLOCK_MONOTONIC on Linux historically did not advance while the system was suspended. So a laptop that slept for an hour would see “0 seconds” elapsed on CLOCK_MONOTONIC between sleep and wake. CLOCK_BOOTTIME was added precisely to fix this. Whether your language’s monotonic() ticks during suspend depends on which clock the runtime picked, and the answer is not always documented.
- Leap seconds are messy. When a leap second is inserted, civil time has to absorb an extra second somewhere. Different systems handle this differently: some step backwards by one second, some “smear” it across hours so each second is fractionally longer, some pretend it didn’t happen. Monotonic clocks ignore leap seconds entirely, which is another reason to use them for durations.
- Virtual machines lie convincingly. A VM that gets paused by the hypervisor for 200 ms and then resumed will, depending on configuration, see either a 200 ms gap on its monotonic clock or no gap at all. There isn’t a universally correct answer, which is why benchmarks inside VMs need to be read with care.
- “Time goes backwards” is not just NTP’s fault. Threads on different
cores reading the TSC can see slightly inconsistent values if the TSCs
aren’t perfectly synchronised. The kernel papers over this for
CLOCK_MONOTONIC, but I don’t have a confident summary of every edge case across every CPU/OS combination — the safe assumption is that monotonic within a thread is rock solid and across threads is almost always fine but worth checking if you’re measuring microseconds.
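The per-machine limitation can be made explicit in code. The sketch below is loosely inspired by the Go design mentioned earlier and is not how any real library implements it: Stamp, since, and the _ORIGIN tag are all invented for illustration. Each stamp carries a wall reading, a monotonic reading, and a tag identifying whose monotonic timeline it belongs to; subtraction uses the monotonic part only when both stamps share that timeline.

```python
import time
import uuid
from dataclasses import dataclass

# Hypothetical tag for "this process's monotonic timeline".
_ORIGIN = uuid.uuid4().hex


@dataclass(frozen=True)
class Stamp:
    wall: float    # seconds since the Unix epoch, may jump
    mono: float    # seconds since an arbitrary local origin
    origin: str    # which monotonic timeline mono belongs to

    @classmethod
    def now(cls):
        return cls(time.time(), time.monotonic(), _ORIGIN)

    def since(self, earlier):
        """Seconds between earlier and self."""
        if self.origin == earlier.origin:
            return self.mono - earlier.mono    # same timeline: safe duration
        # Different timeline: monotonic values are not comparable,
        # so fall back to wall clock and accept its sync error.
        return self.wall - earlier.wall
```

The fallback branch is where cross-machine timing inherits every wall-clock problem in this article, which is the case logical clocks are meant for.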
The deeper point: “what time is it” and “how long did this take” are not the same question, and the universe doesn’t owe us a single clock that answers both. The split into two clocks is the OS being honest about that.
Famous related terms
- NTP: NTP = "ask a time server what time it is" + "slew or step the local clock toward that". The source of most wall-clock adjustments.
- Leap second: leap second ≈ "an extra second inserted into UTC to keep it aligned with Earth's rotation". The reason wall-clock time can repeat or skip.
- Logical clock / Lamport timestamp: Lamport clock = counter + "I saw a message from you, bump mine past yours". What you reach for when neither wall-clock nor monotonic is enough, e.g. ordering events across machines.
- TSC: TSC = CPU register + "ticks every cycle". The hardware most monotonic clocks ultimately read.
- vDSO: vDSO ≈ "syscalls without the syscall". Why clock_gettime is fast enough to call in tight loops.
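The Lamport clock is small enough to sketch in full. This is a minimal textbook-style version, assuming messages carry the sender's counter as a plain integer; the class and method names are chosen for the example.

```python
class LamportClock:
    """Minimal Lamport logical clock: a counter that only moves forward."""

    def __init__(self):
        self.counter = 0

    def tick(self):
        """Advance for a local event and return the new timestamp."""
        self.counter += 1
        return self.counter

    def send(self):
        """Timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, remote_timestamp):
        """Merge a timestamp seen on an incoming message: bump past it."""
        self.counter = max(self.counter, remote_timestamp) + 1
        return self.counter


if __name__ == "__main__":
    a, b = LamportClock(), LamportClock()
    stamp = a.send()          # a's counter moves to 1
    print(b.receive(stamp))   # b jumps past a's timestamp
```

No physical clock is consulted anywhere, so the ordering survives any amount of wall-clock adjustment on either machine. What it gives up is any notion of how many seconds passed.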
Going deeper
- man 2 clock_gettime: the canonical list of Linux clocks and what each one promises.
- Go’s blog post on monotonic time in time.Time (the “monotonic clocks” section of the time package docs): a good worked example of bolting monotonic semantics onto an existing API without breaking it.
- Google’s “leap smear” writeup: for a real-world account of why industry mostly chose to spread leap seconds out rather than honor them literally.