Why networks are big-endian but your CPU is little-endian
Two halves of the same machine disagree on which end of a number comes first. The split is older than you, and it's never going away.
Why it exists
You write 0x12345678 in your code. The CPU stores it. A network card
serializes it. A peer reads it back. Somewhere in that chain, four bytes have
to be laid out in some particular order in memory and on the wire. Nothing in
the math of “a 32-bit integer” tells you which byte goes first.
That ambiguity — which end of a multi-byte number is “first”? — is endianness. And the answer turns out to be: it depends on who you ask, and the two answers we landed on disagree.
- Your x86 laptop, your phone’s ARM core (in its usual mode), and your RISC-V box all store integers little-endian.
- The IP, TCP, UDP headers on every packet you send are big-endian.
So the moment a number crosses from your CPU’s registers to a network packet, somebody has to swap bytes. We’ve been swapping for forty years. Why?
Why it matters now
Endianness is one of those things that’s invisible until you cross a boundary. The boundaries it lives on:
- The network. Every IP/TCP/UDP header field, every DNS message, every TLS record length, every gRPC frame is big-endian. htons(), htonl(), ntohs(), ntohl() exist for exactly this (a usage sketch follows this list).
- File formats. PNG, JPEG, ELF (sort of — ELF carries an endianness flag), TIFF (also a flag), most network-derived protocols-on-disk: big-endian. Most CPU-derived formats (Bitmap, ZIP integers, Mach-O on x86): little.
- Hardware registers. Memory-mapped device registers don’t care what your CPU thinks; the device picks. Drivers are full of cpu_to_le32/cpu_to_be32 macros for this reason.
- Serialization libraries. Protobuf wire format is little-endian for fixed-width fields. CBOR is big-endian. MessagePack is big-endian. The choice often tells you which world the format grew up in.
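The sockets API is the everyday case. A minimal sketch, assuming POSIX headers: the port number has to cross into network byte order before it goes into the address struct.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

int main(void) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(8080);                    /* host order -> network order */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);  /* already stores network order */
    return 0;
}
```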
A bug in this layer doesn’t crash loudly — it produces wrong numbers. A
length field of 4 read as 67_108_864 is the kind of failure that shows up
as “the packet parser hangs forever” rather than “segfault on line 47.” That’s
why the convention exists at all: pick one, write it down, swap if you have
to.
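A minimal sketch of that exact failure, assuming a little-endian host and the POSIX arpa/inet.h functions: the same four wire bytes, read with and without the swap.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    /* A 4-byte length field as it appears on the wire: big-endian 4. */
    const uint8_t wire[4] = {0x00, 0x00, 0x00, 0x04};

    uint32_t raw;
    memcpy(&raw, wire, sizeof raw);             /* reinterpret in host order */

    printf("forgot to swap: %u\n", raw);        /* 67108864 on a little-endian host */
    printf("with ntohl:     %u\n", ntohl(raw)); /* 4 on every host */
    return 0;
}
```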
The short answer
endianness = which byte of a multi-byte number lives at the lowest memory address
- Big-endian — most-significant byte first. Reads like a number written on paper: 0x12345678 is stored as 12 34 56 78.
- Little-endian — least-significant byte first. The same number is stored as 78 56 34 12.
Both work. Both have defensible arguments. The split between “network byte order is big” and “x86 is little” is a historical accident that calcified into a standard.
How it works
Take the 32-bit number 0x0A0B0C0D. In a register, it’s just 32 bits — no
“order” exists. The order only appears when you ask: what byte is at address
N, address N+1, address N+2, address N+3?
addr            +0  +1  +2  +3
big-endian      0A  0B  0C  0D   (matches written order)
little-endian   0D  0C  0B  0A   (low byte first)
The CPU’s load and store instructions pick one convention and bake it into
the silicon. On x86-64, MOV of a 32-bit word writes those four bytes in
little-endian order, full stop. On a big-endian machine like a classic
PowerPC or a SPARC, the same instruction would write them in the opposite
order. Same number in the register, different bytes in RAM.
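A quick way to see this on your own machine: a minimal C sketch that stores 0x0A0B0C0D and prints the byte at each offset.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    uint32_t value = 0x0A0B0C0D;
    uint8_t bytes[4];
    memcpy(bytes, &value, sizeof value);    /* look at the bytes the store produced */

    for (int i = 0; i < 4; i++)
        printf("addr+%d: %02X\n", i, bytes[i]);
    /* little-endian host: 0D 0C 0B 0A    big-endian host: 0A 0B 0C 0D */
    return 0;
}
```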
Now picture sending those four bytes across a network. The wire is a stream of bytes, no concept of “register.” The receiver picks them up in the order they arrived. If the sender’s CPU disagrees with the receiver’s CPU about which end goes first, every multi-byte field is silently scrambled.
The fix the early internet adopted: pick one order for the wire and make everyone convert. That order, defined in the early TCP/IP RFCs, is big-endian — what we now call “network byte order.” Every host, regardless of native endianness, swaps to big-endian on the way out and back on the way in. On a big-endian host, the swap is a no-op. On a little-endian host (most of them today), it’s a real byte-reverse.
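One way to honor the convention without caring what the host is: build the big-endian bytes with shifts instead of reinterpreting memory. A sketch under that idea (the helper names put_be32/get_be32 are mine, not a standard API):

```c
#include <stdint.h>

/* Write a 32-bit value into buf most-significant byte first (network order).
   Shifting operates on the value, not on memory, so this is correct on any host. */
static void put_be32(uint8_t buf[4], uint32_t v) {
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)(v);
}

/* Read it back the same way. */
static uint32_t get_be32(const uint8_t buf[4]) {
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```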
Why big for the network?
The standard account: big-endian is what humans write. When you write
12,345, the most significant digit is leftmost — it’s big-endian
positional notation. Putting bytes on the wire most-significant-first means a
hex dump of a packet looks like the number it represents. For protocol
designers reading network traces by hand in 1981, that mattered a lot.
There’s also a small algorithmic argument: when comparing numbers lexicographically as byte strings, big-endian sorts the same way as the numbers themselves. Useful for some routing tricks, less so today.
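To make the sorting point concrete, a small sketch: for unsigned values encoded big-endian at the same width, memcmp orders the byte strings exactly as the numbers order themselves, which is not true of the little-endian encodings.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    uint8_t a[4] = {0x00, 0x00, 0x01, 0x00};   /* 256, big-endian */
    uint8_t b[4] = {0x00, 0x00, 0x00, 0xFF};   /* 255, big-endian */

    /* memcmp compares byte by byte starting at the most significant end,
       so it agrees with numeric order: a > b */
    printf("%d\n", memcmp(a, b, 4) > 0);       /* prints 1 */
    return 0;
}
```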
Why little for x86?
The standard account here is more mechanical and arguably more interesting. Little-endian has a property that mattered to early hardware designers: reading a smaller integer from the start of a larger one Just Works.
Imagine you stored a 32-bit value 0x0000_00FF in little-endian: bytes
FF 00 00 00. If you load just one byte from that address, you get 0xFF.
Load two, you get 0x00FF. Load all four, you get 0x000000FF. The address
of the value is the same regardless of how wide a load you do — because the
low byte sits at the bottom. Big-endian would put the FF at the end, so
a 1-byte load at the same address gives you 0x00, not 0xFF. You’d have to
adjust the address by the difference in widths.
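A minimal sketch of that property, using memcpy to stay within strict-aliasing rules; on a little-endian host every load below starts at the same address and sees the same value.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    uint32_t wide = 0x000000FF;      /* bytes in RAM on little-endian: FF 00 00 00 */

    uint8_t  as8;
    uint16_t as16;
    memcpy(&as8,  &wide, 1);         /* 1-byte load from the same address */
    memcpy(&as16, &wide, 2);         /* 2-byte load from the same address */

    /* little-endian host: 0xff 0xff 0xff (narrow loads still see the low byte) */
    /* big-endian host:    0     0     0xff                                     */
    printf("%#x %#x %#x\n", as8, as16, wide);
    return 0;
}
```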
This made arithmetic carry propagation, pointer truncation between widths, and certain kinds of mixed-width arithmetic a hair simpler in hardware. Intel adopted little-endian for the 8086 in 1978, AMD inherited it for x86-64, ARM made little-endian the default in practice, and RISC-V locked it in by spec. The momentum is now overwhelming.
I don’t have a clean primary source for why exactly Intel picked little-endian for the 8086 — the often-repeated story is the mixed-width-load argument above, but I haven’t seen a contemporary Intel design document confirming that was the deciding factor versus, say, compatibility with the earlier 8080 and its accumulator conventions. Take the “why” with a grain of salt; the “what” is solid.
Bi-endian and the awkward middle
Some architectures — older ARM, older MIPS, IA-64, PowerPC — are bi-endian: a configuration bit selects which mode the CPU runs in. In practice almost everyone configures these as little-endian today, because that’s where the software ecosystem is. The mode bit is a relic.
Then there’s mixed-endian (“middle-endian”), which used to exist on the PDP-11 for 32-bit words and sometimes shows up in network protocols where one field is little-endian and another is big-endian in the same struct. It’s universally regarded as a footgun. SMB/CIFS is the canonical horror story — it grew up on little-endian machines but inherits big-endian fields from older protocols, and the result is a packet format where you have to remember, field by field, which way to byte-swap.
The seams
- Endianness is invisible inside one machine. If your program never serializes, never reads files written by other architectures, and never puns a uint32_t to a uint8_t[4], you will never notice. The bug surface is entirely at boundaries.
- htonl is “host to network long” — and on x86 it’s a byte-reverse instruction. Modern x86 has BSWAP and ARM has REV; what was once a loop is now a single cycle. Cheap.
- Floats have endianness too. IEEE 754 doesn’t specify byte order; the CPU does. Almost always the same as integer endianness on the same machine, but “almost” has bitten people writing cross-platform save files (see the sketch after this list).
- UTF-16 carries a BOM for exactly this reason. A 2-byte BOM at the start of the stream tells you whether to read it as UTF-16-LE or UTF-16-BE. UTF-8 doesn’t need one because its code units are single bytes.
- Big-endian survives where it had a head start. Java’s DataOutputStream is big-endian. JVM bytecode is big-endian. Most Internet RFCs are big-endian. None of these are going to change.
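On the float point above, a minimal sketch of one common way to keep save files portable: copy the double's bits into a fixed-width integer and emit those bytes in one declared order, big-endian here (the helper name put_be_double is mine).

```c
#include <stdint.h>
#include <string.h>

/* Serialize a double in a declared byte order (big-endian here), so the file
   means the same thing on every host. Assumes IEEE 754 doubles and that float
   byte order matches integer byte order, which holds on mainstream CPUs. */
static void put_be_double(uint8_t buf[8], double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);                  /* grab the raw IEEE 754 bits */
    for (int i = 0; i < 8; i++)
        buf[i] = (uint8_t)(bits >> (56 - 8 * i));    /* most significant byte first */
}
```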
The deeper observation: endianness is a coordination problem the industry solved twice — once for hosts (little wins), once for the wire (big wins) — and never reconciled. It’s cheap enough to swap that nobody has to.
Famous related terms
- Network byte order — network byte order = big-endian, by RFC — the convention every internet protocol header obeys.
- htonl/ntohl — htonl = "if I'm little-endian, byte-reverse; else no-op" — the portable way to write code that doesn’t care.
- BOM — BOM = magic prefix that says which endianness this UTF-16 stream is — endianness as a runtime question instead of a compile-time one.
- Bi-endian CPU — bi-endian = same silicon + a mode bit — almost always set to little today.
- __builtin_bswap32 — bswap = single CPU instruction that reverses byte order — what your standard library actually compiles htonl down to (see the sketch below).
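A sketch of what such a portable swap helper can look like, using GCC/Clang's __builtin_bswap32 and the predefined __BYTE_ORDER__ macro; the function name to_net32 is mine, not a standard one.

```c
#include <stdint.h>

/* Behaves like htonl: byte-reverse on a little-endian host, no-op on a
   big-endian one. Relies on GCC/Clang predefined macros and builtins. */
static uint32_t to_net32(uint32_t host) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return __builtin_bswap32(host);   /* compiles down to a single bswap/rev */
#else
    return host;                      /* already big-endian */
#endif
}
```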
Going deeper
- Danny Cohen, On Holy Wars and a Plea for Peace (IEN 137, 1980) — the paper that named the camps, after Swift’s Lilliputians fighting over which end of a boiled egg to crack.
- RFC 1700 and earlier RFCs — define “network byte order” as big-endian for the IP suite.
- Linux: include/uapi/linux/byteorder/ — the kernel’s compile-time endianness machinery, including the cpu_to_be32/cpu_to_le32 families.