How end-to-end encryption works
Open WhatsApp and a banner tells you Meta can't read your messages. That claim sits on a specific protocol — Diffie–Hellman key agreement plus a 'double ratchet' that changes the key on every message. Here's the shape of it.
Why it exists
Open WhatsApp, start a chat with a friend, and a small banner tells you “Messages and calls are end-to-end encrypted.” Meta runs the servers your message just travelled through — and the banner is claiming that even Meta, sitting in the middle, cannot read what you just typed.
The obvious question is: how is that different from the lock icon next to a URL in your browser? Both say “encrypted.” The difference is where the ciphertext stops. With ordinary TLS, your message is encrypted from your phone to WhatsApp’s server, decrypted there, then re-encrypted from the server to your friend’s phone. The server holds plaintext, even if only for a moment. Anyone who can subpoena, breach, or rogue-employee that server can read messages. End-to-end encryption (E2EE) closes that window: the keys live on the two endpoints, the server only relays ciphertext, and “trust the server operator” stops being part of the threat model.
flowchart LR
subgraph TLS["Ordinary TLS"]
direction LR
A1[Alice's phone] -->|ciphertext| S1["server<br/>holds plaintext briefly"]
S1 -->|ciphertext| B1[Bob's phone]
end
subgraph E2EE["End-to-end encryption"]
direction LR
A2[Alice's phone] -->|ciphertext| S2["server<br/>only sees ciphertext"]
S2 -->|ciphertext| B2[Bob's phone]
end
Why it matters now
E2EE is the live political fight in security right now, not a settled feature. The UK’s Online Safety Act and the EU’s recurring “Chat Control” proposals have both, in different forms, pushed for client-side scanning, meaning messages are inspected on the device before they’re encrypted. Signal has publicly said it would leave the UK market rather than implement it. Apple announced an on-device CSAM-scanning proposal for iCloud Photos in 2021 and quietly retired it by late 2022; I don’t have a confident primary source tying the retirement directly to the same E2EE arguments, only the obvious overlap.
On the deployment side, what Signal launched as its protocol in 2013–2014 is now the default for billions of conversations: WhatsApp turned Signal-protocol E2EE on by default in 2016, Google Messages rolled out one-to-one RCS E2EE to beta in late 2020 and made it the default for one-to-one Messages chats by the end of 2022, and Facebook Messenger finished defaulting all personal chats to E2EE in December 2023. The mechanism this post describes underlies the major consumer-messaging clients — I don’t have a clean number for total share of consumer messaging traffic worldwide — and is the part legislation keeps trying to put a hole in.
The short answer
E2EE = key agreement between the two devices + a per-message symmetric key that the server never sees
Two phones do a public-key handshake to agree on a starting secret without trusting the server in the middle. From that secret they derive a fresh symmetric key for every message, ratcheting forward so that compromising today’s key doesn’t reveal the messages already sent — and a periodic Diffie–Hellman step folds in new entropy so future messages eventually recover from the compromise too. The server’s job shrinks to “store and forward opaque blobs.”
How it works
We’ll use the Signal Protocol as the worked example, because it’s the one WhatsApp, Signal, Messenger, and Google Messages all build on, and because it’s been publicly specified and analysed since around 2013. There are three moving parts.
1. Bootstrapping a shared secret (X3DH)
The hard problem is that when Alice first messages Bob, Bob might be offline. You can’t do an interactive handshake with a phone that’s currently in a pocket. Signal’s answer is X3DH: every user uploads a small bundle of public keys to the server in advance — a long-term identity key, a medium-lived “signed prekey,” and a stack of one-time prekeys. When Alice wants to start a conversation, she fetches Bob’s bundle, performs several Diffie–Hellman computations against those keys at once, and mixes the results into a single shared secret. Bob, when he comes online, does the matching operations on his side and arrives at the same secret.
The server hosts the bundles but never sees the resulting secret — Diffie–Hellman is built so the secret is computable from each side’s private keys plus the other side’s public keys, and never has to traverse the network.
sequenceDiagram
participant A as Alice's phone
participant S as Server (Meta / Signal)
participant B as Bob's phone
B->>S: upload identity + prekey bundle
A->>S: fetch Bob's prekey bundle
A->>A: derive shared secret (X3DH)
A->>S: first ciphertext + handshake material
S->>B: deliver when online
B->>B: derive same shared secret
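The handshake above can be sketched in a few lines. This is a toy under loud assumptions: it uses modular-exponentiation Diffie–Hellman with an arbitrary prime instead of the elliptic-curve DH the spec calls for, a bare SHA-256 hash instead of HKDF, and it skips the signature check on the signed prekey. The helper names (`keypair`, `dh`, `kdf`) are made up for illustration. What it does show is the core idea: four DH computations on each side, mirror images of one another, landing on the same secret.

```python
import hashlib
import secrets

# Toy parameters: a well-known prime and a small base. Real X3DH uses
# elliptic-curve DH (X25519 or X448), not modular exponentiation.
P = 2**255 - 19
G = 2


def keypair():
    """Return (private, public) for the toy DH group."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def dh(private, peer_public):
    """Shared DH output from my private key and the peer's public key."""
    return pow(peer_public, private, P).to_bytes(32, "big")


def kdf(*dh_outputs):
    """Stand-in KDF: hash the concatenated DH outputs (the spec uses HKDF)."""
    return hashlib.sha256(b"".join(dh_outputs)).digest()


# Bob publishes a bundle in advance; only the public halves reach the server.
ik_b, IK_B = keypair()    # long-term identity key
spk_b, SPK_B = keypair()  # signed prekey (signature check omitted here)
opk_b, OPK_B = keypair()  # one-time prekey

# Alice has her own identity key and makes a fresh ephemeral key.
ik_a, IK_A = keypair()
ek_a, EK_A = keypair()

# Alice: four DH computations against Bob's public bundle.
alice_secret = kdf(
    dh(ik_a, SPK_B),
    dh(ek_a, IK_B),
    dh(ek_a, SPK_B),
    dh(ek_a, OPK_B),
)

# Bob, later: the mirror-image computations, using the public keys IK_A
# and EK_A that arrive with Alice's first message.
bob_secret = kdf(
    dh(spk_b, IK_A),
    dh(ik_b, EK_A),
    dh(spk_b, EK_A),
    dh(opk_b, EK_A),
)

print(alice_secret == bob_secret)  # True
```

The server in this sketch would only ever handle the uppercase (public) values, which is why hosting the bundle does not let it compute the secret.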
2. Ratcheting forward (Double Ratchet)
A single shared secret would be a fragile thing to keep using for years. If a phone is compromised tomorrow, you want yesterday’s messages to stay unreadable (forward secrecy), and you want the conversation to recover eventually so that messages sent after the compromise stop being readable too (post-compromise security).
The Double Ratchet gives both. It combines two clocks:
- A symmetric ratchet that runs a KDF on the current chain key after every single message. The chain key for message N is unrecoverable from the chain key for message N+1. So even if an attacker steals the device right now, they can’t decrypt the messages already sent — those keys have been overwritten.
- A Diffie–Hellman ratchet that carries the sender’s current ratchet public key on every message. When the other side sees a new ratchet key (typically once they reply), both sides do a fresh DH computation, mix it into the chain, and restart from new entropy neither endpoint knew before. That’s the part that gives post-compromise recovery: a freshly stolen key gets washed out within one round-trip — not on every message, but on every direction-reversal.
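The symmetric ratchet from the first bullet is small enough to sketch directly. The version below assumes HMAC-SHA256 as the KDF, with the single-byte constants 0x01 and 0x02 separating the message key from the next chain key; that follows the pattern the Double Ratchet spec recommends, but treat the details as an illustration, not a reference implementation. The starting chain key is a placeholder.

```python
import hashlib
import hmac


def kdf_chain_step(chain_key):
    """Advance the chain one step: return (next_chain_key, message_key)."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


# Hypothetical starting chain key; in the protocol it comes out of a DH step.
chain_key = hashlib.sha256(b"placeholder shared secret").digest()

message_keys = []
for _ in range(3):
    # The old chain key is overwritten here, so the keys for earlier
    # messages cannot be re-derived from what remains on the device.
    chain_key, message_key = kdf_chain_step(chain_key)
    message_keys.append(message_key)

print(len(set(message_keys)))  # 3
```

Because HMAC is one-way, holding the current `chain_key` lets you compute every later key in the run but none of the earlier ones, which is the forward-secrecy property in miniature.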
sequenceDiagram
participant A as Alice's phone
participant B as Bob's phone
Note over A,B: DH step → chain key C₀
A->>B: msg 1 (key from C₀)
A->>B: msg 2 (key from C₁ = KDF(C₀))
A->>B: msg 3 (key from C₂ = KDF(C₁))
Note over A,B: Bob's reply triggers a new DH step → C'₀
B->>A: msg 4 (key from C'₀)
B->>A: msg 5 (key from C'₁ = KDF(C'₀))
Note over A,B: Alice's reply triggers another DH step → C''₀
A->>B: msg 6 (key from C''₀)
The symmetric ratchet is the inside of each run (C₀ → C₁ → C₂ by KDF). The DH ratchet is the jump between runs — a fresh chain key derived from new key material the other side just brought in.
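The jump between runs can be sketched the same way. The code below assumes a root key that both sides already share, a toy modular-exponentiation DH in place of X25519, and HKDF-SHA256 to mix the fresh DH output into the root key; the `info` label and all function names are invented for the example.

```python
import hashlib
import hmac
import secrets

P = 2**255 - 19  # toy modulus; real implementations use X25519
G = 2


def keypair():
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def dh(private, peer_public):
    return pow(peer_public, private, P).to_bytes(32, "big")


def hkdf(salt, ikm, info, length):
    """HKDF-SHA256 (RFC 5869): extract, then expand to `length` bytes."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def dh_ratchet(root_key, dh_output):
    """Mix a fresh DH output into the root key: (new_root_key, new_chain_key)."""
    out = hkdf(root_key, dh_output, b"toy-dh-ratchet", 64)
    return out[:32], out[32:]


root_key = hashlib.sha256(b"placeholder X3DH output").digest()
alice_priv, alice_pub = keypair()  # Alice's current ratchet key, already sent

# Bob replies: he makes a new ratchet key pair and sends the public half.
bob_priv, bob_pub = keypair()
bob_root, bob_chain = dh_ratchet(root_key, dh(bob_priv, alice_pub))

# Alice sees Bob's new ratchet public key and performs the same step.
alice_root, alice_chain = dh_ratchet(root_key, dh(alice_priv, bob_pub))

print(alice_chain == bob_chain)  # True
```

An attacker who had copied the old `root_key` still needs one of the two ratchet private keys to compute the DH output, so the new chain key is out of reach once a fresh key pair is in play.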
The per-message key that finally encrypts the payload is symmetric (Signal uses AES-256 in CBC mode with an HMAC-SHA256 tag — an encrypt-then-MAC construction rather than a true AEAD like AES-GCM, for historical reasons in the spec). The public-key math only runs during ratchet steps, not on every keystroke.
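The encrypt-then-MAC ordering is easier to see in code than in prose. Python's standard library has no AES, so this sketch swaps in a SHA-256 counter keystream as a stand-in cipher; it is not AES-CBC and should not be used for real data. The key-splitting labels and the function names are also assumptions. The part that carries over is the order of operations: the tag is computed over the header and ciphertext, and the receiver checks it before decrypting anything.

```python
import hashlib
import hmac


def derive_keys(message_key):
    """Split one message key into separate encryption and MAC keys."""
    enc_key = hmac.new(message_key, b"enc", hashlib.sha256).digest()
    mac_key = hmac.new(message_key, b"mac", hashlib.sha256).digest()
    return enc_key, mac_key


def keystream_xor(enc_key, data):
    """Stand-in cipher (NOT AES-CBC): XOR with a SHA-256 counter keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(enc_key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(message_key, plaintext, header):
    """Encrypt first, then MAC the header plus ciphertext."""
    enc_key, mac_key = derive_keys(message_key)
    ciphertext = keystream_xor(enc_key, plaintext)
    tag = hmac.new(mac_key, header + ciphertext, hashlib.sha256).digest()
    return ciphertext, tag


def open_sealed(message_key, ciphertext, tag, header):
    """Verify the tag in constant time, and only then decrypt."""
    enc_key, mac_key = derive_keys(message_key)
    expected = hmac.new(mac_key, header + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return keystream_xor(enc_key, ciphertext)


message_key = hashlib.sha256(b"placeholder message key").digest()
ciphertext, tag = seal(message_key, b"see you at 8", b"header-bytes")
print(open_sealed(message_key, ciphertext, tag, b"header-bytes"))
```

Each message key is used exactly once in the ratchet, which is what makes a simple keystream tolerable in a toy like this; reusing the key would leak the XOR of two plaintexts.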
3. What the server actually sees
Once the ratchet is running, the server’s view of each message is roughly: from device A, to device B, this many bytes, at this time, here is an opaque blob and a header. That blob and header are useless without one of the two endpoints’ keys.
flowchart LR
A[Alice's phone<br/>plaintext + per-msg key] -->|ciphertext + header| S[Server]
S -->|ciphertext + header| B[Bob's phone<br/>per-msg key → plaintext]
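As a rough data-structure view of that relay role, here is a hypothetical envelope and the most a key-less server could log about it. The field names are invented for illustration and are not any messenger's wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    sender: str
    recipient: str
    timestamp: int
    header: bytes      # ratchet material the recipient needs, e.g. a public key
    ciphertext: bytes  # opaque without one of the endpoints' keys


def server_view(envelope):
    """What a relay can record without holding any keys."""
    return {
        "from": envelope.sender,
        "to": envelope.recipient,
        "at": envelope.timestamp,
        "size": len(envelope.header) + len(envelope.ciphertext),
    }


envelope = Envelope("alice", "bob", 1_700_000_000, b"h" * 40, b"c" * 100)
print(server_view(envelope))
```

Everything in the returned dictionary is metadata; the message content never appears because the server has nothing to decrypt it with.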
Show the seams
- Metadata is not encrypted. The server still has to route the message, so it sees who sent to whom, when, how often, message sizes, and group-membership patterns. That graph alone is famously informative. Signal goes further than most to minimise this (sealed sender, contact-discovery via secure enclaves), but no E2EE messenger hides all metadata.
- Backups are the usual back door. If iCloud or Google Drive stores your chat history in a form the cloud provider can read, the E2EE guarantee ends at the backup. WhatsApp now offers end-to-end encrypted backups, but they’re opt-in; Apple’s “Advanced Data Protection” for iCloud, which covers iMessage backups, is similarly opt-in. The default settings often quietly weaken the property the banner advertises.
- Endpoint compromise breaks everything. E2EE protects the channel, not the device. Malware on the phone, a shoulder-surfer, or a forensic extraction of an unlocked device sees plaintext, because plaintext is what the user reads.
- Key verification is the user-experience cliff. The protocol can tell you when your contact’s key changed (Signal’s “safety number,” WhatsApp’s security-code QR), but in practice almost nobody scans them. A malicious server could insert a fake key for a brand-new conversation, and unless the two humans compare numbers out of band, neither would notice. Key transparency systems (audit logs of which key the server claims belongs to whom) are an active mitigation but not yet universal — I don’t have a confident current deployment number across the major messengers.
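- Group chats are harder than two-party chats. Naive E2EE for groups costs O(group size) work per message, because the sender encrypts separately for every member. Signal-protocol messengers have historically handled groups either with that pairwise fan-out or with “sender keys” (a per-sender key handed to each member over the pairwise channels, then used to encrypt each message once); the IETF MLS standard (RFC 9420, 2023) is the newer construction designed for large groups. Adoption is partial: I know MLS underlies parts of recent group-messaging deployments but won’t claim specific products without re-checking.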
- Client-side scanning is the wedge. If a regulator can force the messenger to run a hash check or AI classifier on the plaintext before encryption, the E2EE property survives technically (the network still sees only ciphertext) while being effectively neutralised (the device itself becomes an informant). Whether that counts as “still E2EE” is the fight under the legislative debates above.
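To make the key-verification bullet concrete, here is a deliberately simplified fingerprint comparison. It is not Signal's actual safety-number algorithm; it just hashes the two identity public keys in a fixed order and prints digits, which is enough to show why a substituted key becomes visible the moment two people compare numbers. The key bytes are placeholders.

```python
import hashlib


def safety_number(key_a, key_b):
    """Order-independent fingerprint of two identity public keys (toy version)."""
    digest = hashlib.sha256(b"".join(sorted([key_a, key_b]))).digest()
    digits = str(int.from_bytes(digest, "big")).zfill(30)[:30]
    return " ".join(digits[i:i + 5] for i in range(0, 30, 5))


# Placeholder identity public keys.
alice_key = bytes.fromhex("aa" * 32)
bob_key = bytes.fromhex("bb" * 32)
fake_key = bytes.fromhex("cc" * 32)  # key a malicious server hands to Alice

# Honest case: both phones show the same number.
print(safety_number(alice_key, bob_key) == safety_number(bob_key, alice_key))

# Substituted key: Alice's screen no longer matches Bob's.
alice_screen = safety_number(alice_key, fake_key)
bob_screen = safety_number(bob_key, alice_key)
print(alice_screen == bob_screen)
```

The comparison only helps if it happens over a channel the server does not control, such as reading the digits aloud in person.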
Famous related terms
- Signal Protocol: Signal Protocol = X3DH + Double Ratchet + authenticated symmetric encryption per message. The spec Signal, WhatsApp, and Google Messages all build on; Messenger uses it alongside its own additions.
- Forward secrecy: forward secrecy = past messages stay safe if today's key leaks. The symmetric-ratchet half of the Double Ratchet exists to deliver exactly this property.
- Post-compromise security: post-compromise security ≈ "self-healing" after key theft once a fresh DH ratchet step runs. The DH-ratchet half.
- Diffie–Hellman key exchange: DH = derive a shared secret over a public channel. The primitive every step of X3DH is built on; see public-key crypto.
- TLS: TLS ≈ encrypt the client↔server hop. Protects the network link but not the server itself; the contrast that motivates E2EE. See TLS.
- MLS (Messaging Layer Security): MLS ≈ E2EE for groups, with a key-tree so per-message cost is logarithmic in group size. The IETF standard (RFC 9420) for the group case.
- Client-side scanning: CSS ≈ run a classifier on plaintext on the device before it's encrypted. The regulatory workaround whose security implications are actively contested.
- Sealed sender: sealed sender ≈ hide the "from" field from the server too. A Signal-specific metadata-minimisation feature.
Going deeper
- The Signal Protocol specifications — X3DH and The Double Ratchet Algorithm, both at signal.org/docs — the primary source: short, readable, and the spec the major messengers build on.
- Matthew Green, “Attack of the Week: Group Messaging in WhatsApp and Signal” — the clearest non-spec explainer of where the Signal-style story stops being clean once you leave two-party chats and have to handle groups.