What is a hash table?
A data structure that lets you find, insert, and delete by key in roughly constant time — the workhorse behind dictionaries, sets, and most fast lookup in modern code.
Why it exists
Imagine a phone book with a million names, and you need to find one. If the book is unsorted, you read every entry — a million comparisons in the worst case. If it’s sorted, binary search drops that to ~20. Better, but still grows with the size of the book.
The deeper question: can lookup be independent of the size of the collection? Can finding “Phi” in a billion-entry book be just as fast as in a ten-entry book?
The hash table says yes — at the cost of giving up order. If we can compute,
directly from the key itself, the slot where its value lives, we don’t have
to search at all. We jump straight there. That’s the entire idea, and it’s why
hash tables are everywhere: every language’s dict, map, Set, object,
the symbol table inside your compiler, the routing table inside your database.
Why it matters now
It’s arguably the most-used non-trivial data structure in the world. When you write
users[id] in Python, cache.get(key) in Go, Map in JavaScript — that’s a
hash table. Database indexes, language interpreters, CDN caches, deduplication,
counting unique visitors — all leaning on the same trick. Understanding it pays
off every time you reason about performance, collisions, or why your “fast
lookup” suddenly isn’t fast.
The short answer
hash table = array + hash function
A hash table stores key-value pairs in an array, using a hash function to
turn each key into an array index. To find a value, hash the key, jump to that
index, done. Average-case O(1) for insert, lookup, and delete.
How it works
Three pieces:
1. The array (often called the “table” or “buckets”). Fixed size to start — say, 16 slots.
2. The hash function. Takes a key (string, number, anything) and returns an integer. Then we take that integer mod the table size to get a slot index. A good hash function spreads keys evenly across slots so they don’t pile up.
slot = hash(key) % table_size
3. Collision handling. Two different keys can hash to the same slot. Two common fixes:
- Chaining — each slot holds a small list of entries; lookup walks the list.
- Open addressing — if the slot is taken, probe to the next slot (linear, quadratic, or by another hash).
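The slot formula can be sketched in Python, using the built-in hash() as a stand-in hash function (table_size is an assumed example value):

```python
# A minimal sketch of step 2: Python's built-in hash() stands in
# for the hash function; table_size = 16 is an assumed example.
table_size = 16

def slot_for(key):
    # hash() returns an int (possibly negative); % table_size maps it
    # into a valid index 0..15. Note: string hashes are seeded per
    # process, so the exact slot differs between runs.
    return hash(key) % table_size

s = slot_for("apple")
print(0 <= s < table_size)  # True: always a valid bucket index
```

Python's `%` always returns a non-negative result for a positive modulus, which is why a negative hash value is safe here.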
When the table gets too full (typical threshold: ~70% load factor), it
resizes — allocates a bigger array, rehashes every entry into it. Resize is
expensive, but it’s amortized across many cheap operations, so the average
cost per operation stays O(1).
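Putting the three pieces together, here is a minimal sketch of a chaining hash table with resizing, assuming Python's built-in hash() and the ~70% load-factor threshold from the text; the class and method names are illustrative, not from any particular library:

```python
# A minimal chaining hash table sketch: an array of buckets, a hash
# function, chains for collisions, and a resize past 0.7 load factor.
class ChainedMap:
    def __init__(self, capacity=16):
        self._buckets = [[] for _ in range(capacity)]
        self._count = 0

    def _slot(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._slot(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # empty slot or collision: append
        self._count += 1
        if self._count / len(self._buckets) > 0.7:
            self._resize()

    def get(self, key):
        for k, v in self._buckets[self._slot(key)]:  # walk the chain
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        bucket = self._buckets[self._slot(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._count -= 1
                return
        raise KeyError(key)

    def _resize(self):
        # Allocate a bigger array and rehash every entry: the expensive
        # step whose cost is amortized across many cheap inserts.
        old = self._buckets
        self._buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for k, v in bucket:
                self._buckets[self._slot(k)].append((k, v))

m = ChainedMap()
m.put("apple", 5)
m.put("apple", 7)       # second put overwrites the first
print(m.get("apple"))   # 7
```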
A worked example with chaining and a 4-slot table:
hash("apple") % 4 = 1 → bucket 1: [(apple, 5)]
hash("banana") % 4 = 3 → bucket 3: [(banana, 2)]
hash("cherry") % 4 = 1 → bucket 1: [(apple, 5), (cherry, 9)]
↑ collision, append to chain
Looking up cherry: hash to slot 1, walk the chain, return 9.
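The lookup walk can be sketched as follows; the hash values here are assumed, hard-coded to reproduce the bucket layout above, since a real hash function would compute them from the key's bytes:

```python
# Toy hash values, assumed so the layout matches the worked example.
toy_hash = {"apple": 1, "banana": 3, "cherry": 5}  # 5 % 4 == 1: collides with apple

def lookup(buckets, key):
    chain = buckets[toy_hash[key] % len(buckets)]  # jump straight to the slot
    for k, v in chain:                             # walk the chain
        if k == key:
            return v
    raise KeyError(key)

buckets = [[], [("apple", 5), ("cherry", 9)], [], [("banana", 2)]]
print(lookup(buckets, "cherry"))  # 9: slot 1, second entry in the chain
```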
The catch: that O(1) is average case. Worst case — every key happens to
hash to the same slot — is O(n). Adversarial inputs can deliberately trigger
this (a real attack vector, called hash flooding), which is why most modern
hash maps use a randomized seed per process.
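A quick sketch of that worst case: a pathological hash function that sends every key to slot 0 turns the whole table into one long chain, so lookup must walk all n entries.

```python
# Pathological hash: every key "collides", as an adversary would arrange.
def bad_hash(key):
    return 0

buckets = [[] for _ in range(16)]
for i in range(1000):
    buckets[bad_hash(f"key{i}") % 16].append((f"key{i}", i))

print(len(buckets[0]))  # 1000: one chain holds everything, lookup is O(n)
```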
Famous related terms
- Big-O notation — how cost grows with input size, ignoring constants; the language we use to say “constant time” precisely.
- Hash function — a deterministic map from arbitrary bytes to a fixed-size fingerprint; the math that turns a key into an index. Good ones look random; bad ones cluster and ruin performance.
- Load factor — entries ÷ slots; too high → slow. Crossing the threshold triggers a resize.
- Open addressing vs. chaining — the two ways to resolve hash collisions: probe to the next slot, or keep a list per slot.
- Balanced tree — a sorted dictionary with O(log n) operations; the alternative data structure when you need keys in sorted order (e.g. range queries). O(log n) instead of O(1), but ordered.
- Bloom filter — a bit array plus k hash functions; a hash-table cousin that answers “have I seen this?” in much less memory, at the cost of false positives.
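The Bloom filter entry can be sketched as exactly that, a bit array plus k hash functions; salting Python's hash() to derive k hashes is an assumption for illustration, not a production scheme:

```python
# Minimal Bloom filter sketch: a bit array plus k salted hashes.
class BloomFilter:
    def __init__(self, num_bits=1024, k=3):
        self.bits = [False] * num_bits
        self.k = k

    def _positions(self, item):
        # k derived hashes, each mapped into the bit array
        return [hash((salt, item)) % len(self.bits) for salt in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        # False means definitely never added; True may be a false positive
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
bf.add("apple")
print(bf.might_contain("apple"))  # True: a Bloom filter never gives a false negative
```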
Going deeper
- Introduction to Algorithms (CLRS), chapter on hash tables — the canonical treatment.
- “Hash flooding DoS attacks” — search this for a real-world reminder that the worst case matters.
- Source code of any standard library HashMap (Java’s, Rust’s std::collections::HashMap, Go’s map) — surprisingly readable and full of practical tricks.