What is a hash table?
A data structure that lets you find, insert, and delete by key in roughly constant time — the workhorse behind dictionaries, sets, and most fast lookup in modern code.
Why it exists
Imagine a phone book with a million names, and you need to find one. If the book is unsorted, you read every entry — a million comparisons in the worst case. If it’s sorted, binary search drops that to ~20. Better, but still grows with the size of the book.
The deeper question: can lookup be independent of the size of the collection? Can finding “Phi” in a billion-entry book be just as fast as in a ten-entry book?
The hash table says yes — at the cost of giving up order. If we can compute,
directly from the key itself, the slot where its value lives, we don’t have
to search at all. We jump straight there. That’s the entire idea, and it’s why
hash tables are everywhere: every language’s dict, map, Set, object,
the symbol table inside your compiler, the routing table inside your database.
Why it matters now
It’s arguably the most-used non-trivial data structure in the world. When you write
users[id] in Python, cache.get(key) in Go, Map in JavaScript — that’s a
hash table. Database indexes, language interpreters, CDN caches, deduplication,
counting unique visitors — all leaning on the same trick. Understanding it pays
off every time you reason about performance, collisions, or why your “fast
lookup” suddenly isn’t fast.
The short answer
hash table = array + hash function
A hash table stores key-value pairs in an array, using a hash function to
turn each key into an array index. To find a value, hash the key, jump to that
index, done. Average-case O(1) for insert, lookup, and delete.
How it works
Three pieces:
1. The array (often called the “table” or “buckets”). Fixed size to start — say, 16 slots.
2. The hash function. Takes a key (string, number, anything) and returns an integer. Then we take that integer mod the table size to get a slot index. A good hash function spreads keys evenly across slots so they don’t pile up.
slot = hash(key) % table_size
3. Collision handling. Two different keys can hash to the same slot. Two common fixes:
- Chaining — each slot holds a small list of entries; lookup walks the list.
- Open addressing — if the slot is taken, probe to the next slot (linear, quadratic, or by another hash).
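The slot formula can be sketched in Python, using the built-in hash() as a stand-in hash function (table_size is an assumed example value):

```python
# A minimal sketch of step 2: Python's built-in hash() stands in
# for the hash function; table_size = 16 is an assumed example.
table_size = 16

def slot_for(key):
    # hash() returns an int (possibly negative); % table_size maps it
    # into a valid index 0..15. Note: string hashes are seeded per
    # process, so the exact slot differs between runs.
    return hash(key) % table_size

s = slot_for("apple")
print(0 <= s < table_size)  # True: always a valid bucket index
```

Python's `%` always returns a non-negative result for a positive modulus, which is why a negative hash value is safe here.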
When the table gets too full (typical threshold: ~70% load factor), it
resizes — allocates a bigger array, rehashes every entry into it. Resize is
expensive, but it’s amortized across many cheap operations, so the average
cost per operation stays O(1).
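Putting the three pieces together, here is a minimal sketch of a chaining hash table with resizing, assuming Python's built-in hash() and the ~70% load-factor threshold from the text; the class and method names are illustrative, not from any particular library:

```python
# A minimal chaining hash table sketch: an array of buckets, a hash
# function, chains for collisions, and a resize past 0.7 load factor.
class ChainedMap:
    def __init__(self, capacity=16):
        self._buckets = [[] for _ in range(capacity)]
        self._count = 0

    def _slot(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._slot(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # empty slot or collision: append
        self._count += 1
        if self._count / len(self._buckets) > 0.7:
            self._resize()

    def get(self, key):
        for k, v in self._buckets[self._slot(key)]:  # walk the chain
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        bucket = self._buckets[self._slot(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._count -= 1
                return
        raise KeyError(key)

    def _resize(self):
        # Allocate a bigger array and rehash every entry: the expensive
        # step whose cost is amortized across many cheap inserts.
        old = self._buckets
        self._buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for k, v in bucket:
                self._buckets[self._slot(k)].append((k, v))

m = ChainedMap()
m.put("apple", 5)
m.put("apple", 7)       # second put overwrites the first
print(m.get("apple"))   # 7
```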
A worked example with chaining and a 4-slot table:
hash("apple") % 4 = 1 → bucket 1: [(apple, 5)]
hash("banana") % 4 = 3 → bucket 3: [(banana, 2)]
hash("cherry") % 4 = 1 → bucket 1: [(apple, 5), (cherry, 9)]
↑ collision, append to chain
Looking up cherry: hash to slot 1, walk the chain, return 9.
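The lookup walk can be sketched as follows; the hash values here are assumed, hard-coded to reproduce the bucket layout above, since a real hash function would compute them from the key's bytes:

```python
# Toy hash values, assumed so the layout matches the worked example.
toy_hash = {"apple": 1, "banana": 3, "cherry": 5}  # 5 % 4 == 1: collides with apple

def lookup(buckets, key):
    chain = buckets[toy_hash[key] % len(buckets)]  # jump straight to the slot
    for k, v in chain:                             # walk the chain
        if k == key:
            return v
    raise KeyError(key)

buckets = [[], [("apple", 5), ("cherry", 9)], [], [("banana", 2)]]
print(lookup(buckets, "cherry"))  # 9: slot 1, second entry in the chain
```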
The catch: that O(1) is average case. Worst case — every key happens to
hash to the same slot — is O(n). Adversarial inputs can deliberately trigger
this (a real attack vector, called hash flooding), which is why most modern
hash maps use a randomized seed per process.
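A quick sketch of that worst case: a pathological hash function that sends every key to slot 0 turns the whole table into one long chain, so lookup must walk all n entries.

```python
# Pathological hash: every key "collides", as an adversary would arrange.
def bad_hash(key):
    return 0

buckets = [[] for _ in range(16)]
for i in range(1000):
    buckets[bad_hash(f"key{i}") % 16].append((f"key{i}", i))

print(len(buckets[0]))  # 1000: one chain holds everything, lookup is O(n)
```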
Famous related terms
- Big-O notation — how cost grows with input size, ignoring constants; the language we use to say “constant time” precisely.
- Hash function — a deterministic map from arbitrary bytes to a fixed-size fingerprint; the math that turns a key into an index. Good ones look random; bad ones cluster and ruin performance.
- Load factor — entries ÷ slots; too high → slow. Crossing the threshold triggers a resize.
- Open addressing vs. chaining — the two ways to resolve hash collisions: probe to the next slot, or keep a list per slot.
- Balanced tree — a sorted dictionary with O(log n) operations; the alternative data structure when you need keys in sorted order (e.g. range queries). O(log n) instead of O(1), but ordered.
- Bloom filter — a bit array plus k hash functions; a hash-table cousin that answers “have I seen this?” in much less memory, at the cost of false positives.
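The Bloom filter entry can be sketched as exactly that, a bit array plus k hash functions; salting Python's hash() to derive k hashes is an assumption for illustration, not a production scheme:

```python
# Minimal Bloom filter sketch: a bit array plus k salted hashes.
class BloomFilter:
    def __init__(self, num_bits=1024, k=3):
        self.bits = [False] * num_bits
        self.k = k

    def _positions(self, item):
        # k derived hashes, each mapped into the bit array
        return [hash((salt, item)) % len(self.bits) for salt in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        # False means definitely never added; True may be a false positive
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
bf.add("apple")
print(bf.might_contain("apple"))  # True: a Bloom filter never gives a false negative
```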
Going deeper
- Introduction to Algorithms (CLRS), chapter on hash tables — the canonical treatment.
- “Hash flooding DoS attacks” — search this for a real-world reminder that the worst case matters.
- Source code of any standard library HashMap (Java’s, Rust’s std::collections::HashMap, Go’s map) — surprisingly readable and full of practical tricks.