Heads up: posts on this site are drafted by Claude and fact-checked by Codex. Both can still get things wrong — read with care and verify anything load-bearing before relying on it.

What is a hash table?

A data structure that lets you find, insert, and delete by key in roughly constant time — the workhorse behind dictionaries, sets, and most fast lookup in modern code.

Computer Science intro Apr 29, 2026

Why it exists

Imagine a phone book with a million names, and you need to find one. If the book is unsorted, you read every entry — a million comparisons in the worst case. If it’s sorted, binary search drops that to ~20. Better, but still grows with the size of the book.
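That ~20 is just log2 of a million: binary search halves the candidates each step. A quick sanity check in Python, with the stdlib bisect module standing in for binary search (the phone book here is synthetic, fixed-width names so lexicographic order matches numeric order):

```python
import math
from bisect import bisect_left

# Binary search halves the candidates each step, so a million
# entries need about log2(1,000,000) ≈ 20 comparisons.
print(math.ceil(math.log2(1_000_000)))  # 20

# bisect_left does exactly that halving on a sorted list.
book = sorted(f"name{i:07d}" for i in range(1_000_000))
i = bisect_left(book, "name0123456")
print(book[i] == "name0123456")  # True
```

Twenty comparisons instead of a million is a huge win, but it still grows as the book grows.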

The deeper question: can lookup be independent of the size of the collection? Can finding “Phi” in a billion-entry book be just as fast as in a ten-entry book?

The hash table says yes — at the cost of giving up order. If we can compute, directly from the key itself, the slot where its value lives, we don’t have to search at all. We jump straight there. That’s the entire idea, and it’s why hash tables are everywhere: every language’s dict, map, Set, object, the symbol table inside your compiler, the routing table inside your database.

Why it matters now

It’s the most-used non-trivial data structure in the world. When you write users[id] in Python, cache.get(key) in Go, Map in JavaScript — that’s a hash table. Database indexes, language interpreters, CDN caches, deduplication, counting unique visitors — all leaning on the same trick. Understanding it pays off every time you reason about performance, collisions, or why your “fast lookup” suddenly isn’t fast.

The short answer

hash table = array + hash function

A hash table stores key-value pairs in an array, using a hash function to turn each key into an array index. To find a value, hash the key, jump to that index, done. Average-case O(1) for insert, lookup, and delete.
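In Python that contract is the built-in dict, which is itself a hash table under the hood:

```python
# Python's dict is a hash table: insert, lookup, and delete
# are all average-case O(1), regardless of how many entries it holds.
users = {}
users["u42"] = "Phi"     # insert: hash "u42", store at its slot
print(users["u42"])      # lookup: hash "u42", jump to that slot → Phi
del users["u42"]         # delete: same hash, clear the slot
print("u42" in users)    # False
```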

How it works

Three pieces:

1. The array (often called the “table” or “buckets”). Fixed size to start — say, 16 slots.

2. The hash function. Takes a key (string, number, anything) and returns an integer. Then we take that integer mod the table size to get a slot index. A good hash function spreads keys evenly across slots so they don’t pile up.

slot = hash(key) % table_size
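A sketch of that formula in Python, using a hand-rolled toy hash so the numbers are reproducible (Python's built-in hash() for strings is seeded per process, and real tables use much stronger functions):

```python
def toy_hash(key: str) -> int:
    # Polynomial rolling hash, a classic simple scheme.
    # For illustration only; not what real dicts use.
    h = 0
    for ch in key:
        h = h * 31 + ord(ch)
    return h

TABLE_SIZE = 16
slot = toy_hash("apple") % TABLE_SIZE
print(slot)  # 10, an index in 0..15
```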

3. Collision handling. Two different keys can hash to the same slot. Two common fixes:

   - Separate chaining: each slot holds a small list (a "chain") of entries, and colliding keys are appended to it.
   - Open addressing: on a collision, probe other slots by a fixed rule (the simplest, linear probing, just tries the next slot) until an empty one turns up.

When the table gets too full (typical threshold: ~70% load factor), it resizes — allocates a bigger array, rehashes every entry into it. Resize is expensive, but it’s amortized across many cheap operations, so the average cost per operation stays O(1).

A worked example with chaining and a 4-slot table:

hash("apple") % 4 = 1   →  bucket 1: [(apple, 5)]
hash("banana") % 4 = 3  →  bucket 3: [(banana, 2)]
hash("cherry") % 4 = 1  →  bucket 1: [(apple, 5), (cherry, 9)]
                                  ↑ collision, append to chain

Looking up cherry: hash to slot 1, walk the chain, return 9.
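Putting the three pieces together, a minimal chaining table might look like this. A sketch, not how any real dict is implemented; production tables also shrink, cache hashes, and handle deletion:

```python
class ChainedHashTable:
    """Toy hash table: separate chaining, resize past ~70% load."""

    def __init__(self, capacity=4):
        self.buckets = [[] for _ in range(capacity)]
        self.count = 0

    def _slot(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._slot(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key exists: overwrite in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # collision or empty: append to chain
        self.count += 1
        if self.count / len(self.buckets) > 0.7:
            self._resize()

    def get(self, key):
        for k, v in self.buckets[self._slot(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def _resize(self):
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:               # rehash every entry into the new array
            for key, value in bucket:
                self.buckets[self._slot(key)].append((key, value))

table = ChainedHashTable()
table.put("apple", 5)
table.put("banana", 2)
table.put("cherry", 9)
print(table.get("cherry"))  # 9
```

Note that `_resize` rehashes with the new table size, so entries can land in different slots than before; that's why the whole table must be rebuilt rather than copied.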

The catch: that O(1) is average case. Worst case — every key happens to hash to the same slot — is O(n). Adversarial inputs can deliberately trigger this (a real attack vector, called hash flooding), which is why most modern hash maps use a randomized seed per process.
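The degenerate case is easy to provoke with a pathological hash function. Here every key lands in slot 0, so the table collapses into a single chain and lookup is a linear scan:

```python
# A worst-case hash: every key maps to the same slot, so the
# table degenerates into one long chain and lookup becomes O(n).
TABLE_SIZE = 16
buckets = [[] for _ in range(TABLE_SIZE)]

def bad_hash(key):
    return 0  # pathological: ignores the key entirely

for i in range(1000):
    key = f"key{i}"
    buckets[bad_hash(key) % TABLE_SIZE].append((key, i))

print(len(buckets[0]))                   # 1000: one chain holds everything
print(max(len(b) for b in buckets[1:]))  # 0: every other bucket is empty
```

Hash flooding works by crafting many keys that collide under the victim's hash function; per-process seeding (for example, Python randomizes string hashing unless PYTHONHASHSEED is pinned) makes those colliding keys hard to precompute.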

Going deeper