
Why do embeddings exist?

Computers want numbers, but you also want 'cat' and 'kitten' to live next to each other. Embeddings are the trick that makes both true at once.

AI & ML · intro · Apr 29, 2026

Why it exists

A neural network can’t read the word “cat.” It can only multiply numbers by other numbers. So at some point, every piece of text, every image, every product in your catalog has to become a list of floats. The boring way to do that — assign each word an arbitrary integer ID — works for indexing but throws away every interesting fact about the word. In an ID-based world, “cat” and “kitten” are as far apart as “cat” and “thermodynamics.” That’s a tragedy if you wanted to build search, recommendations, deduplication, or anything that cares about meaning.

Embeddings exist to fix that. The goal is a representation that satisfies two demands simultaneously:

  1. It’s a vector of numbers, so a model (or a database) can do math on it.
  2. Similar things land near each other in that vector space, so distance becomes a proxy for “are these alike?”

The second demand is the magical one. Once your representation has it, suddenly an enormous family of problems — find documents about this question, suggest songs like this song, cluster these support tickets, flag these two reviews as near-duplicates — collapses into one operation: nearest-neighbor lookup in a vector space.

The earliest hint that this was even possible came from distributional semantics: if “cat” and “kitten” show up around the same other words, maybe “cat” and “kitten” mean something similar. Decades of work turned that hint into a tool, and at some point along the way the tool got a name — embedding — and quietly became one of the load-bearing pieces of modern software.

Why it matters now

If you build anything on top of LLMs in 2026, you’re using embeddings whether or not you call them that.

The practical reason engineers care: an embedding model plus a vector store gets you 80% of “AI features” without ever fine-tuning anything. It’s the cheapest, most reliable AI lever in the toolbox, and it predates LLMs by a decade.

The short answer

embedding = a learned function (text | image | thing) → ℝᵈ such that semantic similarity ≈ vector similarity

An embedding is a fixed-length list of floats — usually a few hundred to a few thousand dimensions — produced by a model that was trained so that “things that mean similar stuff” come out as “vectors that point in similar directions.” Once you have that, “are these two things alike?” becomes a dot product.
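
Concretely, a toy numpy sketch. The four-dimensional vectors are made up for illustration (real models emit hundreds to thousands of dimensions), but the math is exactly the dot-product story above:

import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Dot product after scaling each vector to unit length:
    # 1.0 means "same direction", ~0.0 means "unrelated".
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real embedding-model output.
cat    = np.array([0.9, 0.1, 0.0, 0.4])
kitten = np.array([0.8, 0.2, 0.1, 0.5])
thermo = np.array([0.0, 0.9, 0.8, 0.1])

print(cosine(cat, kitten))  # ~0.98: similar things point the same way
print(cosine(cat, thermo))  # ~0.11: unrelated things don't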

How it works

Two questions to keep separate: how do you build the function, and how do you use the vectors once you have them?

Building the function

The training trick has the same shape across most embedding models, even when the details differ wildly:

Pull together representations of things that should be similar. Push apart representations of things that shouldn’t.

That’s it. The art is in defining “should” and “shouldn’t” without humans labeling every pair.
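
In code, that push-pull shape is usually some flavor of contrastive loss. Here is a minimal numpy sketch of the common InfoNCE variant with in-batch negatives (the temperature value and the toy data are illustrative, not canon):

import numpy as np

def info_nce(anchors: np.ndarray, positives: np.ndarray, temp: float = 0.05) -> float:
    # anchors, positives: (batch, d) with unit-length rows; row i of
    # positives is the "should be similar" partner of row i of anchors.
    logits = anchors @ positives.T / temp          # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)    # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Diagonal entries pair each anchor with its own positive; everything
    # off-diagonal is an in-batch negative getting pushed away.
    return float(-np.diag(log_probs).mean())

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 64))
a /= np.linalg.norm(a, axis=1, keepdims=True)
p = a + 0.1 * rng.normal(size=a.shape)
p /= np.linalg.norm(p, axis=1, keepdims=True)
print(info_nce(a, p))                   # low: each pair already aligned
print(info_nce(a, rng.permutation(p)))  # high: pairs scrambled

A real trainer runs a loss like this over millions of pairs with gradients flowing back into the encoder; the sketch only shows the loss's shape.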

A few canonical recipes:

  1. word2vec-style: call two words “similar” if they show up in the same context windows — the co-occurrence statistics of a large corpus are the only supervision.
  2. Contrastive text pairs: mine naturally occurring pairs (question and accepted answer, title and body, query and clicked result) as positives, and treat the rest of the batch as negatives.
  3. CLIP-style: pair images with their captions, so “similar” crosses modalities and text and images land in one shared space.

You almost never train your own embedding model. You pick one off the shelf, sized to your needs and your budget, and use it as a black box.

Using the vectors

Once your data is embedded, three operations cover most of what you’ll do:

similarity(a, b) = cosine(a, b)        # how alike are these two?
search(q, corpus) = top_k by cosine    # which are most like q?
cluster(corpus)   = k-means / HDBSCAN  # group by proximity
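
Here is what the search operation looks like in plain numpy, assuming the corpus rows are unit-normalized so a dot product is a cosine:

import numpy as np

def top_k(query_vec: np.ndarray, corpus: np.ndarray, k: int = 5) -> np.ndarray:
    # corpus: (n, d), one unit-length row per item; query_vec: (d,).
    scores = corpus @ query_vec            # one matvec scores everything
    idx = np.argpartition(-scores, k)[:k]  # k best without sorting all n
    return idx[np.argsort(-scores[idx])]   # then rank just those k

Past a million or so vectors you'd swap this exact scan for an approximate-nearest-neighbor index (FAISS, HNSW, or whatever your vector database runs), trading a sliver of recall for orders-of-magnitude speed.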

A worked sketch of the RAG case:

# offline, once
for chunk in docs:
    store(id=chunk.id, vector=embed(chunk.text), payload=chunk)

# online, per question
q_vec    = embed(user_question)
hits     = vector_db.search(q_vec, top_k=8)
prompt   = SYSTEM + format(hits) + user_question
answer   = llm(prompt)

The LLM looks like the smart part. The embedding step is what made the right eight chunks land in the prompt in the first place.
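
To make the sketch concrete, here is one runnable stand-in using the open-source sentence-transformers library, with a plain numpy matrix playing the role of the vector database (the model choice and the toy chunks are illustrative, not a recommendation):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, widely used; any embedding model slots in

# Offline, once: embed every chunk and keep the matrix around.
chunks = [
    "Embeddings map text to fixed-length vectors.",
    "Cosine similarity compares vector directions.",
    "Paris is the capital of France.",
]
corpus_vecs = model.encode(chunks, normalize_embeddings=True)  # (n, d), unit rows

# Online, per question: embed the query, score all chunks, take the best.
q_vec = model.encode(["How do I compare two embeddings?"], normalize_embeddings=True)[0]
scores = corpus_vecs @ q_vec
hits = [chunks[i] for i in np.argsort(-scores)[:2]]
print(hits)  # these are the chunks that would get pasted into the LLM prompt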

A few things that surprise people the first time:

  1. Vectors from different models aren’t comparable. Each model defines its own space, so you can’t mix embeddings from two models (or two versions of one model) in the same index — and switching models means re-embedding the whole corpus.
  2. Similarity scores are relative, not absolute. A cosine of 0.8 means nothing on its own; what matters is the ranking within one model over one corpus.
  3. Long inputs get mushy. Embedding a whole document averages away its specifics, which is one reason RAG pipelines chunk first.

The reason it all works at all is the second-most-important fact in modern ML, after “scale helps”: when you train a big enough model on a big enough corpus with a sensible “similar things should be near each other” objective, the geometry that falls out is useful far beyond what the objective explicitly asked for. You trained on next-word prediction or contrastive pairs and you got, almost as a gift, a coordinate system where “find documents like this one” is a one-line operation. That gift is the entire reason embeddings became infrastructure.

Going deeper

A note on what I’m sure of: the high-level story (objectives, use cases, the contrastive shape, the “vectors point similar directions for similar things” property) is well-established. Specific benchmark numbers and “best embedding model right now” rankings change every few months — treat any such claim as something to verify against a current leaderboard rather than memorize.