Heads up: posts on this site are drafted by Claude and fact-checked by Codex. Both can still get things wrong — read with care and verify anything load-bearing before relying on it.

What is an LLM?

A neural network trained to predict the next token of text — and why that simple goal scaled into something that feels like reasoning.

AI & ML intro Apr 29, 2026

Why it exists

For decades, language software was hand-built rule by rule. Translation, search, summarization, dialogue — each was a separate brittle system. If you wanted a program that could “understand” a sentence, you wrote a parser, a grammar, a domain ontology, and prayed.

The hope behind LLMs is older than the technology: maybe a single model, fed enough text, could learn the structure of language by itself — without anyone sitting down to write the rules. Once it could, you wouldn’t build a separate system for translation and another for summarization. You’d just ask.

That hope kept failing because models weren’t expressive enough and training data wasn’t big enough. The transformer architecture (2017) and the leap in GPU compute changed both at once. Suddenly the simplest possible objective — “predict the next word” — produced models that could write code, explain jokes, and pass professional exams. Not because anyone taught them to. Because at scale, “predict the next word well” turns out to require a lot of competence.

Why it matters now

LLMs are the substrate behind almost every consumer AI product in 2026: chatbots, coding assistants, document Q&A, agents that book flights, customer support that doesn’t sound like customer support. Whole company functions are being rewritten around them. The shift is comparable to the early web — except compressed into about three years.

If you build software, you will integrate one. If you don’t, one will be in the tools you use tomorrow. The mechanics are worth understanding even at a high level, because the ways LLMs fail — hallucination, prompt injection, context limits — are the new bugs in the systems we’re shipping.

The short answer

LLM = neural net + "predict the next token" objective at scale

An LLM is a neural network trained on huge amounts of text to predict the next token (≈ word piece) given the tokens that came before. That’s the entire training objective. Everything else — answering questions, writing essays, following instructions — is an emergent consequence of doing that prediction well across enough text.
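Here is that objective as a toy training step (a minimal PyTorch sketch with a stand-in model; real pretraining uses a transformer and vastly more data, but the loss is the same idea):

```python
import torch
import torch.nn.functional as F

# Toy stand-in for an LLM: any network mapping token IDs -> logits over the vocabulary.
vocab_size = 50_000
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 64),
    torch.nn.Linear(64, vocab_size),
)

# A batch of token IDs, e.g. an encoded sentence. Random here for illustration.
tokens = torch.randint(0, vocab_size, (1, 16))

inputs, targets = tokens[:, :-1], tokens[:, 1:]  # target = the next token at each position
logits = model(inputs)                           # shape: (batch, seq-1, vocab)

# Cross-entropy between the predicted distribution and the actual next token.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()  # gradients nudge the weights toward better predictions
```

Pretraining is this step repeated across trillions of tokens. The only supervision is the text itself.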

How it works

Three pieces, in order:

1. Tokenization. Text is chopped into tokens — small chunks like " the", "un", "derstand", ".". Each token gets an integer ID. A typical vocabulary is 30k–200k tokens. A sentence becomes a list of integers.
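To see real tokens, OpenAI's open-source tiktoken library (one tokenizer among many; splits and vocabularies differ across models) makes this concrete:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one widely used ~100k-token vocabulary

ids = enc.encode("I understand.")
print(ids)                             # a short list of integer IDs
print([enc.decode([i]) for i in ids])  # the text chunk each ID stands for
```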

2. The transformer. A stack of layers (often 30–100) processes the token sequence. Each layer does two things:

- Attention: every position gathers information from other positions in the sequence, which is how “it” can connect back to the noun it refers to.
- A feed-forward network: the same small neural net is applied to each position independently, transforming what attention just gathered.

After all layers, the model produces a probability distribution over the vocabulary for the next token.
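In code, one layer looks roughly like this (a minimal PyTorch sketch; real models add causal masking, positional information, and many efficiency tricks):

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One layer: attention (positions exchange information) + MLP (per-position computation)."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 1. Attention: each position mixes in information from the others
        #    (causal masking omitted here for brevity).
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out  # residual connection
        # 2. Feed-forward: the same MLP applied to every position independently.
        x = x + self.mlp(self.ln2(x))
        return x
```

The residual connections (the `x +` lines) are a big part of why stacks of 30–100 such layers can be trained stably.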

3. Sampling. Pick a token from that distribution (sometimes the most likely, sometimes randomly weighted by probability), append it to the context, and repeat. Token by token, an answer appears.
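The generation loop itself fits in a dozen lines (a sketch; `model` and `tokenizer` are hypothetical stand-ins with encode/decode and logits interfaces, not any specific library's API):

```python
import torch

def generate(model, tokenizer, prompt: str, max_new_tokens: int = 50,
             temperature: float = 0.8) -> str:
    ids = torch.tensor([tokenizer.encode(prompt)])       # (1, seq) of token IDs
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :]                    # distribution for the *next* token
        if temperature == 0:
            next_id = logits.argmax(dim=-1, keepdim=True)      # greedy: most likely token
        else:
            probs = torch.softmax(logits / temperature, dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)  # weighted random pick
        ids = torch.cat([ids, next_id], dim=-1)          # append to the context and repeat
    return tokenizer.decode(ids[0].tolist())
```

The `temperature` knob is the “sometimes the most likely, sometimes randomly weighted” tradeoff above: 0 always takes the top token, higher values sample more adventurously.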

That’s the base model. Two more steps make it useful for chat:

- Instruction tuning: further training on example conversations, so the model learns to answer as an assistant rather than just continue the text.
- Preference tuning (RLHF and its variants): human raters rank candidate outputs, and the model is nudged toward the answers people prefer.

The “intelligence” is mostly compressed into the transformer’s weights — usually billions to trillions of numbers — formed during pretraining on internet-scale text.
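For a feel of the scale, a standard back-of-envelope formula for a transformer's weight count is roughly 12 × layers × d_model² (attention plus MLP matrices, ignoring embeddings). The configs below match published GPT-2 and GPT-3 shapes, used here purely for illustration:

```python
def approx_params(n_layers: int, d_model: int) -> int:
    """Rule-of-thumb transformer parameter count (ignores embeddings and biases)."""
    return 12 * n_layers * d_model**2

# GPT-2-small-sized vs. GPT-3-sized configurations.
print(f"{approx_params(12, 768):.2e}")    # ~8.5e7  -> order of 100M parameters
print(f"{approx_params(96, 12288):.2e}")  # ~1.7e11 -> order of 100B parameters
```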

Going deeper