Heads up: posts on this site are drafted by Claude and fact-checked by Codex. Both can still get things wrong — read with care and verify anything load-bearing before relying on it.

How can I tell when an LLM is making the answer up?

True answers and fabricated ones come out of the same pipe, in the same tone. There's no red light. But there are seams — places hallucinations cluster, shapes they tend to take, tells you can learn to read.

AI & ML · intro · May 2, 2026

Why it exists

You’re writing an essay. You ask ChatGPT for a citation, and it gives you a paper title, two authors, a journal, a year. You paste it into your draft. Hours later, you go to actually open the paper — and it doesn’t exist. The authors are real, the journal is real, the year is plausible. The paper is not. Or you’re vibe-coding and the model writes import pandas as pd; pd.read_excel_safe(path), and your interpreter tells you read_excel_safe is not a thing. It looks exactly like a real pandas function. It isn’t one.

That experience — the answer that looked right, sounded right, and turned out to be confidently invented — is the thing this post is about. The hard part is that every answer the model gives looks like an answer. True ones and fabricated ones come out of the same pipe, in the same tone, with the same conviction. There is no little red light that flips on when the model is making it up.

That’s the whole problem. The system that wrote the hallucination — fluently emitting a plausible-looking continuation when it didn’t actually know — is the same system that wrote the answer that turned out to be right. From the surface, they’re indistinguishable.

So the practical question, the one a curious user actually has, isn’t “why do hallucinations exist?” — it’s “given that they do, how do I tell, in the moment, whether the thing in front of me is one?”

The honest answer is: you often can’t, not from the text alone. But there are seams. There are places hallucinations cluster, shapes they tend to take, and tells that come from how the model produces them. None of these are decisive. Together, they’re enough to know when to stop trusting the prose and go check.

Why it matters now

LLMs are no longer used only as toy chatbots — they write code, draft contracts, summarize medical records, propose investment theses. The cost of acting on a confident-but-wrong answer scales with what you’re doing. A fabricated citation in a school essay is embarrassing. A fabricated function name in a deployed agent is a bug. A fabricated drug interaction in a clinical decision support tool is harm.

This is also why “trust but verify” is too generous a frame. The model is not trying to deceive you, but the output is fluent enough that trust becomes the default state before you’ve made any conscious decision. The job isn’t “decide whether to trust this.” The job is “notice the specific shapes that should make you stop trusting it, and check the ones that matter.”

The short answer

spotting a hallucination ≈ find the checkable claim + check it + read the rest for tells

The checkable claims — names, numbers, citations, code identifiers, dates, quotes — are where fabrication has to land somewhere falsifiable, so they’re where you catch it. The rest of the prose isn’t necessarily true or false in a checkable way; for that, you read for tells.

How it works

Hallucinations cluster in predictable places, take predictable shapes, and ride on a predictable kind of fluency. Here’s where to look.

Where they cluster

Long-tail specifics. The model has seen a lot of text on the most famous topics and very little on the next million. Common, well-documented things — how TCP works, what git rebase does, how photosynthesis works — are lower-risk, though not safe. The further down the long tail you go (the third author on a 2014 paper, an obscure library’s API, the population of a small town in 1973), the higher your prior for fabrication should be.

Anything past the training cutoff, asked closed-book. Versions, releases, current events, prices, who runs what company today. If the model has no tool/search/retrieval available, the prompt still looks like a question it should be able to answer, and a confident guess is a common failure mode. (When the product does have search or RAG attached, this risk is partly absorbed there — see below.)

Citations and quotations. This is the canonical hallucination shape: a paper title that sounds right, plausible authors, a journal that exists, a year that fits, and the contents wrong. Variants include real authors with the wrong title, real titles with the wrong year, real papers cited for claims they don’t make. Direct quotes attributed to specific people are similar. The model has learned the shape of a citation extremely well; the shape doesn’t constrain the contents.
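
If you want to make that check concrete, here is a minimal sketch of looking a claimed citation up in Crossref, a public index of published works. It assumes the requests package; the endpoint and response fields are, to the best of my knowledge, the current public API, but treat this as an illustration of the check rather than a vetted client.

    import requests

    def crossref_lookup(claimed_title, rows=3):
        # Search Crossref for works matching the claimed title and return
        # (title, year, authors) tuples to compare against what the model said.
        # No match doesn't prove the paper is fake, but it's a strong cue to dig.
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": claimed_title, "rows": rows},
            timeout=10,
        )
        resp.raise_for_status()
        results = []
        for item in resp.json()["message"]["items"]:
            title = (item.get("title") or ["<no title>"])[0]
            year = item.get("issued", {}).get("date-parts", [[None]])[0][0]
            authors = [
                (a.get("given", "") + " " + a.get("family", "")).strip()
                for a in item.get("author", [])
            ]
            results.append((title, year, authors))
        return results

    # Compare the top hits against the citation the model produced.
    for title, year, authors in crossref_lookup("Attention Is All You Need"):
        print(year, "|", title, "|", ", ".join(authors[:3]))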

Code that calls things by name. Function names, library imports, command-line flags, API endpoints. The model knows what kind of name should appear in this position and produces something that fits the pattern. Whether that name actually exists in the library is a different question.
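
A cheap first pass, before you even run generated code, is to ask the library whether the names the model used actually resolve. A minimal sketch using only the standard library (the pandas example assumes pandas is installed and mirrors the invented read_excel_safe from earlier):

    import importlib

    def name_exists(module_name, attr_path):
        # Return True if attr_path (e.g. "DataFrame.to_csv") resolves on the
        # imported module, walking one attribute at a time.
        obj = importlib.import_module(module_name)
        for part in attr_path.split("."):
            if not hasattr(obj, part):
                return False
            obj = getattr(obj, part)
        return True

    print(name_exists("pandas", "read_excel"))       # True: the real function
    print(name_exists("pandas", "read_excel_safe"))  # False: the hallucinated one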

Numbers with too many digits. “The 2019 study showed a 37.4% improvement” should make you more suspicious than “the study showed roughly a third improvement.” Specificity is cheap to fabricate and feels authoritative.

Internal, organizational, or proprietary detail. Undocumented APIs, recent policy changes, who reports to whom at a specific company, internal tooling. The model has seen public traces of these and will pattern-match a plausible answer in the gaps.

Shapes the prose takes

A few tells. None of them is individually conclusive — true answers sometimes have these shapes too — but they’re what to read for.

Smooth confidence on a question that should be hard. If you know the question genuinely doesn’t have a clean answer, and the model gives you a clean answer with no hedging, look again. Real expertise on hard questions sounds messier than that.

“Famously” / “well-known” / “as everyone knows” doing load-bearing work. These phrases get attached to things that aren’t actually famous when the model has nothing more concrete to say. Fluency filler, not evidence.

Excessive structural neatness. Three reasons, three examples, three counterpoints, all the same length. Real arguments are usually lopsided. Suspiciously balanced lists often mean the model padded out a shape rather than reasoning.

A specific date with no surrounding context. “In 1987, Smith showed that…” — and Smith is never mentioned anywhere else, before or after. A real expert citing a real result usually has more they could say about it.

Doubling down with new specifics when challenged. If you push back on a claim and the model produces a more elaborate justification with new specific details that weren’t in the original answer, those new details are especially worth checking. Absent a tool call, the model isn’t going off and looking anything up — it’s generating more text that fits the request, and “request” now includes “defend the previous claim.”

Self-contradiction between sessions. Ask the same question twice in different chats. If the answers disagree on specifics, at minimum the model isn’t pulling from a stable source. One could be right and the other wrong; both could be partially confabulated. The disagreement itself is the signal — which one is true is a separate check.
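
If you want to automate that two-chats check, the sketch below asks the same question in two fresh conversations through the openai Python client and prints both answers side by side. The client calls reflect the package’s chat-completions interface as I understand it, the model name is a placeholder, and deciding whether the specifics actually disagree is still your job.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask_fresh(question, model="gpt-4o-mini"):
        # One-shot question in a brand-new conversation: no shared history,
        # so the second run can't just echo the first.
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        return resp.choices[0].message.content

    question = "List the authors of the 2014 GloVe paper and the venue where it appeared."
    first = ask_fresh(question)
    second = ask_fresh(question)

    # Disagreement on specifics (names, years, venues) is the signal;
    # agreement is weaker evidence, since both runs can confabulate the same way.
    print("--- run 1 ---\n" + first)
    print("--- run 2 ---\n" + second)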

What confidence does and doesn’t tell you

The model’s prose tone tracks the patterns in its training data, not its actual certainty. A confident-sounding answer means the prompt looked like one that had a confident-sounding answer in the training set; it doesn’t mean the model retrieved a fact and is reporting it. Whether internal model states encode something like a usable confidence signal is a separate question, and the research answer is “yes, partially, in the right format”: Kadavath et al. found large models reasonably well calibrated on multiple-choice and true/false questions, with self-evaluation working better few-shot than zero-shot and weakening on out-of-distribution tasks. Either way, the prose itself is not where that signal lives. (The longer treatment is in the hallucination post: the model has limited privileged access to its own uncertainty, and what little does surface, post-training can attenuate or distort.)

This is why “are you sure?” is a poor verification tool. Models trained on human preference data are prone to sycophancy: they’ll happily insist they’re sure about something fabricated, or apologize and reverse a correct answer, depending on how the question lands. You’re not reading their certainty; you’re prompting them for a continuation in the certainty-or-doubt register.

A working procedure

In practice, the checks that actually catch things look like:

  1. Identify the load-bearing claim. Which specific assertion would, if false, make the answer wrong? It’s usually one or two sentences, not the whole paragraph. If you can’t point to one — if the answer is all soft summary — that’s its own signal.
  2. Check it against a primary source. Search for the paper, run the code, read the docs, ask a person who’d know. The point isn’t to verify everything; it’s to verify the part that matters. This is also why RAG and tool use exist as product patterns — they replace pure recall with retrieval against a real source. Note that this is grounding, not automatic verification: retrieval can fail, return stale or irrelevant text, or be misread by the model. It moves the failure mode, doesn’t erase it.
  3. Treat dressed-up specifics as suspicious until checked. If a claim is decorated with a date, a name, and a percentage but you can’t easily check any of them, the right prior is “more likely fabricated than a similar claim phrased loosely.” The veneer of specificity is not evidence.
  4. Spend more effort in the weak zones. Long-tail, recent, niche, internal, citation-shaped, code-by-name. These are where the fabrication rate is highest, so this is where verification has the highest yield.
  5. For code, run it. The fastest hallucination detector for code is the interpreter. Function-doesn’t-exist errors and import errors are the model telling on itself (a minimal sketch follows after this list).
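
To make step 5 concrete: the check can be as blunt as loading the generated file and letting Python raise. A minimal sketch (the file name is a placeholder, and since importing executes top-level code, run this in a sandbox rather than on a machine you care about):

    import importlib.util
    import traceback

    def smoke_import(path):
        # Load a generated file and report whether it even imports cleanly.
        # ModuleNotFoundError and AttributeError at import time are the cheapest
        # hallucination detectors for code: the model named something that
        # doesn't exist.
        spec = importlib.util.spec_from_file_location("generated", path)
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
            return True
        except Exception:
            traceback.print_exc()
            return False

    # "generated_snippet.py" stands in for whatever the model wrote.
    smoke_import("generated_snippet.py")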

The honest limit

None of this gets you to certainty. You can read a hallucinated paragraph that has no tells, written about a topic squarely in the model’s weak zones, and take it at face value. The only fully reliable answer to “is this a hallucination?” is to verify against ground truth — which means the question of whether to trust an LLM is, in the end, the question of when verification is cheap enough to be worth doing. Where it isn’t, the right move is often not to ask the model in the first place.

Going deeper

Honest gap: I don’t have a clean way to give you a quantitative hit rate for any of the tells above. They’re shapes that I and others have seen recur in current models; how often each one fires on a real hallucination versus a true answer varies by model, by topic, and over time. Treat them as priors, not detectors.