How can I tell when an LLM is making the answer up?
True answers and fabricated ones come out of the same pipe, in the same tone. There's no red light. But there are seams — places hallucinations cluster, shapes they tend to take, tells you can learn to read.
Why it exists
You’re writing an essay. You ask ChatGPT for a citation, and it gives you a paper title, two authors, a journal, a year. You paste it into your draft. Hours later, you go to actually open the paper, and it doesn’t exist. The authors are real, the journal is real, the year is plausible. The paper is not. Or you’re vibe-coding and the model writes import pandas as pd; df = pd.read_excel_safe(path), and your interpreter tells you read_excel_safe is not a thing. It looks exactly like a real pandas method. It isn’t one.
That experience — the answer that looked right, sounded right, and turned out to be confidently invented — is the thing this post is about. The hard part is that every answer the model gives looks like an answer. True ones and fabricated ones come out of the same pipe, in the same tone, with the same conviction. There is no little red light that flips on when the model is making it up.
That’s the whole problem. The system that wrote the hallucination — fluently emitting a plausible-looking continuation when it didn’t actually know — is the same system that wrote the answer that turned out to be right. From the surface, they’re indistinguishable.
So the practical question, the one a curious user actually has, isn’t “why do hallucinations exist?” — it’s “given that they do, how do I tell, in the moment, whether the thing in front of me is one?”
The honest answer is: you often can’t, not from the text alone. But there are seams. There are places hallucinations cluster, shapes they tend to take, and tells that come from how the model produces them. None of these are decisive. Together, they’re enough to know when to stop trusting the prose and go check.
Why it matters now
LLMs are no longer used only as toy chatbots — they write code, draft contracts, summarize medical records, propose investment theses. The cost of acting on a confident-but-wrong answer scales with what you’re doing. A fabricated citation in a school essay is embarrassing. A fabricated function name in a deployed agent is a bug. A fabricated drug interaction in a clinical decision support tool is harm.
This is also why “trust but verify” is too generous a frame. The model is not actually trying to deceive you, but the output is fluent enough that trust is the default state without any conscious decision. The job isn’t “decide whether to trust this.” The job is “notice the specific shapes that should make you stop trusting it, and check the ones that matter.”
The short answer
spotting a hallucination ≈ find the checkable claim + check it + read the rest for tells
The checkable claims — names, numbers, citations, code identifiers, dates, quotes — are where fabrication has to land somewhere falsifiable, so they’re where you catch it. The rest of the prose isn’t necessarily true or false in a checkable way; for that, you read for tells.
How it works
Hallucinations cluster in predictable places, take predictable shapes, and ride on a predictable kind of fluency. Here’s where to look.
Where they cluster
Long-tail specifics. The model has seen a lot of text on the most famous topics and very little on the next million. Common, well-documented things (how TCP works, what git rebase does, how photosynthesis works) are lower-risk, though not safe. The further down the long tail you go (the third author on a 2014 paper, an obscure library’s API, the population of a small town in 1973), the higher your prior for fabrication should be.
Anything past the training cutoff, asked closed-book. Versions, releases, current events, prices, who runs what company today. If the model has no tool/search/retrieval available, the prompt still looks like a question it should be able to answer, and a confident guess is a common failure mode. (When the product does have search or RAG attached, this risk is partly absorbed there — see below.)
Citations and quotations. This is the canonical hallucination shape: a paper title that sounds right, plausible authors, a journal that exists, a year that fits, and the contents wrong. Variants include real authors with the wrong title, real titles with the wrong year, real papers cited for claims they don’t make. Direct quotes attributed to specific people are similar. The model has learned the shape of a citation extremely well; the shape doesn’t constrain the contents.
Code that calls things by name. Function names, library imports, command-line flags, API endpoints. The model knows what kind of name should appear in this position and produces something that fits the pattern. Whether that name actually exists in the library is a different question.
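One cheap, mechanical check for this cluster: before trusting a model-suggested name, ask the library itself whether the name exists. A minimal Python sketch; `identifier_exists` is my name for it, not a standard API, and it only proves the name exists, not that it behaves the way the model claimed:

```python
import importlib

_MISSING = object()  # sentinel, so an attribute legitimately set to None isn't misread as absent

def identifier_exists(module_name: str, attr_path: str) -> bool:
    """Return True if a dotted attribute path really exists on a module.

    Existence is the cheap half of the check; whether it does what the
    model said still needs the docs or a test.
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    for part in attr_path.split("."):
        obj = getattr(obj, part, _MISSING)
        if obj is _MISSING:
            return False
    return True

# A real stdlib function passes; an invented sibling does not.
print(identifier_exists("json", "loads"))       # True
print(identifier_exists("json", "loads_safe"))  # False
```

The same two-line check works for the intro’s pandas example: read_excel exists, read_excel_safe doesn’t.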
Numbers with too many digits. “The 2019 study showed a 37.4% improvement” should make you more suspicious than “the study showed roughly a third improvement.” Specificity is cheap to fabricate and feels authoritative.
Internal, organizational, or proprietary detail. Undocumented APIs, recent policy changes, who reports to whom at a specific company, internal tooling. The model has seen public traces of these and will pattern-match a plausible answer in the gaps.
Shapes the prose takes
A few tells. None of them is individually conclusive — true answers sometimes have these shapes too — but they’re what to read for.
Smooth confidence on a question that should be hard. If you know the question genuinely doesn’t have a clean answer, and the model gives you a clean answer with no hedging, look again. Real expertise on hard questions sounds messier than that.
“Famously” / “well-known” / “as everyone knows” doing load-bearing work. These phrases get attached to things that aren’t actually famous when the model has nothing more concrete to say. Fluency filler, not evidence.
Excessive structural neatness. Three reasons, three examples, three counterpoints, all the same length. Real arguments are usually lopsided. Suspiciously balanced lists often mean the model padded out a shape rather than reasoning.
A specific date with no surrounding context. “In 1987, Smith showed that…” — and Smith is never mentioned anywhere else, before or after. A real expert citing a real result usually has more they could say about it.
Doubling down with new specifics when challenged. If you push back on a claim and the model produces a more elaborate justification with new specific details that weren’t in the original answer, those new details are especially worth checking. Absent a tool call, the model isn’t going off and looking anything up — it’s generating more text that fits the request, and “request” now includes “defend the previous claim.”
Self-contradiction between sessions. Ask the same question twice in different chats. If the answers disagree on specifics, at minimum the model isn’t pulling from a stable source. One could be right and the other wrong; both could be partially confabulated. The disagreement itself is the signal — which one is true is a separate check.
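The two-sessions check can even be mechanized crudely: extract the checkable specifics from each answer and diff them. A sketch using only the standard library; both function names are mine, and the capitalized-token heuristic for names is deliberately rough:

```python
import re

def extract_specifics(text: str) -> set[str]:
    """Pull out the checkable specifics: four-digit years, percentages,
    and capitalized tokens (a crude proxy for proper names)."""
    years = re.findall(r"\b(?:19|20)\d{2}\b", text)
    percents = re.findall(r"\b\d+(?:\.\d+)?%", text)
    names = re.findall(r"\b[A-Z][a-z]+\b", text)
    return set(years) | set(percents) | set(names)

def disagreement(answer_a: str, answer_b: str) -> set[str]:
    """Specifics present in one answer but not the other: the signal
    that the model isn't pulling from a stable source."""
    return extract_specifics(answer_a) ^ extract_specifics(answer_b)

a = "In 1987, Smith showed a 37.4% improvement."
b = "In 1989, Smith showed a 37.4% improvement."
print(disagreement(a, b))  # the unstable specifics: the two years
```

An empty set doesn’t mean the answer is true; as above, the disagreement is the signal, and which version is right is a separate check.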
What confidence does and doesn’t tell you
The model’s prose tone tracks the patterns in its training data, not its actual certainty. A confident-sounding answer means the prompt looked like one with a confident-sounding answer in the training set. It doesn’t mean the model retrieved a fact and is reporting it. There’s a separate question — whether internal model states encode something like a usable confidence signal — and the research answer is “yes, partially, in the right format” (Kadavath et al. found large models reasonably calibrated on multiple-choice and true/false, with self-evaluation working better few-shot than zero-shot, and weakening on out-of-distribution tasks). Either way, the prose itself is not where that signal lives. (The longer treatment is in hallucination: the model has limited privileged access to its own uncertainty, and what little surfaces, post-training can attenuate or distort.)
This is why “are you sure?” is a poor verification tool. Models trained on human preference data are prone to sycophancy: they’ll happily insist they’re sure about something fabricated, or apologize and reverse a correct answer, depending on how the question lands. You’re not reading their certainty; you’re prompting them for a continuation in the certainty-or-doubt register.
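What “calibrated” means here can be made concrete. Given (stated confidence, was-it-correct) pairs, one standard summary is expected calibration error; a toy sketch, not the exact metric any particular paper uses:

```python
def expected_calibration_error(preds, n_bins=10):
    """preds: iterable of (confidence in [0, 1], correct: bool) pairs.

    Bucket answers by stated confidence, then compare each bucket's
    average confidence to its actual accuracy. A calibrated model
    scores near zero; an overconfident one scores high.
    """
    preds = list(preds)
    bins = [[] for _ in range(n_bins)]
    for conf, correct in preds:
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, correct))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        ece += (len(bucket) / len(preds)) * abs(avg_conf - accuracy)
    return ece

# Stated 90% and right 9 times in 10: calibrated, ECE near 0.
calibrated = [(0.9, True)] * 9 + [(0.9, False)]
# Stated 90% but right half the time: overconfident, ECE near 0.4.
overconfident = [(0.9, True)] * 5 + [(0.9, False)] * 5
```

The catch for chat users is the input: free-form prose doesn’t come with a confidence number at all, which is why this kind of measurement lives in constrained formats like multiple-choice.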
A working procedure
In practice, the checks that actually catch things look like:
- Identify the load-bearing claim. Which specific assertion would, if false, make the answer wrong? It’s usually one or two sentences, not the whole paragraph. If you can’t point to one — if the answer is all soft summary — that’s its own signal.
- Check it against a primary source. Search for the paper, run the code, read the docs, ask a person who’d know. The point isn’t to verify everything; it’s to verify the part that matters. This is also why RAG and tool use exist as product patterns — they replace pure recall with retrieval against a real source. Note that this is grounding, not automatic verification: retrieval can fail, return stale or irrelevant text, or be misread by the model. It moves the failure mode, doesn’t erase it.
- Treat dressed-up specifics as suspicious until checked. If a claim is decorated with a date, a name, and a percentage but you can’t easily check any of them, the right prior is “more likely fabricated than a similar claim phrased loosely.” The veneer of specificity is not evidence.
- Spend more effort in the weak zones. Long-tail, recent, niche, internal, citation-shaped, code-by-name. These are where the fabrication rate is highest, so this is where verification has the highest yield.
- For code, run it. The fastest hallucination detector for code is the interpreter. Function-doesn’t-exist errors and import errors are the model telling on itself.
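The run-it step can be a thin wrapper around a fresh interpreter. A sketch (`run_snippet` is my name for it; it assumes the snippet is one you’re willing to execute, so sandbox anything untrusted):

```python
import subprocess
import sys

def run_snippet(code: str, timeout: float = 30.0) -> tuple[bool, str]:
    """Run model-generated Python in a fresh interpreter process.

    Returns (survived, stderr). Invented names fail loudly:
    AttributeError, ImportError, and NameError are the model
    telling on itself.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode == 0, proc.stderr.strip()

ok, err = run_snippet("import json; json.loads_safe('{}')")
# ok is False, and err is an AttributeError naming the invented method.
```

Note what this does and doesn’t catch: nonexistent names, yes; code that runs but computes the wrong thing, no. That still needs a test or a close read.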
The honest limit
None of this gets you to certain. You can read a hallucinated paragraph that has no tells, written about a topic the model is in its weak zone on, and take it at face value. The only fully reliable answer to “is this a hallucination?” is to verify against ground truth — which means the question of whether to trust an LLM is, in the end, the question of when verification is cheap enough to be worth doing. Where it isn’t, the right move is often not to ask the model in the first place.
Related terms
- Hallucination — hallucination = next-token model + no built-in "I don't know" + a prompt the model can't answer. The mechanism this post is the user-facing dual of.
- Calibration — calibration ≈ stated confidence matches actual accuracy. In free-form chat prose, current LLMs aren’t reliably calibrated; they can be closer to calibrated in constrained formats like multiple-choice. Either way, the prose tone isn’t where you read their certainty.
- Sycophancy — sycophancy = model preferring the user-pleasing answer over the correct one. A measured failure mode of preference-trained models; one reason asking “are you sure?” is unreliable as a check (the others being poor free-form calibration and prompt sensitivity).
- RAG — RAG = retrieve relevant text + prompt the model with it. The product-level answer to “verify before generating.”
- Tool use / function calling — letting the model look things up instead of recalling them. Replaces a hallucination opportunity with a search query.
- Confabulation — borrowed from neurology; some researchers prefer it because it captures the “fluent fabrication with no awareness of fabricating” flavor more accurately than “hallucination.”
- Grounding — umbrella term for tying an answer to a real source. The structural fix; the tells in this post are the survival skill until grounding is in place.
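To make the RAG and grounding entries concrete, here is the retrieve-then-prompt shape, sketched with toy keyword-overlap retrieval. Real systems use embedding search, and both function names here are mine:

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retrieval: score each passage by word overlap with the query
    and return the top k. Stands in for embedding similarity search."""
    q_words = set(query.lower().split())
    return sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )[:k]

def grounded_prompt(query: str, corpus: list[str]) -> str:
    """Assemble the RAG prompt: retrieved passages first, then the question,
    so the model can quote a source instead of recalling from weights."""
    passages = "\n".join(f"- {p}" for p in retrieve(query, corpus))
    return f"Answer using only these passages:\n{passages}\n\nQuestion: {query}"

corpus = [
    "git rebase rewrites commit history onto a new base.",
    "TCP is a connection-oriented transport protocol.",
    "Photosynthesis converts light into chemical energy.",
]
print(grounded_prompt("what does git rebase do", corpus))
```

As the working procedure above notes, this is grounding, not automatic verification: if retrieval returns a stale or irrelevant passage, the model is now fluently grounded in the wrong thing.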
Going deeper
- The “limitations” sections of major model cards (OpenAI, Anthropic, Google). Vendors document the failure modes in more honest detail than the marketing.
- Kadavath et al., Language Models (Mostly) Know What They Know — an empirical look at whether models can predict their own correctness. The qualified “yes” is interesting; so are the failure cases.
- Try this yourself: pick a topic you genuinely know well, ask a frontier model a hard question in that topic, and read the answer with an expert’s eye. The first time you watch a smooth, confident paragraph fall apart on close inspection, the rest of this post stops being abstract.
Honest gap: I don’t have a clean way to give you a quantitative hit-rate for any of the tells above. They’re shapes that I and others have seen recur in current models; how often each one fires on a real hallucination versus a true answer varies by model, by topic, and over time. Treat them as priors, not detectors.