Demystify AI

2026-05-03 · 9 min read

Why LLMs hallucinate — same mechanism as the looseness, different consequence

Hallucination comes from the same retrieval looseness behind useful LLM answers, with a different consequence.

demystify · primer · hallucination · reliability · mental-models

The Brief Made It To Filing

At a mid-size firm, the legal team almost lost a partner over an AI citation failure. An associate drafted a motion, used the in-house chatbot to “find supporting case law,” and got back three citations with case names, court, year, and a one-sentence summary of each. They looked right. They read right. They went into the brief.

Two of the three cases did not exist. The third existed but said the opposite of what its summary claimed. Opposing counsel’s paralegal caught it within twenty minutes.

Partners, associates, and the IT director fielding the angry phone call all asked the same question: how does a tool this good produce something this bad without flinching? No error message. No hedge. No uncertainty. Just three fluent, plausible citations: two invented outright, one real but misrepresented.

Citations are a shape the model knows cold

The earlier piece on LLMs as a loose database provides most of the picture. The model is not looking anything up. It generates each next word from learned patterns. Citations are a pattern. “Smith v. Jones, 412 F.3d 891 (9th Cir. 2005)” is a shape: plaintiff v. defendant, volume number, reporter, first page, then court and year in parentheses. The model has seen tens of thousands of these. It knows the shape cold.

So a case-law prompt triggers the usual model behavior: it produces text that fits the shape of an answer. Plausible plaintiff. Plausible volume number. Plausible court for the jurisdiction. Plausible year. Plausible one-line summary in the register of legal headnotes.

Your prompt:  "cases on negligent retention in California"
                          │
                          ▼
            ┌─────────────────────────────┐
            │  Pattern: legal citation    │
            │  Pattern: CA jurisdiction   │
            │  Pattern: negligence verb   │
            └─────────────┬───────────────┘
                          ▼
              "Garcia v. Pacific Logistics,
               203 Cal.App.4th 411 (2012)"
                          │
              shape: ✓   existence: ?

The shape is correct. Whether the case actually exists is a separate question the model never asked.
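The two questions can be separated in a few lines of code. Below is a minimal Python sketch, assuming a deliberately simplified regular expression for US case citations (real citation formats have many more variants) and a toy set named `KNOWN_CASES` standing in for a real case-law database; both names are hypothetical. The illustrative Garcia citation from the diagram passes the shape test and fails only the existence lookup.

```python
import re

# Hypothetical, simplified shape of a US case citation:
# "Party v. Party, <volume> <reporter> <first page> (<court> <year>)"
# The court is optional inside the parentheses, e.g. "(2012)".
CITATION_SHAPE = re.compile(
    r"^(?P<parties>[A-Z][\w.&' ]+ v\. [A-Z][\w.&' ]+), "
    r"(?P<volume>\d+) (?P<reporter>[A-Za-z0-9. ]+?) (?P<page>\d+) "
    r"\((?:(?P<court>[^)]*?) )?(?P<year>\d{4})\)$"
)

# Toy stand-in for a case-law database. The entry is the article's
# illustrative citation, included only so the lookup has something to find.
KNOWN_CASES = {
    "Smith v. Jones, 412 F.3d 891 (9th Cir. 2005)",
}


def check(citation: str) -> dict:
    """Answer shape and existence as two independent questions."""
    return {
        "shape_ok": CITATION_SHAPE.match(citation) is not None,
        "exists": citation in KNOWN_CASES,
    }


print(check("Smith v. Jones, 412 F.3d 891 (9th Cir. 2005)"))
# → {'shape_ok': True, 'exists': True}
print(check("Garcia v. Pacific Logistics, 203 Cal.App.4th 411 (2012)"))
# → {'shape_ok': True, 'exists': False}
```

A shape check like this is cheap and useful for catching garbled output, but it can never catch a fabrication, because fabrications are exactly the outputs with a correct shape.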

There is no fact-check step — only one mechanism

The refinement: the model has no separate “fact-check” step. One mechanism generates one token at a time, and the mechanism does not distinguish recalling from confabulating. From inside the generator, both are just “the next plausible token.”

When training data contained the real Smith v. Jones case thousands of times, the model’s pattern-pull leans hard toward the real volume number and year. When training support gets thin — niche jurisdiction, obscure topic, rarely cited material — the pattern-pull weakens, but the shape generator still runs. It fills in a volume number with the right shape. It fills in a plausible year. The output looks identical either way.
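A toy analogy makes this concrete. It is not how a transformer stores knowledge; it is a hypothetical slot-filler in which invented counts stand in for training support. The same function fills the year slot for a well-attested case and for one it has never seen, and the return value has the same type and shape either way.

```python
import random

# Invented counts standing in for training support: how often each year
# followed a given case name. Illustrative only.
YEAR_SUPPORT = {
    "Smith v. Jones": {"2005": 4800, "2004": 30, "2006": 20},  # well attested
    "Garcia v. Pacific Logistics": {},                         # never seen
}

# The generic shape "some plausible four-digit year" is always available.
SHAPE_PRIOR = {str(year): 1 for year in range(1990, 2021)}


def fill_year(case_name: str, rng: random.Random) -> str:
    """Fill the year slot. One code path serves recall and confabulation."""
    weights = dict(SHAPE_PRIOR)
    for year, count in YEAR_SUPPORT.get(case_name, {}).items():
        weights[year] = weights.get(year, 0) + count
    years = list(weights)
    return rng.choices(years, weights=[weights[y] for y in years])[0]


rng = random.Random(0)
for name in YEAR_SUPPORT:
    # Both calls return a four-digit string; nothing marks which one was a guess.
    print(name, fill_year(name, rng))
```

With strong support, the draw lands on 2005 almost every time. With none, it lands on any year in range, and the caller cannot tell the difference from the output alone.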

This is the part most people miss. Hallucination isn’t the model “making things up” as a separate behavior. It’s the model doing exactly what it always does, in a region of the space where the training data was thin.

Why the imperfection is the feature

The key point: the same mechanism makes the model useful and makes it hallucinate.

Looseness lets the model rephrase messy prompts, summarize new documents, and generalize from “how to write a Python decorator” to “how to write a TypeScript decorator.” A strict “only emit tokens verifiable against ground truth” guardrail would not create a more honest assistant. It would create a much worse one: no paraphrase, no generalization, no help with novel work.

Hallucination and helpfulness come out of the same pipe. Tuning a knob cannot keep one and delete the other. “Just make it stop hallucinating” is not a roadmap item; it is a category error.

No ‘abstain’ token in the vocabulary

One mechanical detail matters. During generation, each step picks the next token from a probability distribution over the vocabulary, either by sampling from it or by taking the top candidate. In a region with strong training support, the distribution forms a sharp peak: one or two tokens outrank the rest. In a region with weak training support, the distribution goes flat: many tokens look roughly equally likely, and the model picks one anyway because the vocabulary has no “abstain” token.
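A minimal Python sketch of the two regimes, assuming a hypothetical five-token vocabulary and made-up logits (raw scores before normalization). Entropy, measured in bits, is one standard way to quantify how flat a distribution is: near 0 for a sharp peak, near log2(5) ≈ 2.32 for five equally likely tokens. In both cases the sampling line returns a token, because there is nothing else it can return.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy in bits: ~0 for a sharp peak, log2(n) for uniform."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


VOCAB = ["2005", "2004", "2006", "2011", "2012"]  # hypothetical vocabulary
peaked = softmax([9.0, 2.0, 1.5, 0.5, 0.2])  # strong training support
flat = softmax([1.1, 1.0, 1.0, 0.9, 1.0])    # weak training support

rng = random.Random(42)
for label, probs in [("peaked", peaked), ("flat", flat)]:
    token = rng.choices(VOCAB, weights=probs)[0]  # no "abstain" option exists
    print(f"{label}: top p={max(probs):.2f}, "
          f"entropy={entropy_bits(probs):.2f} bits, emitted {token!r}")
```

The peaked case has entropy close to zero; the flat case sits near the 2.32-bit maximum. The emitted token looks equally well-formed in both.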

The model has no built-in step that reads a flat distribution as “low trust, hold back.” It just emits the token. Modern systems try to estimate this externally, through confidence scoring, retrieval-augmented generation, and tool use for grounding specific claims, but none of those features mean the model itself knows it is guessing. They are scaffolding around it.
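What that external scaffolding can look like, as a minimal sketch: some LLM APIs can return per-token log-probabilities alongside the generated text, and a wrapper can flag tokens whose probability falls below a threshold. The tokens, log-probabilities, and the 0.2 threshold below are all invented for illustration. In this made-up example the structural tokens come out confident while the specific identifiers do not, and even a confident token proves nothing about truth.

```python
import math

# Hypothetical (token, log-probability) pairs, shaped like what some LLM APIs
# can return with generated text. All values are made up.
generated = [
    ("Garcia", -0.9), (" v.", -0.01), (" Pacific", -2.3), (" Logistics", -1.8),
    (",", -0.02), (" 203", -2.9), (" Cal.App.4th", -0.4), (" 411", -3.1),
    (" (", -0.01), ("2012", -2.6), (")", -0.01),
]

FLAG_BELOW = 0.2  # hypothetical probability threshold; would need tuning


def flag_low_confidence(tokens, threshold=FLAG_BELOW):
    """Return tokens whose probability falls below the threshold.

    Runs outside the model, after generation. A low score hints at a
    flat-distribution region; a high score does NOT prove the claim is true.
    """
    return [tok.strip() for tok, logprob in tokens if math.exp(logprob) < threshold]


print(flag_low_confidence(generated))
# → ['Pacific', 'Logistics', '203', '411', '2012']
```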

This is why fluency is such a poor signal for accuracy. A confident, well-formed sentence costs the model the same as a hesitant one. There is no internal cringe.

How it goes wrong, and how to spot it

Two failure modes show up over and over.

The plausible-shape fabrication. Citations, API method names, RFC numbers, library functions, statistics, historical dates. Anything with a recognizable structure where the shape has strong training support but the specific instance may lack it. Spot it by treating every precise identifier — number, name, URL, citation — as a hypothesis until a system of record confirms it.
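Treating identifiers as hypotheses can start as a mechanical pass over the model’s answer. The sketch below assumes a few rough, hypothetical regular expressions (URLs, RFC numbers, version strings, years) and a hypothetical `Hypothesis` record whose `verified` flag stays false until someone checks it against a system of record; real extraction would need domain-specific patterns, including ones for names and citations.

```python
import re
from dataclasses import dataclass

# Rough, hypothetical patterns for precise identifiers. Real extraction needs
# domain-specific rules; these only illustrate the workflow.
PATTERNS = {
    "url": re.compile(r"https?://\S+"),
    "rfc": re.compile(r"\bRFC ?\d{3,5}\b"),
    "version": re.compile(r"\bv?\d+\.\d+(?:\.\d+)?\b"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
}


@dataclass
class Hypothesis:
    kind: str
    text: str
    verified: bool = False  # set to True only after a system of record confirms it


def extract_hypotheses(answer: str) -> list[Hypothesis]:
    """Turn every precise-looking identifier in an answer into an unverified claim."""
    found = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(answer):
            # Trim trailing punctuation that the URL pattern may swallow.
            found.append(Hypothesis(kind, match.group(0).rstrip(".,;:)")))
    return found


answer = "Per RFC 7231, pin the client to 2.31.0 (released 2023); notes at https://example.com/changelog"
for h in extract_hypotheses(answer):
    print(h)
```

Nothing in the list is treated as true by default; the design choice is that verification flips a flag, and release checks the flags.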

The confidently wrong synthesis. The model takes two real things and connects them in a plausible but false way. “Drug X conflicts with drug Y”: both drugs real, contraindication invented. Spot it by treating cross-fact joins as the weak point, not the endpoints.
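The join can be checked separately from its endpoints. A minimal sketch, assuming two hypothetical lookup tables, `KNOWN_DRUGS` and `KNOWN_INTERACTIONS`, filled with placeholder names rather than real drugs; in practice both would be queries against an authoritative reference.

```python
# Toy systems of record. All names are placeholders, not real drugs.
KNOWN_DRUGS = {"drug_x", "drug_y", "drug_z"}
KNOWN_INTERACTIONS = {frozenset({"drug_x", "drug_z"})}  # unordered pairs


def check_join(a: str, b: str) -> dict:
    """Verify both endpoints AND the claimed link, as separate checks."""
    return {
        "endpoints_exist": a in KNOWN_DRUGS and b in KNOWN_DRUGS,
        "link_exists": frozenset({a, b}) in KNOWN_INTERACTIONS,
    }


# The model claims "drug_x conflicts with drug_y": both ends check out, the join does not.
print(check_join("drug_x", "drug_y"))
# → {'endpoints_exist': True, 'link_exists': False}
```

Passing the endpoint check is exactly what makes this failure mode convincing, which is why the link needs its own lookup.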

What to do about it

  1. Treat any specific identifier as a hypothesis. Names, numbers, URLs, citations, version strings, function signatures — verify before release.
  2. Fluency is not a confidence signal. The model sounds equally sure when recalling and when confabulating. Read style as style only.
  3. Topic obscurity raises the hallucination rate. If the answer requires niche knowledge, assume thinner training support and verify harder.
  4. Cross-claim joins are the weakest point. When the model reasons across two facts, the connection carries more invention risk than the facts themselves.
  5. Do not ask the model for certainty. It will produce the shape of a confident answer, with no more grounding than the original. Verify externally, against a source of truth.

Worth reading next

  • Lin, Hilton, Evans, “TruthfulQA: Measuring How Models Mimic Human Falsehoods” (ACL 2022). The rigorous reference: a benchmark built around a closely related failure mode, fluent answers that repeat falsehoods common in human-written text, with data showing that the largest models tested were generally the least truthful, so scaling alone did not fix it.
  • Simon Willison, “Hallucinations in code are the least dangerous form of LLM mistakes” (blog, March 2025). The accessible explainer: a working developer’s framing of why some hallucination domains have natural verifiers and others don’t, with practical implications for where to deploy LLMs.

Next in the Demystify AI series: temperature, sampling, and why the same prompt gives different answers — pulling apart the dial almost nobody understands.

Axioms touched

Lighter touch than the Learn series — primer pieces don't usually lean heavily on the axiom catalog, but where they do it's noted.