Demystify AI

2026-05-02 · 8 min read

LLMs work like word-query databases, but looser

A practical mental model for LLMs: word-based queries over learned patterns, plus one refinement, looseness, which explains iteration, useful surprises, and confident wrongness.

demystify · llm · primer · mental-models

LLMs make more sense with one almost-correct model: a database for word queries. Small refinement: the query returns approximate generated text, not an exact stored row.

Technical generalists already use AI tools for ticket triage, drafts, code, analysis, and research. Many still lack a working model for the mechanism. This model gives enough structure for better prompts, better review, and better expectations.

The Metaphor

An LLM behaves a lot like a database for word queries.

A question goes in; an answer comes back. A coding problem goes in; a code snippet comes back. A draft email goes in with a request for a more polite tone; a more polite version comes back. The interaction feels like search or SQL, but in plain English instead of SELECT * FROM. Words go in. Words come out. A huge learned store of written patterns sits behind the interaction, and input pulls a relevant continuation forward.

This mental model covers most day-to-day use. Hold the shape, then add one refinement.

The Small Refinement

The answer runs looser than exact lookup.

The “database” metaphor does not mean stored-row lookup in a table. The model generates a set of words close to, and probably responsive to, the prompt. Approximate match replaces exact match. The looseness does the useful work.

For the prompt “what’s the capital of France,” the model does not open a cities table and read back Paris. It generates the words most likely to come next. Those words happen to form “The capital of France is Paris.” The output can come out correct because learned patterns strongly point in the same direction, but the mechanism uses generation, not lookup.

Same shape. Different mechanism. The refinement ends there.

Looseness Creates Value

Most explainers miss the useful part: looseness creates the value.

A real database demands exact input. Wrong column name, zero rows. Misspelled value, zero rows. Real databases stay exact and unforgiving.

Human work often starts rough. Vague goal, fuzzy terms, unclear answer shape. Fishing, not filing.

An LLM casts a wide net, so an answer can surface even when phrasing lacks precision. A prompt like “the thing where DNS needs refresh after changing a record” can map to TTL expiration, DNS cache flushes, or a dscacheutil-style local cache invalidation. A prompt like “framework for scheduling agentic tasks in Python, starts with L maybe” can surface LangGraph, LangChain, or Langroid for comparison. Loose query, useful answer.
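The contrast between exact and loose matching can be sketched in a few lines. This is only an analogy: it uses string similarity from Python's standard `difflib` module as a stand-in for looseness, which is not how an LLM works internally. The `NOTES` table, its contents, and the function names are hypothetical, made up for illustration.

```python
import difflib

# Hypothetical mini knowledge base: exact keys mapped to short notes.
NOTES = {
    "dns ttl expiration": "Cached records stay in use until their TTL runs out.",
    "dns cache flush": "Clearing a local resolver cache forces a fresh lookup.",
    "dns record update": "Changing a record at the authoritative server.",
}

def exact_lookup(query):
    """Database-style lookup: the key must match exactly, or nothing comes back."""
    return NOTES.get(query)

def loose_lookup(query, cutoff=0.4):
    """Loose lookup: return notes whose keys merely resemble the query.

    String similarity is an illustrative stand-in for looseness,
    not a description of what an LLM does.
    """
    close = difflib.get_close_matches(query.lower(), list(NOTES), n=3, cutoff=cutoff)
    return [NOTES[key] for key in close]

rough_query = "dns cache flushing"
print(exact_lookup(rough_query))  # the rough phrasing is not an exact key
print(loose_lookup(rough_query))  # best-matching notes first
```

The exact lookup returns nothing for the rough phrasing, while the loose lookup still surfaces the cache-flush note first, along with any other entries that clear the similarity cutoff.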

The corollary: more specific questions usually produce more specific answers. Wider net, looser catch.

The workflow follows from this:

  • Ask an imperfect question.
  • Read the answer. It may fit, or it may surprise in a useful way.
  • If the answer surprises, use the surprise as new search space. Ask again with sharper terms.
  • Repeat until the answer fits the patterns under investigation.

Iterative fishing is the workflow. A rough first prompt does not mean failure. The design expects refinement.

How Looseness Works

The previous section provides a working mental model. The next layer adds mechanism.

The model does not store answers. It acts as a giant function. A sequence of tokens goes in (tokens are chunks of text, about three-quarters of an English word on average), and the function returns a probability distribution over possible next tokens. A sampler picks one token from the distribution. The loop repeats once per generated token, often a few hundred times, until a full answer emerges.
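The loop described above can be sketched as a toy. Everything here is an assumption for illustration: the hand-written `NEXT_TOKEN_PROBS` table stands in for the neural network, whole words stand in for tokens, and the probabilities are invented. A real model computes the distribution from the entire context rather than from the last token alone.

```python
import random

# Hypothetical stand-in for the model: maps the last token to a probability
# distribution over next tokens. The numbers are invented for illustration.
NEXT_TOKEN_PROBS = {
    "France": {"is": 1.0},
    "is": {"Paris": 0.9, "Lyon": 0.1},
    "Paris": {"<end>": 1.0},
    "Lyon": {"<end>": 1.0},
}

def next_token_distribution(tokens):
    """The 'giant function': tokens in, distribution over next tokens out."""
    return NEXT_TOKEN_PROBS[tokens[-1]]

def sample(distribution, rng):
    """The sampler: draw one token according to its probability."""
    choices = list(distribution)
    weights = list(distribution.values())
    return rng.choices(choices, weights=weights, k=1)[0]

def generate(prompt_tokens, rng, max_tokens=20):
    """Repeat predict-then-sample until an end marker or the token limit."""
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        token = sample(next_token_distribution(tokens), rng)
        if token == "<end>":
            break
        tokens.append(token)
    return tokens

rng = random.Random(0)
print(" ".join(generate(["The", "capital", "of", "France"], rng)))
```

Nothing in the loop looks up a stored answer. "Paris" usually comes out because it carries most of the probability, and "Lyon" occasionally comes out for the same reason a real model is occasionally wrong.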

A few specifics matter:

  • Tokens, not words. The model operates on pieces a bit smaller than words. “Architecture” might use one token; “underwhelmingly” might use three. Rare words and unusual capitalization can create odd token sequences, so the model can fumble them.
  • The context window. The function sees only a finite span back. Older models handled a few thousand tokens; newer ones handle hundreds of thousands, and a few reach a million or more. Once content falls outside the window, the model no longer sees it.
  • Attention. When predicting the next token, the model weighs earlier tokens by relevance. Not all prompt words count equally. Specific anchor terms, such as a function name, product name, or year, can strongly shape the response.
  • Temperature. A knob from 0 upward controls how much randomness enters each next-token choice. At 0, the model picks the highest-probability token every time and becomes more predictable but more boring. Higher temperatures sample more freely, with more creativity and more inconsistency.
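Two of the knobs above, temperature and the context window, are simple enough to sketch. The scores and the helper names `apply_temperature` and `visible_context` are hypothetical. The temperature function follows the common softmax-with-temperature formulation, and dropping the oldest tokens is one simplified way to model a window, not a claim about how any particular product handles overflow.

```python
import math

def apply_temperature(scores, temperature):
    """Turn raw next-token scores into probabilities.

    Temperature 0 is treated as greedy: all probability goes to the top token.
    """
    if temperature == 0:
        best = max(scores, key=scores.get)
        return {token: (1.0 if token == best else 0.0) for token in scores}
    scaled = {token: score / temperature for token, score in scores.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {token: math.exp(value - top) for token, value in scaled.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}

def visible_context(tokens, window):
    """Keep only the most recent `window` tokens; older ones are not seen."""
    return tokens[max(0, len(tokens) - window):]

# Hypothetical raw scores for the token after "The capital of France is".
scores = {"Paris": 4.0, "Lyon": 2.0, "Berlin": 1.0}
for temperature in (0, 0.5, 1.0, 2.0):
    probs = apply_temperature(scores, temperature)
    print(temperature, {token: round(p, 3) for token, p in probs.items()})

print(visible_context(["a", "b", "c", "d", "e"], window=3))
```

As the temperature rises, probability shifts away from the top token toward the alternatives, which is where both the extra variety and the extra inconsistency come from.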

Conceptually, the machine takes tokens, returns probabilities over next tokens, and repeats the loop until it produces an answer. The “database” feel exists because the learned patterns came from text on the scale of trillions of tokens. The system is a generator wearing a database costume, not the other way around.

The Downside Of Looseness

The same mechanism enabling rough-query discovery also produces confident wrongness.

Prompt: “what’s the capital of Atlantis?” Atlantis is a mythical island. It has no capital and never had one. A real database would return zero rows. An LLM has no zero-row mode. It generates plausible-shaped text. So an answer can look like “The capital of Atlantis was Poseidon’s seat of governance, located in the central districts of the island.” The answer carries the same confident tone as the one about Paris.

This is hallucination. Newer versions reduce many cases, but the core generator still lacks a native concept of “this question has no valid answer.” It only estimates “given these tokens, which tokens likely come next?” Plausible-shaped text always exists; the model can produce some.
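The missing zero-row mode can be illustrated with a minimal sketch. The `CAPITALS` table, the candidate names, and their scores are all invented for illustration. The only point is structural: an exact lookup can return nothing, while a softmax over candidate tokens always produces a winner, even when no candidate stands out.

```python
import math

# A tiny exact-lookup table standing in for a real database.
CAPITALS = {"France": "Paris", "Japan": "Tokyo"}

def database_answer(place):
    """Exact lookup: a missing key yields None, the 'zero rows' result."""
    return CAPITALS.get(place)

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores.values())
    exps = {token: math.exp(score - top) for token, score in scores.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}

def generator_answer(scores):
    """Generator-style answer: return the most probable token and its probability."""
    probs = softmax(scores)
    best = max(probs, key=probs.get)
    return best, probs[best]

# Invented candidates and scores for the Atlantis prompt: nothing stands out,
# yet the distribution still sums to 1 and still has a top entry.
atlantis_scores = {"Poseidonia": 0.3, "Atlantica": 0.2, "Meropis": 0.1}

print(database_answer("Atlantis"))        # None
print(generator_answer(atlantis_scores))  # a winner, with low probability
```

The generator's top choice wins with only a little over a third of the probability, but nothing in the output text signals that. The fluent sentence built around it reads the same as one built on a near-certain token.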

Practical implication: evaluate every model answer like input from a smart colleague who does not always know the limits of personal knowledge. Sometimes correct. Sometimes confidently wrong. Always fluent. Treat it accordingly.

Four Practical Takeaways

Key takeaways:

  1. Iteration is the workflow. The first answer rarely serves as the final answer. Ask again, sharper, using newly surfaced terms.
  2. Specificity in the question correlates with specificity in the answer. “Explain Python” gets a vague paragraph. “Compare dict.get('key') and dict['key'] access in Python with one example each, and explain when each form fits” gets a much tighter answer.
  3. The same question can produce different answers. Sampling causes this: same input, different draws from the probability distribution. Annoying for consistency, but not necessarily wrong.
  4. Watch for confident wrongness. When a wrong answer would cost something, verify the code snippet, fact, date, or API signature. Fluency is not evidence of correctness.
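Takeaway 4 is often cheap to act on. As a small sketch, suppose a model answered the `dict.get` prompt from takeaway 2 with three claims. Each claim can be checked directly instead of trusted; the `settings` dictionary and the wording of the claims are hypothetical examples.

```python
settings = {"theme": "dark"}

# Claim 1: dict.get returns None, or a supplied default, for a missing key.
assert settings.get("font") is None
assert settings.get("font", "monospace") == "monospace"

# Claim 2: square-bracket access raises KeyError for a missing key.
try:
    settings["font"]
except KeyError:
    raised_key_error = True
else:
    raised_key_error = False
assert raised_key_error

# Claim 3: both forms agree when the key is present.
assert settings.get("theme") == settings["theme"] == "dark"

print("all claims verified")
```

A failing assertion here would flag the confident-but-wrong case before it reached real code.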

Where To Read More

One short next-level read: Stephen Wolfram, “What Is ChatGPT Doing… and Why Does It Work?”, explains the mechanism in more depth without going full math.

Visual explanation: Jay Alammar, “The Illustrated Transformer”, draws the architecture so attention becomes visible.

The metaphor, looseness refinement, and two links provide a grounded LLM model for most daily work with these tools. Apply it during iteration and verification.


Next in the Demystify AI series: AI vs ML vs LLM vs agents — what each word actually means, in the order technical readers care about them.

Axioms touched

Lighter touch than the Learn series — primer pieces don't usually lean heavily on the axiom catalog, but where they do it's noted.

  • #11 Cite or be silent (held)

    Cite or be silent — Wolfram + Alammar are the two grounding citations; no claim goes beyond the mechanism.