2026-05-09 · 8 min read

AI vs ML vs LLM vs agents — sorting out the words people keep mixing up

Four different words often collapse into one marketing pitch. A nested mental model makes the buying, building, and risk questions sharper.

demystify, ai, ml, llm, agents, primer, vocabulary


In a vendor pitch for an “AI-powered” workflow tool, three people in the room carried three different mental models of the product. The CTO thought it meant a chatbot. The security lead thought it meant rule-based automation with pattern matching. The PM thought it meant a fully autonomous agent making decisions on its own. The vendor let every interpretation stand because the demo could plausibly support all three.

The problem starts there. AI does too much work as a word. So does ML. So does LLM. So does agent. Marketing materials and hallway conversations use them interchangeably, but the words mean different things, and the differences matter during build, buy, and risk decisions.

The working mental model starts here.

The nested mental model

Three of the four words nest inside each other, like Russian dolls. The fourth has a different shape: a system pattern built around the smallest doll.

┌─────────────────────────────────────────────────┐
│  AI — the big category. Anything that looks     │
│       like "intelligence." Includes chess       │
│       engines and rule-based expert systems     │
│       that don't "learn" at all.                │
│                                                 │
│       ┌─────────────────────────────────────┐   │
│       │  ML — a method within AI. The       │   │
│       │       system learns from data       │   │
│       │       instead of being explicitly   │   │
│       │       programmed.                   │   │
│       │                                     │   │
│       │       ┌─────────────────────────┐   │   │
│       │       │  LLM — one kind of ML   │   │   │
│       │       │        model. Trained   │   │   │
│       │       │        on text. Predicts│   │   │
│       │       │        the next token.  │   │   │
│       │       └─────────────────────────┘   │   │
│       └─────────────────────────────────────┘   │
└─────────────────────────────────────────────────┘
                        │
                        │  uses
                        ▼
              ┌──────────────────────┐
              │  AGENT — a system    │
              │  built AROUND an LLM │
              │  with tools + a      │
              │  control loop.       │
              └──────────────────────┘

Start with the picture, then name the words.

AI — the umbrella term

The broadest category. Artificial intelligence covers systems mimicking what people call “thinking” — and historically, the category included plenty of software with no learning at all.

A chess engine using minimax search is AI. A medical-diagnosis expert system from 1985 with 4,000 hand-written rules is AI. GPS path-finding code is AI when it runs A* search over a graph. None of those systems “learn” from data — they execute human-written programs very well.
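To make "AI without learning" concrete, here is a minimal sketch of minimax over a tiny hand-built game tree. The tree and its scores are made up for illustration; a real chess engine adds move generation, an evaluation function, and pruning. Nothing here is trained on data: the behavior comes entirely from the search procedure a human wrote.

```python
def minimax(node, maximizing=True):
    """Return the best achievable score from this position.

    A node is either a number (a scored end position) or a list of
    child nodes (the positions reachable in one move).
    """
    if isinstance(node, (int, float)):   # leaf: nothing left to search
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# Hypothetical tree: three moves for us, each with two opponent replies.
GAME_TREE = [[3, 5], [2, 9], [0, 7]]

# The opponent picks the minimum in each branch (3, 2, 0),
# and we pick the best of those.
print(minimax(GAME_TREE))  # prints 3
```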

If a vendor says “AI-powered,” the word alone says almost nothing. It could mean any of the above. The failure mode of conflating “AI” with “modern AI” is paying a premium for if/else rules with nicer chrome.

ML — the method that learns from data

Machine learning is the subset of AI where examples train the system instead of explicit task programming. The canonical version: show a model 10,000 photos labeled “cat” or “not cat,” and it learns to predict labels for new photos.
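A rough sketch of "examples train the system" is a nearest-centroid classifier in plain Python. The two-number features standing in for a photo are invented for illustration, and the function names train and predict are hypothetical; production image models learn from pixels with far larger models. The point is only that no rule for "cat" is written anywhere: the decision comes from averaging the labeled examples.

```python
from statistics import mean

def train(examples):
    """Learn one centroid (average feature vector) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(model, features):
    """Return the label whose centroid is closest to the input."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))

# Made-up features per photo: (ear pointiness, whisker density).
TRAINING = [
    ((0.9, 0.8), "cat"),
    ((0.8, 0.9), "cat"),
    ((0.2, 0.1), "not cat"),
    ((0.1, 0.3), "not cat"),
]

model = train(TRAINING)
print(predict(model, (0.85, 0.7)))   # prints cat
print(predict(model, (0.15, 0.2)))   # prints not cat
```

Changing the training examples changes the predictions without touching the code, which is the practical difference from a rule engine.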

ML predates LLMs by decades. Spam filters are ML. Credit-card fraud detection is ML. Netflix’s recommendation engine is ML. Phone face unlock is ML. None of these are LLMs and none generate text.

The failure mode of conflating “ML” with “LLM” is assuming any ML model can answer questions in English. Most cannot; they classify, predict numbers, cluster, or recommend. Asking a fraud-detection model for a sentence-level rationale asks it to perform a job outside its design.

LLM — one kind of ML model

A large language model is one specific kind of ML model. It is trained on text at the scale of trillions of tokens, typically with a transformer architecture. The job stays narrow: given some tokens, predict the next token. Run the loop a few dozen times and a sentence appears. Run it longer and an essay appears.
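The predict-and-append loop can be sketched with a toy bigram model, which is a deliberately crude stand-in: it counts which word followed which in a tiny made-up corpus and always picks the most frequent successor. A real LLM uses a transformer over subword tokens and usually samples rather than taking the top choice, so treat this only as an illustration of the loop's shape. The names train_bigram and generate are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which token follows which in the training text."""
    tokens = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, prompt, max_new_tokens):
    """Repeatedly predict the next token and append it (greedy choice)."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = follows.get(tokens[-1])
        if not candidates:            # nothing ever followed this token
            break
        next_token = candidates.most_common(1)[0][0]
        tokens.append(next_token)
    return " ".join(tokens)

# Tiny made-up corpus; whitespace-separated words act as tokens.
CORPUS = "the cat sat on the mat . the cat sat on the rug ."

bigram_model = train_bigram(CORPUS)
print(generate(bigram_model, "the", 5))  # prints the cat sat on the cat
```

The output is fluent-looking but not grounded in anything, which is the same property, at toy scale, that makes LLM output plausible rather than guaranteed correct.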

The previous piece gives the working mental model for LLMs: a database you query with words, except that matching is loose, because what it stores is patterns for generating text rather than facts. Claude, GPT, Gemini, and Llama are all LLMs.

The failure mode of conflating “LLM” with “AI” is assuming LLM strengths apply to all AI, or vice versa. LLMs can write a draft email. They cannot reliably do arithmetic past a handful of digits, access real-time information without help, or directly take actions in the world. Other AI tools handle those jobs better.

Agent — a system built around an LLM

This one has a different shape from the rest.

An agent is not a model. It is a system pattern. Start with an LLM. Give it a list of tools it can call: search the web, query a database, send an email, run a script. Wrap it in a loop: the LLM picks a tool, the tool runs, the result feeds back into the LLM, and the LLM picks the next step. Run the loop until goal completion or budget exhaustion.
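The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any framework's API: fake_llm is a scripted stand-in for a real model call, search is a canned tool, and the action format (a dict with "tool" and "input") is invented for the example.

```python
def fake_llm(goal, history):
    """Stand-in for a model call: returns the next action as a dict.

    Hypothetical scripted policy: search once, then finish.
    """
    if not history:
        return {"tool": "search", "input": goal}
    last_result = history[-1]["result"]
    return {"tool": "finish", "input": f"Answer based on: {last_result}"}

def search(query):
    """Toy tool: a canned lookup instead of a real web search."""
    return f"top result for '{query}'"

TOOLS = {"search": search}

def run_agent(goal, llm, tools, max_steps=5):
    """Control loop: the model picks a tool, the tool runs, the result feeds back."""
    history = []
    for _ in range(max_steps):
        action = llm(goal, history)
        if action["tool"] == "finish":        # goal completion
            return action["input"], history
        tool = tools.get(action["tool"])
        if tool is None:
            result = f"error: unknown tool {action['tool']!r}"
        else:
            result = tool(action["input"])
        history.append({"action": action, "result": result})
    return None, history                      # budget exhaustion

answer, steps = run_agent("capital of France", fake_llm, TOOLS)
print(answer)  # prints Answer based on: top result for 'capital of France'
```

The max_steps cap is the "budget" in this sketch; real systems often budget tokens, time, or money as well.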

The whole structure — LLM at the center, tools around it, control loop around the whole thing — forms an agentic system. The LLM is the brain; the agent is the brain plus the body plus the workflow.

This matters because agents do things. An LLM by itself just generates text. An agent can read an inbox, draft replies, schedule meetings, push code to a repo, and post to Slack. The autonomy is real, and so are the failure modes.

The failure mode of conflating “LLM” with “agent” is treating a chatbot like an agent (it cannot take action) or treating an agent like a chatbot (it can take unapproved action). The first disappoints. The second creates most “the agent did what?” stories. The deeper version appears in the threat-surface essay: excessive agency is its own named risk class.
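One common way to limit unapproved action is to gate tools that have side effects behind an explicit approval step. The sketch below is an illustration only: the Tool dataclass, the has_side_effects flag, the dispatch function, and both example tools are hypothetical, and a real deployment would also need authentication, logging, and scoped credentials.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[[str], str]
    has_side_effects: bool   # True if the tool changes something in the world

def dispatch(tool, argument, approve):
    """Run read-only tools directly; ask for approval before anything that acts."""
    if tool.has_side_effects and not approve(tool.name, argument):
        return f"blocked: {tool.name} needs approval"
    return tool.run(argument)

# Hypothetical tools: one that only reads, one that acts.
read_inbox = Tool("read_inbox", lambda _: "3 unread messages", has_side_effects=False)
send_email = Tool("send_email", lambda body: f"sent: {body}", has_side_effects=True)

def deny_all(tool_name, argument):
    """Approval policy that refuses every action (stand-in for a human prompt)."""
    return False

print(dispatch(read_inbox, "", deny_all))       # prints 3 unread messages
print(dispatch(send_email, "hello", deny_all))  # prints blocked: send_email needs approval
```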

Why the distinction matters in practice

When the correct category has a name, four tasks get easier:

1. Vendor questions get sharper. “Is this a deterministic rule engine, an ML classifier, an LLM-backed assistant, or an action-taking agent?” The four answers carry very different cost profiles, review processes, and security requirements.

2. Failure modes become easier to size. A rule engine fails in predictable ways and stays easy to debug. An ML classifier fails when the input distribution drifts, and those failures are hard to debug. An LLM hallucinates and can create embarrassment. An agent can take unauthorized real-world action, a different category of bad.

3. Engineering investment gets easier to right-size. A small ML classifier can sometimes solve a problem people route to an LLM. An LLM in a chat box operates with far less machinery than an agent loop. Knowing what each option costs to run, monitor, and recover helps prevent over-building.

4. News gets better calibration. When a headline says “AI now does X,” identify which of the four words actually applies. “AI now plays Go better than humans” described a particular ML system trained for one task. It cannot write a haiku. “AI now writes code” points to an LLM doing pattern completion. “AI now schedules meetings” points to an agent. These are not interchangeable claims.

Four takeaways

AI is the umbrella. Use it carefully — it’s vague enough to cover almost anything.
ML is the learning subset. Most ML in production has nothing to do with LLMs.
LLMs are one kind of ML model. Trained on text, predict tokens, behave like a loose database. Generate things; don’t act on them.
Agents are systems, not models. They wrap an LLM with tools and a loop. They can take action — both feature and risk.

Where to read more

One rigorous reference: Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach — the field’s standard textbook. The first chapter alone gives the cleanest treatment of “what counts as AI.”

For the working pattern behind agents: the LangChain agents documentation — short, code-forward, and clear about the loop.

Four words, four meanings, one nested mental model. The next time “AI” stands in for a more specific unnamed system, ask which category applies.

Next in the Demystify AI series: tokens, context windows, attention — model mechanics without math.

Axioms touched

Lighter touch than the Learn series — primer pieces don't usually lean heavily on the axiom catalog, but where they do it's noted.

#11 Cite or be silent (held): Russell & Norvig + LangChain agent docs are the two grounding citations.
#13 Ship with the failure mode named (held): each section names the failure mode from confusing the term with its neighbor.