2026-05-05 · 14 min read

Three SDKs, three jobs - Anthropic TS SDK, OpenAI Agents SDK, and LangGraph

Three popular agent stacks solve three different jobs. The useful question is not which SDK wins. The useful question is which job sits on the desk.

agents · sdk · langgraph · openai · anthropic · determinism-ladder · architecture

Three teams can describe the same goal with the same sentence: “Build an agent for bounded evidence research.” Three wildly different systems can then appear from the same meeting.

One team needs a tidy daily worker: fetch a company page, run a prompt, write a dossier, stop. Another team needs a multi-agent surface with tool calls, handoffs, traces, and guardrails, all visible in one runtime. A third team needs a long-running workflow with retries, branches, human approvals, and replay after a failed step. Same headline. Different jobs.

The wrong move starts with the SDK name. The right move starts with the control problem.

This piece compares three common choices already showing up in the StoneyTECH corpus:

Anthropic TypeScript SDK for compact single-agent loops
OpenAI Agents SDK for structured multi-agent application assembly
LangGraph for explicit graph orchestration

A fourth shape deserves mention early: the workflow engine, with n8n as the working example. It stays out of the contest because it solves a different category. n8n is often the right answer when the system is mostly workflow with a few agentic steps. This article stays on SDKs for code-first agent builds.

The short answer

Each stack tends to dominate one job shape:

Anthropic TypeScript SDK fits the smallest bounded agent loop.
OpenAI Agents SDK fits the fastest path to a structured agent application.
LangGraph fits workflows where topology is the architecture.

No universal winner exists. Each tool buys a different kind of determinism.

The comparison matrix

Surface	Anthropic TypeScript SDK	OpenAI Agents SDK	LangGraph
Best job	One bounded agent loop	Multi-agent app with built-in structure	Stateful workflow with explicit topology
Main purchase	Low framework gravity	Fast assembly of tools, handoffs, traces	Replayable graph control
Main fight	State, retries, and policy stay manual	Framework concepts shape the app	More code up front
Failure mode	Ad hoc orchestration creep	Hidden graph under a tidy facade	Overbuilding a small task
Reach first when	One worker can finish the job	Multiple agents or guardrails need one home	Control flow matters as much as prompt quality

Anthropic TypeScript SDK - the cleanest small loop

The Anthropic TypeScript SDK stays close to the metal. A model call goes in. Tool definitions go in. Messages come back. The team owns the loop around it.

This shape shines for a small worker with a crisp finish line. The local learning-agent proof already shows the pattern. A daily content worker picks one concept, runs one strong prompt, writes one draft, and stops. No graph runtime needs entry. No handoff tree needs management. A few files can hold the whole mental model.

Decision lever

Pick this stack when the core job is a bounded loop, not a platform.

Examples:

one research worker producing one dossier
one content worker producing one draft
one study worker producing one spaced-repetition prompt
one inbox worker triaging into a small fixed label set

In this shape, framework mass often costs more than it buys. The Messages API plus tool use already handles the core act: call model, call tool, continue, stop.
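The core act of call model, call tool, continue, stop fits in one function. What follows is a minimal sketch under stated assumptions, not the Anthropic SDK's own types or calls: `Model`, `Tools`, `runBoundedLoop`, and the stub model are hypothetical names, and the model is a plain synchronous function so the loop can run without a network call.

```typescript
// Hypothetical shapes for illustration; not the Anthropic SDK's own types.
type ToolCall = { name: string; input: string };
type ModelTurn =
  | { kind: "tool_call"; call: ToolCall }
  | { kind: "final"; text: string };
type Message = { role: "user" | "assistant" | "tool"; content: string };

// A model maps the transcript so far to the next turn.
// Synchronous here for brevity; a real client call would be awaited.
type Model = (messages: Message[]) => ModelTurn;
type Tools = Record<string, (input: string) => string>;

function runBoundedLoop(model: Model, tools: Tools, task: string, maxTurns = 5): string {
  const messages: Message[] = [{ role: "user", content: task }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const next = model(messages); // call model
    if (next.kind === "final") return next.text; // stop
    const tool = tools[next.call.name];
    if (!tool) throw new Error(`Unknown tool: ${next.call.name}`);
    messages.push({ role: "assistant", content: `call ${next.call.name}` });
    messages.push({ role: "tool", content: tool(next.call.input) }); // call tool, continue
  }
  // The turn ceiling is the finish line that keeps the worker bounded.
  throw new Error(`No final answer within ${maxTurns} turns`);
}

// Stub model: ask for one fetch, then write the dossier from the tool result.
const stubModel: Model = (messages) => {
  const last = messages[messages.length - 1];
  return last.role === "tool"
    ? { kind: "final", text: `Dossier: ${last.content}` }
    : { kind: "tool_call", call: { name: "fetchPage", input: "https://example.com" } };
};

const stubTools: Tools = { fetchPage: (url) => `contents of ${url}` };

const dossier = runBoundedLoop(stubModel, stubTools, "Summarize the company page");
console.log(dossier);
```

The whole mental model is the transcript array and the turn ceiling. Swapping the stub for a real client changes the `Model` implementation, not the loop.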

What it fights

The same simplicity becomes the first fight once the job starts growing sideways.

State management stays local. Retry policy stays local. Budget ceilings stay local. Trace shape stays local. A second agent adds custom routing logic. A human approval step adds another branch. After a few months, the codebase can drift into a home-grown framework with no formal admission.

The failure rarely starts in the prompt. The failure starts when orchestration grows but the runtime shape does not.
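A hedged sketch of what "stays local" looks like for retry policy and budget ceilings. Nothing here comes from an SDK: `Budget`, `charge`, `runWithPolicy`, and the token costs are hypothetical, chosen only to show how much policy a hand-written loop ends up owning.

```typescript
// Hypothetical helpers; in a hand-written loop this policy is the team's own code.
type Budget = { ceiling: number; spent: number };

// Charge tokens against the budget; returns false if the ceiling would be crossed.
function charge(budget: Budget, tokens: number): boolean {
  if (budget.spent + tokens > budget.ceiling) return false;
  budget.spent += tokens;
  return true;
}

type StepResult<T> = { ok: true; value: T } | { ok: false; reason: string };

// Run one step with a fixed attempt limit; every attempt is charged up front.
function runWithPolicy<T>(
  step: (attempt: number) => T,
  costPerAttempt: number,
  maxAttempts: number,
  budget: Budget
): StepResult<T> {
  let lastReason = "no attempts made";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (!charge(budget, costPerAttempt)) {
      return { ok: false, reason: "budget ceiling reached" };
    }
    try {
      return { ok: true, value: step(attempt) };
    } catch (err) {
      lastReason = err instanceof Error ? err.message : String(err);
    }
  }
  return { ok: false, reason: `gave up after ${maxAttempts} attempts: ${lastReason}` };
}

// Stub step that fails twice, then succeeds.
const budget: Budget = { ceiling: 1000, spent: 0 };
const flaky = (attempt: number): string => {
  if (attempt < 3) throw new Error("timeout");
  return "draft written";
};

const result = runWithPolicy(flaky, 200, 5, budget);
console.log(result, budget.spent);
```

Each helper is small. The creep arrives when a second agent, an approval branch, and a trace format each add one more helper like these.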

Failure mode

Ad hoc orchestration creep.

A team starts with one loop and ends with a graph hidden inside if statements, arrays of tool results, and a few “just for now” helper files. Debugging then turns into archaeology.

War story

The learning-agent repository works precisely because the job stays small. One worker picks the next concept, generates one .svx draft, and exits. One sibling study worker sends one recall prompt and exits. The architecture holds because each run has one bounded objective.

The lesson is not “small loops beat frameworks.” The lesson is smaller: small loops beat frameworks for small-loop jobs.

OpenAI Agents SDK - the fastest structured application

The OpenAI Agents SDK sits one level up. The framework supplies higher-level pieces for runs, tools, handoffs, guardrails, and tracing. The official guide frames the library as a way to build agentic applications where a model can use tools, hand off to specialized agents, stream partial results, and keep a full trace.

This buys speed when the job needs structure soon.

Decision lever

Pick this stack when the app needs several agent concerns at once:

tool registration
specialized agents
run traces
guardrails
shared application structure

This shape fits teams moving from one promising worker into an agent application with a visible runtime contract.

What it fights

The framework decides a lot on purpose. Agent objects, run objects, handoff flows, and trace surfaces create a coherent home for the app. The trade appears when a team wants a shape just outside the happy path. Low-level control often still exists, but the route to it runs through the framework’s model of the world first.

This is not a flaw. It is the price of fast assembly.
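To make the framework nouns concrete, here is a library-free sketch of a handoff runner that records a trace. It is an illustration of the concept only, not the OpenAI Agents SDK API: `Agent`, `AgentResult`, `runAgents`, and the three stub agents are hypothetical.

```typescript
// Hypothetical types for illustration; not the OpenAI Agents SDK API.
type AgentResult =
  | { kind: "final"; output: string }
  | { kind: "handoff"; to: string; input: string };

type Agent = { name: string; handle: (input: string) => AgentResult };
type TraceEntry = { agent: string; input: string; result: "final" | "handoff" };

function runAgents(
  agents: Agent[],
  start: string,
  input: string,
  maxHops = 4
): { output: string; trace: TraceEntry[] } {
  const byName: Record<string, Agent> = {};
  for (const agent of agents) byName[agent.name] = agent;

  const trace: TraceEntry[] = [];
  let current = start;
  let payload = input;
  for (let hop = 0; hop < maxHops; hop++) {
    const agent = byName[current];
    if (!agent) throw new Error(`Unknown agent: ${current}`);
    const result = agent.handle(payload);
    trace.push({ agent: current, input: payload, result: result.kind });
    if (result.kind === "final") return { output: result.output, trace };
    current = result.to; // follow the handoff
    payload = result.input;
  }
  throw new Error(`Handoff limit of ${maxHops} reached`);
}

// Stub agents standing in for model-backed roles.
const agents: Agent[] = [
  { name: "planner", handle: (q) => ({ kind: "handoff", to: "researcher", input: `plan for: ${q}` }) },
  { name: "researcher", handle: (p) => ({ kind: "handoff", to: "writer", input: `notes on ${p}` }) },
  { name: "writer", handle: (n) => ({ kind: "final", output: `brief from ${n}` }) },
];

const run = runAgents(agents, "planner", "acme");
console.log(run.output);
console.log(run.trace.map((t) => t.agent).join(" -> "));
```

A framework supplies this runner, the trace surface, and the guardrail hooks ready-made. The trade is that the app's shape follows the runner's shape.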

Failure mode

Framework-shaped thinking before workflow-shaped thinking.

A team can confuse “the framework has agents and handoffs” with “the problem needs agents and handoffs.” Then a simple worker turns into a small society of objects, each with little real work to do.

War story

An evidence-brief build often starts as one worker: search, fetch, summarize, stop. The OpenAI Agents SDK earns its keep once the brief turns into a structured process with a planner, a web researcher, a verifier, a source normalizer, and a final writer, all sharing traces and guardrails. The framework can carry such a system with less custom scaffolding than a hand-written loop.

The warning sits nearby: if the planner, researcher, verifier, and writer are really just one prompt plus two tools, the app will feel heavier than the job.

LangGraph - the graph is the product

LangGraph starts from a different premise: control flow deserves first-class representation. Nodes, edges, conditional routing, cycles, persistence, and replay are the point.

This shape wins when the real problem is not “call a model with tools.” The real problem is “run a long-lived workflow without losing integrity.”

Decision lever

Pick this stack when topology matters as much as prompt quality.

Examples:

verifier panels
multi-step research flows with retries and checkpoints
human approval gates
workflows resuming after failure
systems with branching paths whose history must stay inspectable

Once the graph becomes the architecture, plain loops become too implicit.

What it fights

LangGraph asks for more code and more explicitness on day one. A team must name nodes, edges, state shape, route predicates, and persistence choices early. For a tiny worker, this can feel ceremonial.

It is ceremony. It is also the ceremony keeping concurrency and replay bugs out of folklore and inside code review.
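The day-one explicitness can be sketched without any library. This is an assumption-laden illustration, not LangGraph's API: the `State` shape, the node names, the stub verifier, and `runGraph` are all hypothetical. The point is only that nodes, edges, route predicates, and checkpoints each get a named place.

```typescript
// Library-free sketch; names are hypothetical and this is not LangGraph's API.
type State = { draft: string; attempts: number; approved: boolean };
type WorkNode = "generate" | "verify";
type NodeName = WorkNode | "done";
type GraphNode = (state: State) => State;

// Nodes: each one transforms the state.
const nodes: Record<WorkNode, GraphNode> = {
  generate: (s) => ({ ...s, draft: `draft v${s.attempts + 1}`, attempts: s.attempts + 1 }),
  verify: (s) => ({ ...s, approved: s.attempts >= 2 }), // stub verifier: the second draft passes
};

// Edges: route predicates are data, so a reviewer can read every transition.
const edges: Record<WorkNode, (s: State) => NodeName> = {
  generate: () => "verify",
  verify: (s) => (s.approved || s.attempts >= 3 ? "done" : "generate"),
};

function runGraph(
  initial: State,
  start: NodeName,
  maxSteps = 10
): { state: State; checkpoints: { node: WorkNode; state: State }[]; finished: boolean } {
  const checkpoints: { node: WorkNode; state: State }[] = [];
  let current: NodeName = start;
  let state = initial;
  for (let step = 0; step < maxSteps; step++) {
    if (current === "done") break;
    state = nodes[current](state);
    checkpoints.push({ node: current, state }); // history a replay could resume from
    current = edges[current](state);
  }
  return { state, checkpoints, finished: current === "done" };
}

const outcome = runGraph({ draft: "", attempts: 0, approved: false }, "generate");
console.log(outcome.checkpoints.map((c) => c.node).join(" -> "));
console.log(outcome.state.draft, outcome.finished);
```

The retry cycle from verify back to generate lives in one line of the `edges` table. In a hand-written loop the same rule tends to live in an if statement somewhere in the middle.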

Failure mode

Overbuilding the small task.

A two-step worker can drown in graph vocabulary before it does useful work. The graph then becomes an aspiration diagram rather than a working necessity.

War story

The Path A self-verify incident from “The graph is the architecture” is the clean example. The bug did not live in the generator prompt or the verifier prompt. The bug lived on an edge. A stale path remained valid in one branch and invalid in another. LangGraph-style explicit topology makes this class of bug visible. A hand-written loop often hides it until a late-night postmortem.

The lesson is sharp: when the bug can live on an edge, the graph deserves a file.

Convergence point - three SDKs, three jobs

The comparison gets easier once the job names the missing form of determinism.

If the missing determinism is “keep the worker small and obvious,” the Anthropic TypeScript SDK usually wins.
If the missing determinism is “give the app built-in agent structure fast,” the OpenAI Agents SDK usually wins.
If the missing determinism is “make routing, retries, and state transitions inspectable,” LangGraph usually wins.

This is the convergence part.

Convergence is not three SDKs becoming the same product. Convergence is three teams, under pressure, drifting toward the same architecture lesson: every useful agent system keeps pushing responsibility out of the prompt and into a more inspectable layer. One stack pushes into local code. One pushes into a framework runtime. One pushes into a graph.

The Determinism Ladder reads this drift as a placement question. Where should the next unit of responsibility live?

Deployment context changes the answer

Deployment context still comes first. A hosted tracing surface may fit one context and fail another. A team inside a restricted network may prefer a hand-written loop or a self-hosted graph runtime over any hosted control plane. A public-cloud startup can often accept faster framework adoption.

So the selection logic is not only about developer taste. It is also about placement:

Public cloud: all three stacks can fit; speed-to-assembly often matters most.
Sovereign or private cloud: framework surfaces need a clear placement story for traces, logs, and tools.
On-prem or restricted network: local control and explicit orchestration often gain value because every hidden dependency hurts more.

The SDK choice sits downstream of the deployment choice, not above it.

The decision tree

Start here:

Pick deployment context. Public cloud, sovereign cloud, private cloud, restricted network, or air-gap.
Count bounded objectives. One worker with one finish line points toward a hand-written loop. Several cooperating roles point toward a framework or graph.
Count workflow edges. Retries, approvals, checkpoints, resumability, and branch logic point toward LangGraph fast.
Count framework concerns. Handoffs, guardrails, traces, and agent boundaries point toward the OpenAI Agents SDK when the workflow still does not need explicit graph control.
Refuse premature society. If one prompt plus two tools can finish the job, stay near the Anthropic TypeScript SDK shape or an equally small loop.
Use n8n when the system is mostly workflow. Calendars, webhooks, approvals, schedules, and app integrations often belong on a workflow canvas with one agent node, not in a pure SDK contest.
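The steps above can be written down as one function. This is a sketch, not a scoring model: `JobShape`, `pickStack`, and both thresholds are assumptions introduced for illustration, and the article gives no numeric cutoffs.

```typescript
type Deployment =
  | "public-cloud"
  | "sovereign-cloud"
  | "private-cloud"
  | "restricted-network"
  | "air-gap";

type JobShape = {
  deployment: Deployment;
  boundedObjectives: number; // workers with one finish line each
  workflowEdges: number; // retries, approvals, checkpoints, resumability, branches
  frameworkConcerns: number; // handoffs, guardrails, traces, agent boundaries
  mostlyWorkflow: boolean; // calendars, webhooks, schedules, app integrations dominate
};

type Stack =
  | "workflow engine such as n8n"
  | "LangGraph"
  | "OpenAI Agents SDK"
  | "Anthropic TypeScript SDK or an equally small loop";

// Assumed cutoffs for illustration only; tune them to the team's own experience.
const EDGE_THRESHOLD = 2;
const CONCERN_THRESHOLD = 2;

function pickStack(job: JobShape): { stack: Stack; notes: string[] } {
  const notes: string[] = [];
  // Deployment context comes first: it constrains every later choice.
  if (job.deployment !== "public-cloud") {
    notes.push("Confirm where traces, logs, and tools will live before adopting a hosted surface.");
  }
  if (job.mostlyWorkflow) return { stack: "workflow engine such as n8n", notes };
  if (job.workflowEdges >= EDGE_THRESHOLD) return { stack: "LangGraph", notes };
  if (job.boundedObjectives > 1 && job.frameworkConcerns >= CONCERN_THRESHOLD) {
    return { stack: "OpenAI Agents SDK", notes };
  }
  // Refuse premature society: default to the smallest loop.
  return { stack: "Anthropic TypeScript SDK or an equally small loop", notes };
}

const dailyWorker: JobShape = {
  deployment: "public-cloud",
  boundedObjectives: 1,
  workflowEdges: 0,
  frameworkConcerns: 0,
  mostlyWorkflow: false,
};
console.log(pickStack(dailyWorker).stack);
```

The order of the checks carries the argument: workflow-dominated systems leave the contest first, explicit edges outrank framework concerns, and the small loop is the default rather than the fallback.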

Rules of thumb

Small bounded worker: start with the Anthropic TypeScript SDK shape.
Structured agent app: reach for the OpenAI Agents SDK.
Stateful workflow: reach for LangGraph.
Mostly deterministic business process: step out of the contest and use n8n or another workflow engine.
If the graph keeps appearing on the whiteboard, admit it early.
If the framework nouns outnumber the business nouns, back down a layer.

Three SDKs can look like a tool-choice debate. The deeper issue is architectural fit. Pick the job first. Then pick the control surface earning its keep.

Axioms applied in this essay

This article tested 6 of the StoneyTECH engineering axioms. Each verdict is the result of applying that axiom in this specific argument.

#1 The smallest lever wins: held
The article treats each SDK as a lever choice. Lowest viable control surface wins.
#2 Push work down toward determinism: held
The comparison measures where each stack moves work out of model improvisation and into code, framework, or graph.
#11 Cite or be silent: held
Claims stay tied to official SDK docs, public repo behavior, and the local Drill agent proof shape.
#13 Ship with the failure mode named: held
The opening move names the real failure mode: SDK selection drift caused by vibe, not by job shape.
#14 Two cheaper alternatives first: held
The conclusion starts with cheaper, smaller control surfaces before broader orchestration.
#18 Pick the deployment context before the model: held
Deployment context changes SDK fit. Local loop, hosted traces, and graph runtime all carry different placement implications.