- Articles
- /
- When to Build Your Own AI Agent (and When Claude Code Is Enough)
When to Build Your Own AI Agent (and When Claude Code Is Enough)
When should you build your own AI agent instead of using Claude Code? A practical guide to the Anthropic Agent SDK, with mental models for understanding when and why to go beyond interactive AI.
The Moment I Got Confused#
I was using Claude Code in my terminal, and it was working beautifully. I asked it to refactor a function. It read the file, understood the context, made the changes, ran the tests. Done.
Then someone mentioned the "Claude Agent SDK" and I thought: wait, isn't that what I'm already using?
The confusion is natural. Both involve Claude. Both can use tools. Both can perform multi-step tasks. So what's the difference? And more importantly: when would I actually need to build my own agent?
This article is the answer I wish I'd found.
What Is an Agent, Anyway?#
Strip away the hype, and an AI agent is surprisingly simple:
Agent = LLM + Tools + Loop
That's it. The LLM decides what to do. Tools let it take actions (read files, call APIs, run code). The loop keeps it going until the task is done.
┌─────────────────────────────────────┐
│ Agent Loop │
│ ┌─────────┐ ┌─────────┐ │
│ │ LLM │───▶│ Tools │ │
│ │ (think) │◀───│ (act) │ │
│ └─────────┘ └─────────┘ │
│ │ │ │
│ └──── loop ────┘ │
└─────────────────────────────────────┘
What makes this useful is the agent harness—the infrastructure that wraps around this loop. Think of it like an operating system for agents:
| Component | Computer Analogy | Agent World |
|---|---|---|
| Model | CPU | Raw reasoning capability |
| Context window | RAM | Limited working memory |
| Agent harness | Operating System | Manages tools, context, sessions |
| Your agent | Application | Your specific logic and prompts |
The harness handles the boring but critical stuff: initializing the model, managing tool calls, maintaining context across turns, handling errors, enforcing limits. You focus on what your agent should do.
Claude Code CLI is one harness. The Agent SDK lets you build your own. Pydantic AI is another approach entirely.
The Hidden Constraint: Context#
There's one thing the simple "LLM + Tools + Loop" formula hides: context windows are finite.
Every iteration of the loop adds to the context:
- LLM decides to call a tool
- Tool executes, returns result
- Result gets ingested back into context
- LLM reasons about the result, decides next action
- Repeat
After 20, 50, 100 tool calls, the context gets cluttered. File contents, API responses, intermediate reasoning—it all piles up. Eventually you hit the limit.
This is why context engineering matters. Good agent harnesses handle this automatically:
- Compactification: Summarizing older parts of the conversation while preserving what matters
- Selective context: Only including relevant tool results, not everything
- Checkpointing: Saving state externally so context can be reconstructed
Claude Code does compactification automatically—when context grows too large, it summarizes the conversation to stay within limits. The summary itself is generated by an LLM. It's LLMs all the way down.
For a deeper dive on harness architecture, Phil Schmid's Agent Harness article is worth reading—especially the insight that competitive advantage comes from the failure trajectories your harness captures, not from prompts alone.
The Core Mental Model: Who's Driving?#
The difference isn't about capability—it's about orchestration.
The Orchestration Spectrum
[Human Orchestrator] ←→ [Code Orchestrator]
You type a prompt Your code triggers Claude
You see the result Your code handles the result
You decide what's next Your code decides what's next
Claude Code CLI Agent SDK
Interactive sessions Automated workflows
Development, exploration Production services
Claude Code CLI: You're in the driver's seat. You ask questions, Claude responds, you steer the conversation. The loop is:
You → Claude → You → Claude → You → ...
Agent SDK: Your software is in the driver's seat. Your code invokes Claude, handles responses programmatically, and decides the next steps. The loop becomes:
Event → Your Code → Claude → Your Code → Claude → Your Code → Output
The human might not be in the loop at all—or only at specific checkpoints you define.
When You Need the Agent SDK#
If Claude Code works great for you (it does for me), why would you ever build your own agent?
Build your own when:
-
The trigger isn't a human typing — A webhook fires, a cron job runs, a file appears in S3. Your code needs to invoke Claude without anyone sitting at a terminal.
-
The user isn't you — You're building a product. Your customers interact through your UI, not a terminal. You need to embed Claude's capabilities into your application.
-
The workflow is repeatable — You find yourself doing the same multi-step process over and over. Codify it once, run it automatically.
-
You need custom tools — Claude Code has great built-in tools, but your domain needs something specific: query your database, call your internal APIs, interact with your proprietary systems.
-
You want programmatic control — Cost limits, model selection, custom approval flows, structured outputs. You need fine-grained control that interactive sessions don't provide.
Stick with Claude Code when:
- You're exploring, debugging, or doing ad-hoc work
- You want to steer each step based on what you see
- The task is unique and won't repeat
- You are the user
What the Agent SDK Actually Gives You#
The SDK isn't a different Claude—it's Claude Code's capabilities, made programmable.
The Agentic Loop (Built-In)#
With the raw Claude API, you implement the tool loop yourself:
# Raw API: You handle the loop
response = client.messages.create(...)
while response.stop_reason == "tool_use":
result = execute_tool(response.tool_use) # You write this
response = client.messages.create(tool_result=result, ...)
With the Agent SDK, Claude handles it:
# Agent SDK: Claude handles tools autonomously
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions
options = ClaudeAgentOptions(allowed_tools=["Read", "Write", "Bash"])
async with ClaudeSDKClient(options=options) as client:
await client.query("Fix the bug in auth.py")
async for message in client.receive_response():
print(message) # Claude reads files, edits code, runs tests—automatically
// TypeScript uses the query() function directly
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Fix the bug in auth.py",
options: { allowedTools: ["Read", "Write", "Bash"] },
})) {
console.log(message);
}
That's the core value: autonomous multi-step execution without you managing the loop.
Custom Tools#
Define domain-specific capabilities with a simple decorator:
from claude_agent_sdk import (
tool, create_sdk_mcp_server, ClaudeAgentOptions,
ClaudeSDKClient, AssistantMessage, TextBlock
)
@tool("get_user", "Fetch user details from the database", {"user_id": str})
async def get_user(args):
user = await db.fetch_user(args["user_id"]) # Your database
return {
"content": [{"type": "text", "text": f"User: {user.name}, Plan: {user.plan}"}]
}
@tool("send_notification", "Send a notification to a user", {"user_id": str, "message": str})
async def send_notification(args):
await notifications.send(args["user_id"], args["message"]) # Your service
return {
"content": [{"type": "text", "text": "Notification sent"}]
}
# Create an MCP server with your tools
server = create_sdk_mcp_server(
name="my-tools",
version="1.0.0",
tools=[get_user, send_notification]
)
# Run the agent with your tools available
options = ClaudeAgentOptions(
mcp_servers={"app": server},
allowed_tools=["mcp__app__get_user", "mcp__app__send_notification"]
)
async with ClaudeSDKClient(options=options) as client:
await client.query("Check if user 123 is on the free plan and remind them about the premium trial")
async for message in client.receive_response():
if isinstance(message, AssistantMessage):
for block in message.content:
if isinstance(block, TextBlock):
print(block.text)
Claude decides when to call your tools, in what order, and how to combine results. You just define what's possible.
Subagents#
Delegate specialized tasks to focused agents:
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions, AgentDefinition
options = ClaudeAgentOptions(
allowed_tools=["Read", "Grep", "Glob", "Task"], # Task enables subagents
agents={
"code-reviewer": AgentDefinition(
description="Expert code reviewer. Use for security and quality checks.",
prompt="""You are a code review specialist. Focus on:
- Security vulnerabilities
- Performance issues
- Code quality and maintainability""",
tools=["Read", "Grep", "Glob"], # Read-only access
model="sonnet"
),
"test-runner": AgentDefinition(
description="Runs tests and analyzes results.",
prompt="Run tests and provide clear analysis of failures.",
tools=["Bash", "Read", "Grep"] # Can execute commands
)
}
)
async with ClaudeSDKClient(options=options) as client:
await client.query("Review the auth module for security issues, then run the test suite")
async for message in client.receive_response():
print(message)
// TypeScript equivalent
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Review the auth module for security issues, then run the test suite",
options: {
allowedTools: ["Read", "Grep", "Glob", "Task"],
agents: {
"code-reviewer": {
description:
"Expert code reviewer. Use for security and quality checks.",
prompt: `You are a code review specialist...`,
tools: ["Read", "Grep", "Glob"],
model: "sonnet",
},
"test-runner": {
description: "Runs tests and analyzes results.",
prompt: "Run tests and provide clear analysis of failures.",
tools: ["Bash", "Read", "Grep"],
},
},
},
})) {
console.log(message);
}
The main agent orchestrates, delegating to specialists as needed.
Programmatic Controls#
Fine-grained configuration for production use:
// TypeScript - uses camelCase
const options = {
// Cost control
maxTurns: 10, // Limit agent iterations
maxBudgetUsd: 5.0, // Stop after spending $5
// Model selection
model: "claude-sonnet-4-5",
fallbackModel: "claude-haiku-4", // Cheaper fallback
// Permissions
permissionMode: "acceptEdits", // Auto-accept file changes
allowedTools: ["Read", "Write"], // Whitelist tools
disallowedTools: ["Bash"], // Blacklist dangerous ones
// Context
systemPrompt: "You are a Python specialist. Follow PEP 8.",
cwd: "/path/to/project",
// Structured output (for parsing)
outputFormat: {
type: "json_schema",
schema: {
type: "object",
properties: {
/* ... */
},
},
},
};
# Python - uses snake_case
options = ClaudeAgentOptions(
max_turns=10,
max_budget_usd=5.0,
model="claude-sonnet-4-5",
fallback_model="claude-haiku-4",
permission_mode="acceptEdits",
allowed_tools=["Read", "Write"],
disallowed_tools=["Bash"],
system_prompt="You are a Python specialist. Follow PEP 8.",
cwd="/path/to/project",
output_format={
"type": "json_schema",
"schema": {"type": "object", "properties": {}}
}
)
The Framework Landscape#
The Agent SDK isn't the only option. Here's how it compares:
| Framework | Provider | Models | Philosophy |
|---|---|---|---|
| Claude Agent SDK | Anthropic | Claude only | Native, minimal abstraction |
| Pydantic AI | Third-party | Many LLMs | Type-safe, dependency injection |
| LangChain | Third-party | Many LLMs | Heavy abstraction, flexible |
Claude Agent SDK#
Use when: You're committed to Claude and want native access to its features.
Strengths:
- First-party: features ship as Claude ships them
- No abstraction overhead—direct access to everything
- Built-in tools from Claude Code (file ops, bash, web search)
- Streaming, sessions, checkpointing out of the box
Tradeoffs:
- Claude-only. If you need GPT-4 or Gemini, you can't swap.
- Smaller ecosystem than general-purpose frameworks.
Pydantic AI#
Use when: You want type safety, dependency injection, and model flexibility.
Strengths:
- Excellent software engineering practices
- Swap models easily (Claude today, GPT-4 tomorrow)
- Clean, Pythonic API with Pydantic validation
- Good for teams with strong typing culture
Tradeoffs:
- You implement tools from scratch (no built-in file/bash tools)
- Third-party, so Claude-specific features may lag
Deep Dive: Claude Agent SDK vs Pydantic AI#
Since these are the two I'd actually consider, let's compare them properly with code.
Philosophy Difference#
Claude Agent SDK thinks in terms of capabilities: "Here are the tools Claude can use. Go."
Pydantic AI thinks in terms of contracts: "Here's the input type, output type, and dependencies. The framework enforces them."
Both are valid. The question is which mental model fits your team.
Side-by-Side: A Support Agent#
Let's build the same thing in both: an agent that looks up customer data and responds to queries.
Claude Agent SDK approach:
from claude_agent_sdk import (
tool, create_sdk_mcp_server, ClaudeAgentOptions,
ClaudeSDKClient, AssistantMessage, TextBlock
)
# Tools return unstructured content blocks
@tool("get_customer", "Fetch customer details", {"customer_id": int})
async def get_customer(args):
customer = await db.get_customer(args["customer_id"])
return {
"content": [{"type": "text", "text": f"Name: {customer.name}, Plan: {customer.plan}"}]
}
@tool("get_balance", "Get customer account balance", {"customer_id": int})
async def get_balance(args):
balance = await db.get_balance(args["customer_id"])
return {
"content": [{"type": "text", "text": f"Balance: ${balance:.2f}"}]
}
server = create_sdk_mcp_server("support", tools=[get_customer, get_balance])
options = ClaudeAgentOptions(
mcp_servers={"support": server},
allowed_tools=["mcp__support__get_customer", "mcp__support__get_balance"],
system_prompt="You are a bank support agent. Be helpful and concise."
)
async with ClaudeSDKClient(options=options) as client:
await client.query("What's my balance? Customer ID: 123")
async for message in client.receive_response():
if isinstance(message, AssistantMessage):
for block in message.content:
if isinstance(block, TextBlock):
print(block.text)
Pydantic AI approach:
from dataclasses import dataclass
from pydantic import BaseModel
from pydantic_ai import Agent, RunContext
# Dependencies are injected, not global
@dataclass
class SupportDeps:
customer_id: int
db: DatabaseConnection
# Output is a typed contract
class SupportResponse(BaseModel):
message: str
"""Response to show the customer"""
risk_level: int
"""1-10 risk assessment"""
action_required: bool
"""Whether human review is needed"""
agent = Agent(
'anthropic:claude-sonnet-4-5', # Model is swappable
deps_type=SupportDeps,
output_type=SupportResponse, # Enforced structure
instructions="You are a bank support agent. Assess risk level of each query."
)
# Tools receive typed context
@agent.tool
async def get_balance(ctx: RunContext[SupportDeps]) -> str:
"""Get the customer's current balance."""
balance = await ctx.deps.db.get_balance(ctx.deps.customer_id)
return f"${balance:.2f}"
@agent.tool
async def get_transaction_history(ctx: RunContext[SupportDeps], days: int) -> str:
"""Get recent transactions."""
txns = await ctx.deps.db.get_transactions(ctx.deps.customer_id, days)
return "\n".join(f"{t.date}: {t.description} ({t.amount})" for t in txns)
# Run with explicit dependencies
async def handle_query(customer_id: int, query: str):
deps = SupportDeps(customer_id=customer_id, db=get_db_connection())
result = await agent.run(query, deps=deps)
# result.output is guaranteed to be SupportResponse
if result.output.action_required:
await escalate_to_human(result.output)
return result.output.message
What Stands Out#
| Aspect | Claude Agent SDK | Pydantic AI |
|---|---|---|
| Output | Unstructured (you parse) | Typed Pydantic model (guaranteed) |
| Dependencies | Implicit (closures, globals) | Explicit injection (deps_type) |
| Tool context | args dict | Typed RunContext[YourDeps] |
| Model | Claude only | Any supported model |
| Built-in tools | File, Bash, Web, etc. | None (you build everything) |
| Validation | Manual or JSON schema | Automatic via Pydantic |
When Pydantic AI Wins#
-
You need model flexibility — Your contract says "GPT-4 for cost, Claude for quality." Pydantic AI makes this a config change.
-
You have strict output requirements — The response must be a specific shape. Pydantic AI enforces this; Claude SDK hopes the LLM complies.
-
Your team values dependency injection — If you already use DI patterns (FastAPI, pytest fixtures), Pydantic AI feels natural. Dependencies are explicit, testable, mockable.
-
You want output validation with retries — Pydantic AI has built-in
@agent.output_validatorthat can trigger retries:
@agent.output_validator
async def validate_response(ctx: RunContext[SupportDeps], output: SupportResponse) -> SupportResponse:
if output.risk_level > 8 and not output.action_required:
raise ModelRetry("High risk queries must have action_required=True")
return output
When Claude Agent SDK Wins#
-
You want batteries included — File operations, bash execution, web search, code editing—all built in. With Pydantic AI, you'd implement these yourself.
-
You're building on Claude Code patterns — Subagents, sessions, checkpointing, the Task tool. These are native to Claude's agentic model.
-
You want the latest Claude features immediately — Extended thinking, computer use, new tools. First-party SDK gets them first.
-
Simpler mental model for prototyping — Define tools, give Claude a prompt, let it figure out the rest. Less ceremony to get started.
The Hybrid Possibility#
Nothing stops you from using Pydantic for structured outputs while using Claude's API directly:
from pydantic import BaseModel
from anthropic import Anthropic
class AnalysisResult(BaseModel):
summary: str
risk_score: float
recommendations: list[str]
client = Anthropic()
response = client.messages.create(
model="claude-sonnet-4-5",
messages=[{"role": "user", "content": "Analyze this data..."}],
# Claude's native JSON mode
response_format={"type": "json_object"}
)
# Parse with Pydantic
result = AnalysisResult.model_validate_json(response.content[0].text)
You get Pydantic's validation without the full framework. Sometimes that's enough.
LangChain#
Use when: You need maximum flexibility and don't mind abstraction.
Strengths:
- Huge ecosystem, tons of integrations
- Model-agnostic by design
- Good for complex pipelines with many moving parts
Tradeoffs:
- Heavy abstraction can obscure what's happening
- Steeper learning curve
- Some find it over-engineered for simple use cases
My Take#
If you're already happy with Claude (you've read this far, so probably yes), the Agent SDK is the path of least resistance. It's the native way to go from "Claude in my terminal" to "Claude in my application."
If you need model flexibility or have strong opinions about software architecture, Pydantic AI is worth exploring—it emphasizes the patterns that make code maintainable.
I'd avoid LangChain unless you specifically need its ecosystem. The abstractions add complexity without proportional benefit for most use cases.
A Minimal Working Example#
Let's build something real: a simple agent that answers questions about a codebase.
import asyncio
from pathlib import Path
from claude_agent_sdk import (
ClaudeSDKClient, ClaudeAgentOptions,
AssistantMessage, TextBlock
)
async def ask_about_code(question: str, project_path: str):
"""Ask Claude a question about a codebase."""
options = ClaudeAgentOptions(
# Give Claude read access to explore the code
allowed_tools=["Read", "Glob", "Grep"],
# Point to the project
cwd=str(Path(project_path)),
# Keep it focused
system_prompt="You are a code exploration assistant. Answer questions concisely.",
max_turns=5,
)
print(f"Question: {question}\n")
async with ClaudeSDKClient(options=options) as client:
await client.query(question)
async for message in client.receive_response():
if isinstance(message, AssistantMessage):
for block in message.content:
if isinstance(block, TextBlock):
print(block.text)
# Run it
asyncio.run(ask_about_code(
question="How does the authentication system work?",
project_path="/path/to/your/project"
))
// TypeScript equivalent
import { query } from "@anthropic-ai/claude-agent-sdk";
async function askAboutCode(question: string, projectPath: string) {
console.log(`Question: ${question}\n`);
for await (const message of query({
prompt: question,
options: {
allowedTools: ["Read", "Glob", "Grep"],
cwd: projectPath,
systemPrompt:
"You are a code exploration assistant. Answer questions concisely.",
maxTurns: 5,
},
})) {
if (message.type === "assistant" && message.content) {
console.log(message.content);
}
}
}
askAboutCode(
"How does the authentication system work?",
"/path/to/your/project",
);
That's it. Claude will glob for relevant files, read them, grep for patterns, and synthesize an answer—all autonomously.
Cost Considerations: Haiku, Sonnet, Opus#
Anthropic offers three model tiers, and the Agent SDK lets you choose per-task:
| Model | Speed | Cost | Use Case |
|---|---|---|---|
| Haiku | Fastest | Lowest | Simple tool calls, classification, quick lookups |
| Sonnet | Balanced | Medium | Most agentic work, coding, analysis |
| Opus | Deepest | Highest | Complex reasoning, research, nuanced decisions |
A common pattern: use Sonnet for the main agent, Haiku for high-volume subagents:
// TypeScript
const options = {
model: "claude-sonnet-4-5", // Main agent
allowedTools: ["Read", "Task"],
agents: {
classifier: {
description: "Quick classification tasks",
prompt: "Classify inputs into categories.",
model: "haiku", // Cheaper for simple work
},
researcher: {
description: "Deep research requiring careful reasoning",
prompt: "Conduct thorough analysis.",
model: "opus", // Worth it for complex tasks
},
},
};
# Python
options = ClaudeAgentOptions(
model="claude-sonnet-4-5", # Main agent
allowed_tools=["Read", "Task"],
agents={
"classifier": AgentDefinition(
description="Quick classification tasks",
prompt="Classify inputs into categories.",
model="haiku" # Cheaper for simple work
),
"researcher": AgentDefinition(
description="Deep research requiring careful reasoning",
prompt="Conduct thorough analysis.",
model="opus" # Worth it for complex tasks
)
}
)
What I'd Build First#
If I were starting today, here's what I'd try:
-
Code review bot — Triggered on PR open, reviews changes, posts comments. Subagents for security, performance, style.
-
Documentation generator — Point it at a codebase, generate or update docs. Run weekly via cron.
-
Support ticket triage — Webhook from support system, agent categorizes and drafts responses, queues for human review.
-
Data pipeline monitor — Check logs, identify anomalies, create tickets with diagnosis. Runs on schedule.
All of these share a pattern: triggered by events, not humans typing. That's the sweet spot for the Agent SDK.
Key Takeaways#
-
Claude Code CLI vs Agent SDK is about orchestration. Human driving vs code driving. Both use Claude's capabilities.
-
Build your own agent when the trigger isn't a human, the user isn't you, the workflow is repeatable, or you need custom tools.
-
The Agent SDK gives you Claude Code's agentic loop as a library. Define tools, configure options, let Claude handle multi-step execution.
-
Framework choice depends on your constraints. Claude SDK for native access, Pydantic AI for type safety and model flexibility, LangChain for maximum ecosystem.
-
Start small. Take one repeatable workflow and automate it. The minimal example above is genuinely all you need to start.
The gap between "AI that answers questions" and "AI that does things autonomously" is smaller than it seems—but crossing it requires understanding when interactive isn't enough.
Resources#
Claude Agent SDK:
- Claude Agent SDK Documentation — Official docs with full API reference
- Agent SDK Python Package — GitHub repo with examples
- Agent SDK TypeScript Package — TypeScript version
Pydantic AI:
- Pydantic AI Documentation — Official docs with guides and examples
- Pydantic AI GitHub — Source code and issues
- Pydantic AI Examples — Real-world use cases
General:
- Model Context Protocol (MCP) — The protocol behind tool integration
- Anthropic Cookbook — Patterns and recipes for Claude