When to Build Your Own AI Agent (and When Claude Code Is Enough)

The Moment I Got Confused

I was using Claude Code in my terminal, and it was working beautifully. I asked it to refactor a function. It read the file, understood the context, made the changes, ran the tests. Done.

Then someone mentioned the "Claude Agent SDK" and I thought: wait, isn't that what I'm already using?

The confusion is natural. Both involve Claude. Both can use tools. Both can perform multi-step tasks. So what's the difference? And more importantly: when would I actually need to build my own agent?

This article is the answer I wish I'd found.

What Is an Agent, Anyway?

Strip away the hype, and an AI agent is surprisingly simple:

Agent = LLM + Tools + Loop

That's it. The LLM decides what to do. Tools let it take actions (read files, call APIs, run code). The loop keeps it going until the task is done.

┌─────────────────────────────────────┐
│            Agent Loop               │
│  ┌─────────┐    ┌─────────┐         │
│  │   LLM   │───▶│  Tools  │         │
│  │ (think) │◀───│  (act)  │         │
│  └─────────┘    └─────────┘         │
│        │              │             │
│        └──── loop ────┘             │
└─────────────────────────────────────┘
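To make the formula concrete, here is a minimal sketch of the loop in plain Python. The `fake_llm` function is a scripted stand-in for a real model call, and the `add` tool and the message shapes are assumptions made for illustration, not any SDK's actual format.

```python
from typing import Callable

# Tools: plain functions the "model" may ask us to run (hypothetical example tool).
TOOLS: dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
}

def fake_llm(history: list[dict]) -> dict:
    """Scripted stand-in for a model call: request a tool once, then answer."""
    tool_results = [m for m in history if m["role"] == "tool"]
    if not tool_results:
        return {"type": "tool_call", "name": "add", "args": {"a": 2, "b": 3}}
    return {"type": "final", "text": f"The sum is {tool_results[-1]['content']}."}

def run_agent(task: str, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):                                 # Loop
        decision = fake_llm(history)                           # LLM decides what to do
        if decision["type"] == "final":
            return decision["text"]
        result = TOOLS[decision["name"]](**decision["args"])   # Tool acts
        history.append({"role": "tool", "content": result})    # Result goes back into context
    raise RuntimeError("step limit reached before the task finished")

answer = run_agent("What is 2 + 3?")
print(answer)  # The sum is 5.
```

Swapping `fake_llm` for a real model call is the only change needed to turn this into a working, if bare, agent. Everything else in this article is about what surrounds that loop.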

What makes this useful is the agent harness—the infrastructure that wraps around this loop. Think of it like an operating system for agents:

Component	Computer Analogy	Agent World
Model	CPU	Raw reasoning capability
Context window	RAM	Limited working memory
Agent harness	Operating System	Manages tools, context, sessions
Your agent	Application	Your specific logic and prompts

The harness handles the boring but critical stuff: initializing the model, managing tool calls, maintaining context across turns, handling errors, enforcing limits. You focus on what your agent should do.

Claude Code CLI is one harness. The Agent SDK exposes that same harness as a library, so you can build your own agents on top of it. Pydantic AI is another approach entirely.

The Hidden Constraint: Context

There's one thing the simple "LLM + Tools + Loop" formula hides: context windows are finite.

Every iteration of the loop adds to the context:

LLM decides to call a tool
Tool executes, returns result
Result gets ingested back into context
LLM reasons about the result, decides next action
Repeat

After 20, 50, 100 tool calls, the context gets cluttered. File contents, API responses, intermediate reasoning—it all piles up. Eventually you hit the limit.

This is why context engineering matters. Good agent harnesses handle this automatically:

Compaction: Summarizing older parts of the conversation while preserving what matters
Selective context: Only including relevant tool results, not everything
Checkpointing: Saving state externally so context can be reconstructed

Claude Code does compaction automatically: when context grows too large, it summarizes the conversation to stay within limits. The summary itself is generated by an LLM. It's LLMs all the way down.
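The summarize-when-full idea can be sketched in a few lines. This is an illustration under stated assumptions, not how Claude Code implements it: `estimate_tokens` is a crude character-count heuristic rather than a real tokenizer, and `summarize` is a placeholder where a real harness would call an LLM.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def summarize(messages: list[str]) -> str:
    """Placeholder for an LLM-generated summary of older turns."""
    return f"[summary of {len(messages)} earlier messages]"

def compact(messages: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Replace older messages with one summary when the estimate exceeds the budget."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent

# Ten bulky tool results blow past a 300-token budget and get compacted.
history = [f"tool result {i}: " + "x" * 200 for i in range(10)]
compacted = compact(history, budget=300)
print(len(history), "->", len(compacted))  # 10 -> 3
```

Keeping the most recent messages verbatim is a design choice in this sketch: the latest tool results are usually what the next step depends on, so only the older turns are summarized.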

For a deeper dive on harness architecture, Phil Schmid's Agent Harness article is worth reading—especially the insight that competitive advantage comes from the failure trajectories your harness captures, not from prompts alone.

The Core Mental Model: Who's Driving?

The difference isn't about capability—it's about orchestration.

The Orchestration Spectrum

[Human Orchestrator]          ←→          [Code Orchestrator]

You type a prompt                          Your code triggers Claude
You see the result                         Your code handles the result
You decide what's next                     Your code decides what's next

Claude Code CLI                            Agent SDK
Interactive sessions                       Automated workflows
Development, exploration                   Production services

Claude Code CLI: You're in the driver's seat. You ask questions, Claude responds, you steer the conversation. The loop is:

You → Claude → You → Claude → You → ...

Agent SDK: Your software is in the driver's seat. Your code invokes Claude, handles responses programmatically, and decides the next steps. The loop becomes:

Event → Your Code → Claude → Your Code → Claude → Your Code → Output

The human might not be in the loop at all—or only at specific checkpoints you define.
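Here is a rough sketch of the code-driven loop with one human checkpoint. Everything in it is hypothetical: the `Event` shape, the `call_agent` stub (a real version would invoke the SDK), and the refund rule used to decide when a person needs to look.

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str        # e.g. "ticket_created" (hypothetical event type)
    payload: str

def call_agent(prompt: str) -> str:
    """Stub for an agent invocation; a real version would call the SDK."""
    return f"draft reply for: {prompt}"

def needs_human_review(event: Event) -> bool:
    """Checkpoint rule chosen for this sketch: refunds always go to a person."""
    return "refund" in event.payload.lower()

def handle(event: Event) -> dict:
    # Event -> your code: build the prompt
    prompt = f"Handle {event.kind}: {event.payload}"
    # Your code -> agent -> your code: run it and take the result back
    draft = call_agent(prompt)
    # Your code decides what happens next
    status = "queued_for_review" if needs_human_review(event) else "sent"
    return {"status": status, "output": draft}

print(handle(Event("ticket_created", "Where is my invoice?"))["status"])  # sent
print(handle(Event("ticket_created", "I want a refund"))["status"])       # queued_for_review
```

The point of the sketch is where the decisions live: the routing logic is ordinary code you can test, and the agent call is one step inside it.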

When You Need the Agent SDK

If Claude Code works great for you (it does for me), why would you ever build your own agent?

Build your own when:

The trigger isn't a human typing — A webhook fires, a cron job runs, a file appears in S3. Your code needs to invoke Claude without anyone sitting at a terminal.
The user isn't you — You're building a product. Your customers interact through your UI, not a terminal. You need to embed Claude's capabilities into your application.
The workflow is repeatable — You find yourself doing the same multi-step process over and over. Codify it once, run it automatically.
You need custom tools — Claude Code has great built-in tools, but your domain needs something specific: query your database, call your internal APIs, interact with your proprietary systems.
You want programmatic control — Cost limits, model selection, custom approval flows, structured outputs. You need fine-grained control that interactive sessions don't provide.

Stick with Claude Code when:

You're exploring, debugging, or doing ad-hoc work
You want to steer each step based on what you see
The task is unique and won't repeat
You are the user

What the Agent SDK Actually Gives You

The SDK isn't a different Claude—it's Claude Code's capabilities, made programmable.

The Agentic Loop (Built-In)

With the raw Claude API, you implement the tool loop yourself:

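# Raw API: you handle the loop (MODEL, tools, messages, and execute_tool are yours to define)
response = client.messages.create(model=MODEL, max_tokens=1024, tools=tools, messages=messages)
while response.stop_reason == "tool_use":
    messages.append({"role": "assistant", "content": response.content})
    results = [
        {"type": "tool_result", "tool_use_id": block.id, "content": execute_tool(block.name, block.input)}
        for block in response.content if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})  # Tool results go back as a user turn
    response = client.messages.create(model=MODEL, max_tokens=1024, tools=tools, messages=messages)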

With the Agent SDK, Claude handles it:

# Agent SDK: Claude handles tools autonomously
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions

options = ClaudeAgentOptions(allowed_tools=["Read", "Write", "Bash"])

async with ClaudeSDKClient(options=options) as client:
    await client.query("Fix the bug in auth.py")
    async for message in client.receive_response():
        print(message)  # Claude reads files, edits code, runs tests—automatically

// TypeScript uses the query() function directly
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "Fix the bug in auth.py",
  options: { allowedTools: ["Read", "Write", "Bash"] },
})) {
  console.log(message);
}

That's the core value: autonomous multi-step execution without you managing the loop.

Custom Tools

Define domain-specific capabilities with a simple decorator:

from claude_agent_sdk import (
    tool, create_sdk_mcp_server, ClaudeAgentOptions,
    ClaudeSDKClient, AssistantMessage, TextBlock
)

@tool("get_user", "Fetch user details from the database", {"user_id": str})
async def get_user(args):
    user = await db.fetch_user(args["user_id"])  # Your database
    return {
        "content": [{"type": "text", "text": f"User: {user.name}, Plan: {user.plan}"}]
    }

@tool("send_notification", "Send a notification to a user", {"user_id": str, "message": str})
async def send_notification(args):
    await notifications.send(args["user_id"], args["message"])  # Your service
    return {
        "content": [{"type": "text", "text": "Notification sent"}]
    }

# Create an MCP server with your tools
server = create_sdk_mcp_server(
    name="my-tools",
    version="1.0.0",
    tools=[get_user, send_notification]
)

# Run the agent with your tools available
options = ClaudeAgentOptions(
    mcp_servers={"app": server},
    allowed_tools=["mcp__app__get_user", "mcp__app__send_notification"]
)

async with ClaudeSDKClient(options=options) as client:
    await client.query("Check if user 123 is on the free plan and remind them about the premium trial")

    async for message in client.receive_response():
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text)

Claude decides when to call your tools, in what order, and how to combine results. You just define what's possible.
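Note the naming pattern in `allowed_tools` above: each entry is `mcp__<server key>__<tool name>`, where the server key is the one used in `mcp_servers` (here `"app"`). A small helper can keep those strings in sync with your tool list. The helper is hypothetical and not part of the SDK:

```python
def mcp_tool_names(server_key: str, tool_names: list[str]) -> list[str]:
    """Build allowed_tools entries in the mcp__<server>__<tool> form used above."""
    return [f"mcp__{server_key}__{name}" for name in tool_names]

allowed = mcp_tool_names("app", ["get_user", "send_notification"])
print(allowed)  # ['mcp__app__get_user', 'mcp__app__send_notification']
```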

Subagents

Delegate specialized tasks to focused agents:

from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions, AgentDefinition

options = ClaudeAgentOptions(
    allowed_tools=["Read", "Grep", "Glob", "Task"],  # Task enables subagents
    agents={
        "code-reviewer": AgentDefinition(
            description="Expert code reviewer. Use for security and quality checks.",
            prompt="""You are a code review specialist. Focus on:
- Security vulnerabilities
- Performance issues
- Code quality and maintainability""",
            tools=["Read", "Grep", "Glob"],  # Read-only access
            model="sonnet"
        ),
        "test-runner": AgentDefinition(
            description="Runs tests and analyzes results.",
            prompt="Run tests and provide clear analysis of failures.",
            tools=["Bash", "Read", "Grep"]  # Can execute commands
        )
    }
)

async with ClaudeSDKClient(options=options) as client:
    await client.query("Review the auth module for security issues, then run the test suite")
    async for message in client.receive_response():
        print(message)

// TypeScript equivalent
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "Review the auth module for security issues, then run the test suite",
  options: {
    allowedTools: ["Read", "Grep", "Glob", "Task"],
    agents: {
      "code-reviewer": {
        description:
          "Expert code reviewer. Use for security and quality checks.",
        prompt: `You are a code review specialist...`,
        tools: ["Read", "Grep", "Glob"],
        model: "sonnet",
      },
      "test-runner": {
        description: "Runs tests and analyzes results.",
        prompt: "Run tests and provide clear analysis of failures.",
        tools: ["Bash", "Read", "Grep"],
      },
    },
  },
})) {
  console.log(message);
}

The main agent orchestrates, delegating to specialists as needed.
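The comments in the example carry two conventions: `Task` has to be allowed for delegation to happen, and the reviewer only gets read-only tools. A pre-flight check can make both explicit. The sketch below uses plain dicts rather than the SDK's `AgentDefinition`, and the rules it enforces are assumptions drawn from those comments, not checks the SDK itself performs.

```python
WRITE_CAPABLE = {"Write", "Edit", "Bash"}   # tools treated as non-read-only in this sketch

def check_agents(allowed_tools: list[str], agents: dict[str, dict], read_only: set[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the config passes."""
    problems = []
    if agents and "Task" not in allowed_tools:
        problems.append("agents are defined but 'Task' is not in allowed_tools")
    for name in sorted(read_only):
        tools = set(agents.get(name, {}).get("tools", []))
        risky = sorted(tools & WRITE_CAPABLE)
        if risky:
            problems.append(f"{name} should be read-only but has {risky}")
    return problems

agents = {
    "code-reviewer": {"tools": ["Read", "Grep", "Glob"]},
    "test-runner": {"tools": ["Bash", "Read", "Grep"]},
}
ok = check_agents(["Read", "Grep", "Glob", "Task"], agents, read_only={"code-reviewer"})
bad = check_agents(["Read"], agents, read_only={"test-runner"})
print(ok)   # []
print(bad)
```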

Programmatic Controls

Fine-grained configuration for production use:

// TypeScript - uses camelCase
const options = {
  // Cost control
  maxTurns: 10, // Limit agent iterations
  maxBudgetUsd: 5.0, // Stop after spending $5

  // Model selection
  model: "claude-sonnet-4-5",
  fallbackModel: "claude-haiku-4-5", // Cheaper fallback

  // Permissions
  permissionMode: "acceptEdits", // Auto-accept file changes
  allowedTools: ["Read", "Write"], // Whitelist tools
  disallowedTools: ["Bash"], // Blacklist dangerous ones

  // Context
  systemPrompt: "You are a Python specialist. Follow PEP 8.",
  cwd: "/path/to/project",

  // Structured output (for parsing)
  outputFormat: {
    type: "json_schema",
    schema: {
      type: "object",
      properties: {
        /* ... */
      },
    },
  },
};

# Python - uses snake_case
options = ClaudeAgentOptions(
    max_turns=10,
    max_budget_usd=5.0,
    model="claude-sonnet-4-5",
    fallback_model="claude-haiku-4-5",
    permission_mode="acceptEdits",
    allowed_tools=["Read", "Write"],
    disallowed_tools=["Bash"],
    system_prompt="You are a Python specialist. Follow PEP 8.",
    cwd="/path/to/project",
    output_format={
        "type": "json_schema",
        "schema": {"type": "object", "properties": {}}
    }
)
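To show what limits like `max_turns` and `max_budget_usd` mean in practice, here is a sketch of a guard around a loop. It illustrates the idea only: how the SDK enforces these options internally is not covered here, and the per-turn costs are made-up numbers.

```python
def run_with_limits(turn_costs: list[float], max_turns: int, max_budget_usd: float) -> dict:
    """Consume turns until the work is done or one of the limits is hit."""
    spent = 0.0
    for turn, cost in enumerate(turn_costs, start=1):
        if turn > max_turns:
            return {"stopped": "max_turns", "turns": turn - 1, "spent": round(spent, 2)}
        if spent + cost > max_budget_usd:
            return {"stopped": "max_budget_usd", "turns": turn - 1, "spent": round(spent, 2)}
        spent += cost
    return {"stopped": "done", "turns": len(turn_costs), "spent": round(spent, 2)}

print(run_with_limits([0.5] * 4, max_turns=10, max_budget_usd=5.0))   # finishes normally
print(run_with_limits([0.5] * 20, max_turns=10, max_budget_usd=5.0))  # stops on max_turns
print(run_with_limits([2.0] * 20, max_turns=10, max_budget_usd=5.0))  # stops on max_budget_usd
```

This guard refuses a turn that would push spending over the budget. A real harness may instead stop after the limit has been crossed, so treat the exact cutoff as a property of the sketch.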

The Framework Landscape

The Agent SDK isn't the only option. Here's how it compares:

Framework	Provider	Models	Philosophy
Claude Agent SDK	Anthropic	Claude only	Native, minimal abstraction
Pydantic AI	Third-party	Many LLMs	Type-safe, dependency injection
LangChain	Third-party	Many LLMs	Heavy abstraction, flexible

Claude Agent SDK

Use when: You're committed to Claude and want native access to its features.

Strengths:

First-party: features ship as Claude ships them
No abstraction overhead—direct access to everything
Built-in tools from Claude Code (file ops, bash, web search)
Streaming, sessions, checkpointing out of the box

Tradeoffs:

Claude-only. If you need GPT-4 or Gemini, you can't swap.
Smaller ecosystem than general-purpose frameworks.

Pydantic AI

Use when: You want type safety, dependency injection, and model flexibility.

Strengths:

Excellent software engineering practices
Swap models easily (Claude today, GPT-4 tomorrow)
Clean, Pythonic API with Pydantic validation
Good for teams with strong typing culture

Tradeoffs:

You implement tools from scratch (no built-in file/bash tools)
Third-party, so Claude-specific features may lag

Deep Dive: Claude Agent SDK vs Pydantic AI

Since these are the two I'd actually consider, let's compare them properly with code.

Philosophy Difference

Claude Agent SDK thinks in terms of capabilities: "Here are the tools Claude can use. Go."

Pydantic AI thinks in terms of contracts: "Here's the input type, output type, and dependencies. The framework enforces them."

Both are valid. The question is which mental model fits your team.

Side-by-Side: A Support Agent

Let's build the same thing in both: an agent that looks up customer data and responds to queries.

Claude Agent SDK approach:

from claude_agent_sdk import (
    tool, create_sdk_mcp_server, ClaudeAgentOptions,
    ClaudeSDKClient, AssistantMessage, TextBlock
)

# Tools return unstructured content blocks
@tool("get_customer", "Fetch customer details", {"customer_id": int})
async def get_customer(args):
    customer = await db.get_customer(args["customer_id"])
    return {
        "content": [{"type": "text", "text": f"Name: {customer.name}, Plan: {customer.plan}"}]
    }

@tool("get_balance", "Get customer account balance", {"customer_id": int})
async def get_balance(args):
    balance = await db.get_balance(args["customer_id"])
    return {
        "content": [{"type": "text", "text": f"Balance: ${balance:.2f}"}]
    }

server = create_sdk_mcp_server("support", tools=[get_customer, get_balance])

options = ClaudeAgentOptions(
    mcp_servers={"support": server},
    allowed_tools=["mcp__support__get_customer", "mcp__support__get_balance"],
    system_prompt="You are a bank support agent. Be helpful and concise."
)

async with ClaudeSDKClient(options=options) as client:
    await client.query("What's my balance? Customer ID: 123")
    async for message in client.receive_response():
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text)

Pydantic AI approach:

from dataclasses import dataclass
from pydantic import BaseModel
from pydantic_ai import Agent, RunContext

# Dependencies are injected, not global
@dataclass
class SupportDeps:
    customer_id: int
    db: DatabaseConnection  # Placeholder type: substitute your own connection class

# Output is a typed contract
class SupportResponse(BaseModel):
    message: str
    """Response to show the customer"""
    risk_level: int
    """1-10 risk assessment"""
    action_required: bool
    """Whether human review is needed"""

agent = Agent(
    'anthropic:claude-sonnet-4-5',  # Model is swappable
    deps_type=SupportDeps,
    output_type=SupportResponse,  # Enforced structure
    instructions="You are a bank support agent. Assess risk level of each query."
)

# Tools receive typed context
@agent.tool
async def get_balance(ctx: RunContext[SupportDeps]) -> str:
    """Get the customer's current balance."""
    balance = await ctx.deps.db.get_balance(ctx.deps.customer_id)
    return f"${balance:.2f}"

@agent.tool
async def get_transaction_history(ctx: RunContext[SupportDeps], days: int) -> str:
    """Get recent transactions."""
    txns = await ctx.deps.db.get_transactions(ctx.deps.customer_id, days)
    return "\n".join(f"{t.date}: {t.description} ({t.amount})" for t in txns)

# Run with explicit dependencies
async def handle_query(customer_id: int, query: str):
    deps = SupportDeps(customer_id=customer_id, db=get_db_connection())
    result = await agent.run(query, deps=deps)

    # result.output is guaranteed to be SupportResponse
    if result.output.action_required:
        await escalate_to_human(result.output)
    return result.output.message

What Stands Out

Aspect	Claude Agent SDK	Pydantic AI
Output	Free text by default (JSON schema via `output_format`)	Typed Pydantic model (validated)
Dependencies	Implicit (closures, globals)	Explicit injection (`deps_type`)
Tool context	`args` dict	Typed `RunContext[YourDeps]`
Model	Claude only	Any supported model
Built-in tools	File, Bash, Web, etc.	None (you build everything)
Validation	Manual or JSON schema	Automatic via Pydantic

When Pydantic AI Wins

You need model flexibility — Your contract says "GPT-4 for cost, Claude for quality." Pydantic AI makes this a config change.
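You have strict output requirements — The response must be a specific shape. Pydantic AI validates every response against your model by default; with the Claude SDK you get free text unless you opt in to the JSON-schema `output_format` shown earlier.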
Your team values dependency injection — If you already use DI patterns (FastAPI, pytest fixtures), Pydantic AI feels natural. Dependencies are explicit, testable, mockable.
You want output validation with retries — Pydantic AI has built-in @agent.output_validator that can trigger retries:

from pydantic_ai import ModelRetry

@agent.output_validator
async def validate_response(ctx: RunContext[SupportDeps], output: SupportResponse) -> SupportResponse:
    # Raising ModelRetry sends the message back to the model and asks for another attempt
    if output.risk_level > 8 and not output.action_required:
        raise ModelRetry("High risk queries must have action_required=True")
    return output
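The validate-then-retry pattern itself is small enough to sketch with the standard library. This is not how Pydantic AI implements it internally; `RetryRequest` and the scripted `fake_model` are hypothetical stand-ins used to show the control flow.

```python
class RetryRequest(Exception):
    """Raised by a validator to ask the model for another attempt."""

def validate(output: dict) -> dict:
    if output["risk_level"] > 8 and not output["action_required"]:
        raise RetryRequest("High risk queries must have action_required=True")
    return output

def fake_model(feedback: list[str]) -> dict:
    """Scripted stand-in: gets it wrong first, then corrects itself after feedback."""
    return {"risk_level": 9, "action_required": bool(feedback)}

def run_with_retries(max_retries: int = 2) -> tuple[dict, int]:
    feedback: list[str] = []
    for attempt in range(1, max_retries + 2):
        try:
            return validate(fake_model(feedback)), attempt
        except RetryRequest as err:
            feedback.append(str(err))   # The error message becomes input to the next attempt
    raise RuntimeError("validation still failing after retries")

output, attempts = run_with_retries()
print(output, attempts)  # {'risk_level': 9, 'action_required': True} 2
```

The useful part is the feedback channel: the validator's error text goes back to the model, so the second attempt has something to correct against and is more than a blind retry.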

When Claude Agent SDK Wins

You want batteries included — File operations, bash execution, web search, code editing—all built in. With Pydantic AI, you'd implement these yourself.
You're building on Claude Code patterns — Subagents, sessions, checkpointing, the Task tool. These are native to Claude's agentic model.
You want the latest Claude features immediately — Extended thinking, computer use, new tools. First-party SDK gets them first.
Simpler mental model for prototyping — Define tools, give Claude a prompt, let it figure out the rest. Less ceremony to get started.

The Hybrid Possibility

Nothing stops you from using Pydantic for structured outputs while using Claude's API directly:

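from pydantic import BaseModel
from anthropic import Anthropic

class AnalysisResult(BaseModel):
    summary: str
    risk_score: float
    recommendations: list[str]

client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,  # Required by the Messages API
    messages=[{"role": "user", "content": "Analyze this data..."}],
    # The Messages API has no response_format parameter. Instead, force a tool
    # call whose input schema is the Pydantic model's JSON schema.
    # ("record_analysis" is an illustrative tool name.)
    tools=[{
        "name": "record_analysis",
        "description": "Record the analysis result",
        "input_schema": AnalysisResult.model_json_schema(),
    }],
    tool_choice={"type": "tool", "name": "record_analysis"},
)

# Parse the tool input with Pydantic
tool_use = next(block for block in response.content if block.type == "tool_use")
result = AnalysisResult.model_validate(tool_use.input)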

You get Pydantic's validation without the full framework. Sometimes that's enough.

LangChain

Use when: You need maximum flexibility and don't mind abstraction.

Strengths:

Huge ecosystem, tons of integrations
Model-agnostic by design
Good for complex pipelines with many moving parts

Tradeoffs:

Heavy abstraction can obscure what's happening
Steeper learning curve
Some find it over-engineered for simple use cases

My Take

If you're already happy with Claude (you've read this far, so probably yes), the Agent SDK is the path of least resistance. It's the native way to go from "Claude in my terminal" to "Claude in my application."

If you need model flexibility or have strong opinions about software architecture, Pydantic AI is worth exploring—it emphasizes the patterns that make code maintainable.

I'd avoid LangChain unless you specifically need its ecosystem. The abstractions add complexity without proportional benefit for most use cases.

A Minimal Working Example

Let's build something real: a simple agent that answers questions about a codebase.

import asyncio
from pathlib import Path
from claude_agent_sdk import (
    ClaudeSDKClient, ClaudeAgentOptions,
    AssistantMessage, TextBlock
)

async def ask_about_code(question: str, project_path: str):
    """Ask Claude a question about a codebase."""

    options = ClaudeAgentOptions(
        # Give Claude read access to explore the code
        allowed_tools=["Read", "Glob", "Grep"],

        # Point to the project
        cwd=str(Path(project_path)),

        # Keep it focused
        system_prompt="You are a code exploration assistant. Answer questions concisely.",
        max_turns=5,
    )

    print(f"Question: {question}\n")

    async with ClaudeSDKClient(options=options) as client:
        await client.query(question)

        async for message in client.receive_response():
            if isinstance(message, AssistantMessage):
                for block in message.content:
                    if isinstance(block, TextBlock):
                        print(block.text)

# Run it
asyncio.run(ask_about_code(
    question="How does the authentication system work?",
    project_path="/path/to/your/project"
))

// TypeScript equivalent
import { query } from "@anthropic-ai/claude-agent-sdk";

async function askAboutCode(question: string, projectPath: string) {
  console.log(`Question: ${question}\n`);

  for await (const message of query({
    prompt: question,
    options: {
      allowedTools: ["Read", "Glob", "Grep"],
      cwd: projectPath,
      systemPrompt:
        "You are a code exploration assistant. Answer questions concisely.",
      maxTurns: 5,
    },
  })) {
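    if (message.type === "assistant") {
      // The SDK wraps the API message, so text lives in message.message.content blocks
      for (const block of message.message.content) {
        if (block.type === "text") console.log(block.text);
      }
    }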
  }
}

askAboutCode(
  "How does the authentication system work?",
  "/path/to/your/project",
);

That's it. Claude will glob for relevant files, read them, grep for patterns, and synthesize an answer—all autonomously.

Cost Considerations: Haiku, Sonnet, Opus

Anthropic offers three model tiers, and the Agent SDK lets you choose per-task:

Model	Speed	Cost	Use Case
Haiku	Fastest	Lowest	Simple tool calls, classification, quick lookups
Sonnet	Balanced	Medium	Most agentic work, coding, analysis
Opus	Slowest	Highest	Complex reasoning, research, nuanced decisions

A common pattern: use Sonnet for the main agent, Haiku for high-volume subagents:

// TypeScript
const options = {
  model: "claude-sonnet-4-5", // Main agent
  allowedTools: ["Read", "Task"],
  agents: {
    classifier: {
      description: "Quick classification tasks",
      prompt: "Classify inputs into categories.",
      model: "haiku", // Cheaper for simple work
    },
    researcher: {
      description: "Deep research requiring careful reasoning",
      prompt: "Conduct thorough analysis.",
      model: "opus", // Worth it for complex tasks
    },
  },
};

# Python
options = ClaudeAgentOptions(
    model="claude-sonnet-4-5",  # Main agent
    allowed_tools=["Read", "Task"],
    agents={
        "classifier": AgentDefinition(
            description="Quick classification tasks",
            prompt="Classify inputs into categories.",
            model="haiku"  # Cheaper for simple work
        ),
        "researcher": AgentDefinition(
            description="Deep research requiring careful reasoning",
            prompt="Conduct thorough analysis.",
            model="opus"  # Worth it for complex tasks
        )
    }
)

What I'd Build First

If I were starting today, here's what I'd try:

Code review bot — Triggered on PR open, reviews changes, posts comments. Subagents for security, performance, style.
Documentation generator — Point it at a codebase, generate or update docs. Run weekly via cron.
Support ticket triage — Webhook from support system, agent categorizes and drafts responses, queues for human review.
Data pipeline monitor — Check logs, identify anomalies, create tickets with diagnosis. Runs on schedule.

All of these share a pattern: triggered by events, not humans typing. That's the sweet spot for the Agent SDK.

Key Takeaways

Claude Code CLI vs Agent SDK is about orchestration. Human driving vs code driving. Both use Claude's capabilities.
Build your own agent when the trigger isn't a human, the user isn't you, the workflow is repeatable, or you need custom tools.
The Agent SDK gives you Claude Code's agentic loop as a library. Define tools, configure options, let Claude handle multi-step execution.
Framework choice depends on your constraints. Claude SDK for native access, Pydantic AI for type safety and model flexibility, LangChain for maximum ecosystem.
Start small. Take one repeatable workflow and automate it. The minimal example above is genuinely all you need to start.

The gap between "AI that answers questions" and "AI that does things autonomously" is smaller than it seems—but crossing it requires understanding when interactive isn't enough.

Resources

Claude Agent SDK:

Claude Agent SDK Documentation — Official docs with full API reference
Agent SDK Python Package — GitHub repo with examples
Agent SDK TypeScript Package — TypeScript version

Pydantic AI:

Pydantic AI Documentation — Official docs with guides and examples
Pydantic AI GitHub — Source code and issues
Pydantic AI Examples — Real-world use cases

General:

Model Context Protocol (MCP) — The protocol behind tool integration
Anthropic Cookbook — Patterns and recipes for Claude