Agent Harness Explained: The Runtime That Makes AI Agents Actually Work

Agent Harness - the runtime that makes agents work, with LLM, Tools, Skills, MCP, Memory, and Hooks orbiting a central hexagon

Every agent architecture article talks about LLMs, tools, and skills. Nobody talks about the thing that actually makes them work: the harness. It’s the runtime that takes your prompt, decides what tools to call, manages context, handles errors, and returns a result. Without it, you have components. With it, you have an agent.

This is the third layer in the agent stack. We covered MCP (the protocol layer) and Agent Skills (the capability layer). Now we’re covering the runtime that ties them together — and explaining why every production agent needs one.

Published: June 30, 2026
Tech Stack:
Claude Agent SDK TypeScript Python MCP Protocol

What You’ll Learn

What an Agent Harness Is

The runtime that turns components into an agent.
The Agent Loop

Receive → Context → LLM → Tools → Return.
Harness Capabilities

Context, permissions, sessions, hooks, subagents.
Build Your First Agent

Working code with Claude Agent SDK.

TL;DR

An Agent Harness is the runtime that orchestrates an AI agent: it receives prompts, loads context (system prompts, Skills, conversation history), calls the LLM, executes tools (MCP servers, scripts, file operations), handles errors, manages permissions, and returns results. It’s the operating system for AI agents.

Claude has two main harness options: the Claude Agent SDK (a library that runs the agent loop in your process) and Managed Agents (a hosted REST API that runs the agent in Anthropic’s infrastructure). The Claude Code CLI uses the same harness internally for interactive development.

Key capabilities include: agent loop management, context loading, tool execution, permission system, session management, hooks for custom code, and subagents for delegation. The harness sits on top of MCP (the protocol) and Skills (the capabilities) to produce a working agent.

If MCP is the transport and Skills are the expertise, the Harness is where it all runs. Most production agents need a harness by month three — building one from scratch is possible, but using a battle-tested one is almost always the right choice.

The Problem: Components Don’t Make an Agent

Here’s where most teams get stuck. They’ve got:

An LLM API (Claude, GPT, Gemini)
A list of MCP servers that expose tools
A set of Agent Skills for domain expertise

They wire them together with a simple loop: “call the LLM, see if it wants to use a tool, call the tool, call the LLM again.” It works for a demo. It breaks in production.

Why? Because the simple loop doesn’t handle:

Context overflow — Long conversations blow past the context window
Error recovery — A failed tool call kills the whole agent
Permission control — The agent can do anything the tools allow
Session persistence — Every conversation starts from zero
Observability — No way to see what the agent actually did
Cost control — No way to cap spending or interrupt runaway loops

An Agent Harness is the solution. It handles all of this so you can focus on capabilities, not plumbing.

Think of the harness as the operating system for AI agents. The LLM is the CPU, MCP servers are the peripherals, and Skills are the applications. The harness is what makes them all work together reliably.

The Agent Stack: Three Layers

Modern AI agent architecture is three distinct layers. Each one solves a different problem.

The agent stack showing three layers: Protocol (MCP), Capability (Skills), and Runtime (Harness)

Layer 1: Protocol — MCP

MCP (Model Context Protocol) is the transport layer. It standardises how agents connect to tools, data, and external systems. Without MCP, every agent-tool connection is custom code. With MCP, you build a server once and any agent can use it.

Layer 2: Capability — Agent Skills

Agent Skills are the expertise layer. They package domain knowledge (instructions, scripts, reference docs) into folders the agent can discover and load. Without Skills, you have a smart agent that doesn’t know your work. With Skills, you have an expert.

Layer 3: Runtime — Agent Harness

The harness is where it all runs. It receives prompts, loads context, calls the LLM, executes tools, handles errors, manages permissions, maintains sessions, and returns results. Without a harness, you have components. With one, you have an agent.

The architecture: MCP = transport, Skills = expertise, Harness = runtime. They compose. You need all three for production agents.

The Agent Loop

At the core of every harness is the agent loop: a repeating cycle that turns a prompt into a result. The Claude Agent SDK runs this loop automatically. Understanding it helps you reason about agent behaviour and debug issues.

The loop runs in five steps. Each iteration, the harness:

Receives the prompt — Captures the user’s request and creates a new session
Loads context — System prompt, Skill metadata, conversation history, tool definitions
Calls the LLM — Sends the assembled context to Claude and waits for a response
Executes tools — If the LLM requested tool calls, runs them with permission checks and hooks
Returns the result — Sends tool results back to the LLM, which either continues the loop or produces a final response

The Loop Is Where Quality Happens

The loop seems simple, but every production agent has a sophisticated harness around it. Error recovery, retries, permission checks, hooks, context management, session persistence — these are the features that turn a demo into a production system.

What a Harness Does

A production Agent Harness handles eight core responsibilities. Each one is a feature you’d otherwise have to build yourself.

Eight harness capabilities as a numbered grid: Agent Loop, Context Mgmt, Tool Execution, Permission, Sessions, Hooks, Subagents, Error Recovery

Here’s what each one does in production:

1. Agent Loop Management

The basic receive → context → LLM → tools → return cycle. The harness runs this loop until the LLM produces a final response, hits a max-iteration limit, or errors out.

2. Context Management

Loading the right context at the right time: Skill metadata at startup, Skill body when triggered, referenced files on demand, tool results from previous turns. The harness handles progressive disclosure automatically.

3. Tool Execution

Running MCP tools, scripts, bash commands, file operations, and any other capability the agent needs. The harness handles async coordination, result capture, and error propagation.

4. Permission System

Pre-approving safe operations (Read, Glob, Grep), blocking dangerous ones (rm -rf), and requiring user approval for sensitive actions. Without this, your agent can do anything the tools allow — including things you didn’t intend.

5. Session Management

Maintaining context across multiple exchanges, capturing session IDs, and supporting resume or fork operations. Without this, every conversation starts from zero.

6. Hooks

Running custom code at key points in the agent lifecycle: PreToolUse (before a tool runs), PostToolUse (after a tool runs), Stop (when the agent finishes), SessionStart, SessionEnd, UserPromptSubmit.

7. Subagents

Spawning specialised agents for focused subtasks. Your main agent delegates work to subagents (like “code-reviewer” or “test-runner”) and gets results back. Subagents can have different tools, prompts, and permissions.

8. Error Recovery

Retrying transient failures (network timeouts, rate limits), surfacing permanent failures with context (so the LLM can adapt), and preventing runaway loops (max iterations, cost caps).

When to Use a Harness

Not every AI use case needs a harness. Here’s how we decide.

Use a Harness When

Multi-step tasks — The agent decides tool order dynamically based on intermediate results
Error-prone workflows — You need retries, fallbacks, and audit logs to trust the agent
Tool-heavy integrations — You’re combining 5+ tools, MCP servers, and file operations
Team collaboration — Multiple agents need to share context or work together
Production deployment — You need sessions, permissions, monitoring, and reliability
Time to market — You want to focus on capabilities, not plumbing

Roll Your Own When

Simple chat — You just need an LLM with no tools, no decisions, no orchestration
Fixed workflows — The same steps every time, no dynamic decision-making
Single tool — One API call wrapped in a chat interface
Learning — You’re trying to understand the fundamentals
Latency-critical — Every millisecond matters and you can’t afford harness overhead
Cost-sensitive — Bare API calls are cheapest; harnesses add token overhead

Start Simple, Scale Up

Most production agents need a harness by month three. You can start with a simple loop and add harness features as you hit the walls: “we need retries”, “we need permissions”, “we need sessions”, “we need observability”. The Claude Agent SDK gives you all of these from day one, which is why we recommend it for most teams.

Build Your First Agent with Claude Agent SDK

Let’s build a real agent. The Claude Agent SDK is available in Python and TypeScript. Here’s a working example that reads files, searches code, and runs commands.

Python

import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    async for message in query(
        prompt="Find and fix the bug in auth.py",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Edit", "Bash", "Grep"],
            permission_mode="acceptEdits"
        )
    ):
        if hasattr(message, 'result'):
            print(message.result)

asyncio.run(main())

TypeScript

import { query } from '@anthropic-ai/claude-agent-sdk';

for await (const message of query({
  prompt: 'Find and fix the bug in auth.ts',
  options: {
    allowedTools: ['Read', 'Edit', 'Bash', 'Grep'],
    permissionMode: 'acceptEdits'
  }
})) {
  if ('result' in message) console.log(message.result);
}

What the SDK Gives You

That 15-line script gives you a production-grade agent with:

Agent loop — Automatic receive → context → LLM → tools → return
Tool execution — Read, Edit, Bash, Grep all work out of the box
Permissions — acceptEdits mode pre-approves file modifications
Session management — Capture the session ID to resume later
Error recovery — Transient failures retry automatically
Observability — Every action is logged with full context

Compare that to writing a loop yourself. Even the simple example above (loop until stop_reason != tool_use) is 20-30 lines. The error handling, permissions, and logging would push it to 200+.

Harness Options: SDK vs Managed vs CLI

Claude has three main harness options. Each one fits a different use case.

Claude Agent SDK

A library that runs the agent loop in your process. You provide the infrastructure, the SDK provides the harness.

Best for: Local prototyping, agents on your infrastructure, custom integrations, CI/CD pipelines
Pros: Full control over the agent loop, works on your filesystem, custom tools in Python/TS, free to use (API costs only)
Cons: You manage the infrastructure, no managed sandbox, you handle scaling

Managed Agents

A hosted REST API. Anthropic runs the agent and the sandbox; your application sends events and streams back results.

Best for: Production agents without ops overhead, long-running sessions, async workflows
Pros: Anthropic runs the sandbox, production-ready out of the box, async sessions, managed scaling
Cons: Less control over environment, vendor lock-in, higher per-session cost

Claude Code CLI

The interactive development tool. Uses the same harness as the SDK, but with a terminal interface instead of a programmatic API.

Best for: Daily development, one-off tasks, learning agent patterns
Pros: Same capabilities as SDK, great for local dev, file-based config, Skills support
Cons: Not for production automation, no programmatic API, manual session management

Our Recommendation

Start with the Claude Agent SDK for prototyping and learning. Move to Managed Agents when you need production reliability without ops overhead. Use the CLI for daily development. Most teams end up using all three.

Advanced Harness Patterns

Once you’ve got a basic agent working, the harness enables patterns that would be impossible to build from scratch.

Subagents for Specialisation

Spawn focused agents for subtasks. Your main agent delegates work to specialists:

agents={
  "code-reviewer": AgentDefinition(
    description="Expert code reviewer for quality and security reviews.",
    prompt="Analyze code quality and suggest improvements.",
    tools=["Read", "Glob", "Grep"]
  ),
  "test-runner": AgentDefinition(
    description="Runs tests and reports results.",
    prompt="Execute tests, parse output, report failures.",
    tools=["Bash", "Read"]
  )
}

Hooks for Observability

Run custom code at every key point in the agent lifecycle. Use hooks for logging, validation, or side effects:

async def log_file_change(input_data, tool_use_id, context):
    file_path = input_data.get("tool_input", {}).get("file_path", "unknown")
    with open("./audit.log", "a") as f:
        f.write(f"{datetime.now()}: modified {file_path}\n")
    return {}

hooks={
  "PostToolUse": [
    HookMatcher(matcher="Edit|Write", hooks=[log_file_change])
  ]
}

Sessions for Continuity

Capture the session ID from the first query, then resume with full context:

session_id = None

async for message in query(
    prompt="Read the authentication module",
    options=ClaudeAgentOptions(allowed_tools=["Read", "Glob"])
):
    if isinstance(message, SystemMessage) and message.subtype == "init":
        session_id = message.data["session_id"]

# Resume with full context
async for message in query(
    prompt="Now find all places that call it",
    options=ClaudeAgentOptions(resume=session_id)
):
    pass

The Bigger Picture: Why This Matters

Agent Harnesses are the missing layer in most AI architectures. Teams build an LLM wrapper, wire up a few tools, and call it an “agent.” Then they hit the walls: errors, permissions, sessions, observability. They patch each one. Six months later, they’ve built a worse version of the Claude Agent SDK.

The alternative is to start with a battle-tested harness. Focus on the capabilities that make your agent useful — the Skills, the MCP servers, the domain logic. Let the harness handle the plumbing.

The pattern: Don’t build infrastructure you can adopt. Use the Claude Agent SDK, build Skills on top, deploy MCP servers for tools. Focus on what makes your agent your agent — the expertise, the integrations, the workflows. Leave the rest to the harness.

For South African teams, this matters even more. Building agent infrastructure from scratch is expensive. Building agent capabilities on top of a proven harness is fast. The teams that adopt first will have a structural advantage — they ship features instead of fixing plumbing.

Final Thoughts

Most agent architecture articles focus on the LLM, the tools, and the Skills. Those are the visible parts — the parts that look impressive in a demo. The harness is the invisible part. The plumbing. The thing nobody talks about until it breaks.

But the harness is what makes production agents possible. Error recovery. Permission control. Session persistence. Observability. Hooks for custom code. Subagents for delegation. These aren’t features you build once and forget. They’re features you need from day one and refine forever.

The Claude Agent SDK gives you all of this. It’s the runtime we use for our own production agents. It’s what we’d build if we were starting from scratch. Use it, and focus on the parts that actually differentiate your agent — the Skills, the integrations, the domain expertise.

The best agent infrastructure is the kind you don’t have to build — so you can focus on the agent.

What’s Next

The Agent Harness ecosystem is evolving. We’re tracking:

Managed Agents GA — Moving from preview to generally available
More harness features — Better error recovery, cost controls, audit logging
Skill-harness integration — Deeper patterns for Skills in production agents
Multi-agent orchestration — Coordinating multiple harnesses for complex workflows

For South African teams specifically, we see two big opportunities:

Local-first production agents — Run the SDK on your own infrastructure, keep data in-country
Managed scale-out — Prototype locally, deploy to Managed Agents for production

Both patterns are early. The teams that start now will own the deployment patterns the rest of the market follows.

Ready to Build Production Agents?

We help South African teams design, build, and deploy production AI agents using the Claude Agent SDK. From architecture design to Skills development to MCP server integration, we handle the agent layer so you can focus on the capabilities.

AI Architecture Consulting
AI Development Services
Contact Us

References

Claude Agent SDK Overview — Official documentation
Agent SDK Quickstart — Build your first agent in minutes
Managed Agents Overview — Hosted agent runtime
Building Effective Agents — Anthropic’s patterns for production agents
Agent SDK Demos — Example agents built with the SDK

What You’ll Learn

What an Agent Harness Is

The Agent Loop

Harness Capabilities

Build Your First Agent

TL;DR

The Problem: Components Don’t Make an Agent

The Agent Stack: Three Layers

Layer 1: Protocol — MCP

Layer 2: Capability — Agent Skills

Layer 3: Runtime — Agent Harness

The Agent Loop

The Loop Is Where Quality Happens

What a Harness Does

1. Agent Loop Management

2. Context Management

3. Tool Execution

4. Permission System

5. Session Management

6. Hooks

7. Subagents

8. Error Recovery

When to Use a Harness

Use a Harness When

Roll Your Own When

Start Simple, Scale Up

Build Your First Agent with Claude Agent SDK

Python

TypeScript

What the SDK Gives You

Harness Options: SDK vs Managed vs CLI

Claude Agent SDK

Managed Agents

Claude Code CLI

Our Recommendation

Advanced Harness Patterns

Subagents for Specialisation

Hooks for Observability

Sessions for Continuity

The Bigger Picture: Why This Matters

Final Thoughts

What’s Next

Ready to Build Production Agents?

Related Reading

References

About Nemesis

Leave a Reply Cancel reply