The Python SDK — Building AI-Powered Tools
Go beyond the CLI. Build custom AI-powered Python tools with full async streaming in under 50 lines of code.
Why Use the SDK?
The CLI is great for interactive use, but real-world applications need programmatic access. The Harness SDK gives you a fully async streaming Python interface — so you can integrate AI agent capabilities directly into scripts, pipelines, and services.
Claude Code and Cursor are CLI/IDE only. Aider has no streaming SDK. Harness gives you full async streaming — build custom AI tools in Python, integrate with your pipelines, and react to every event in real time.
With the SDK you can:
- React to individual text chunks as they stream
- Intercept and log every tool call the agent makes
- Capture precise cost and token data per session
- Switch providers and models programmatically
- Wire in custom approval logic before any tool executes
Your First SDK Script
Create this file and run it — you'll see the agent's response stream token by token, then a cost summary at the end.
```python
# hello_harness.py
import asyncio

import harness

async def main():
    async for message in harness.run("What is 2 + 2?"):
        if isinstance(message, harness.TextMessage) and not message.is_partial:
            print(message.text)
        elif isinstance(message, harness.Result):
            print(f"\nDone! Tokens: {message.total_tokens}, Cost: ${message.total_cost:.4f}")

asyncio.run(main())
```
```bash
uv run hello_harness.py
```

```
2 + 2 = 4
Done! Tokens: 142, Cost: $0.0002
```
harness.run() yields every streaming chunk as a TextMessage
with is_partial=True. The final, complete message has is_partial=False.
Checking not message.is_partial lets you skip partial chunks and only
process complete assistant turns. For a live typing effect, iterate partial chunks instead.
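The typing effect follows directly from the streaming contract above: print each partial chunk without a newline as it arrives. A minimal self-contained sketch, using a stand-in async generator in place of `harness.run()` (the `TextMessage` fields match the ones described here; the chunk contents are invented for illustration):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class TextMessage:
    text: str
    is_partial: bool

async def fake_run(prompt):
    # Stand-in for harness.run(): yields partial chunks, then the complete turn.
    chunks = ["2 + 2 ", "= 4"]
    for c in chunks:
        yield TextMessage(text=c, is_partial=True)
    yield TextMessage(text="".join(chunks), is_partial=False)

async def main():
    async for msg in fake_run("What is 2 + 2?"):
        if msg.is_partial:
            # Print each chunk as it streams: no newline, flush immediately.
            print(msg.text, end="", flush=True)
    print()  # newline after the stream completes

asyncio.run(main())
```

Swap `fake_run` for `harness.run` and the same loop renders real agent output token by token.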
Message Types
harness.run() is an async generator that yields a union of message types.
Each type represents a different event in the agent's execution lifecycle.
Message = TextMessage | ToolUse | ToolResult | Result | CompactionEvent | SystemEvent
| Message type | Key fields |
|---|---|
| `TextMessage` | `text`; `is_partial` is `True` for in-flight chunks, `False` for the complete turn |
| `ToolUse` | `name` (tool name) and `args` (dict of arguments) |
| `ToolResult` | `content`, plus `is_error` to distinguish failures |
| `Result` | `text`, `turns`, `tool_calls`, `total_tokens`, `total_cost`, `session_id` |

Every message type is a Python dataclass. Use `isinstance()` checks or Python 3.10+ `match` statements for exhaustive, type-safe handling.
Pattern Matching
Python 3.10+ structural pattern matching is the cleanest way to handle the message union.
Each case extracts fields directly from the matched type.
```python
async for msg in harness.run("Refactor utils.py"):
    match msg:
        case harness.TextMessage(text=t, is_partial=False):
            print(f"Agent: {t}")
        case harness.ToolUse(name=name, args=args):
            print(f"Tool call: {name}({args})")
        case harness.ToolResult(content=c, is_error=True):
            print(f"Error: {c}")
        case harness.Result() as r:
            print(f"Done in {r.turns} turns, {r.tool_calls} tool calls")
            print(f"Cost: ${r.total_cost:.4f}")
```
Replace "Refactor utils.py" with any task and point it at a real file in your project. Watch the ToolUse events fire as the agent reads and edits files.
Build an AI Security Reviewer
Let's build something real: a security code reviewer that scans a file for vulnerabilities and outputs a structured report. First, create the file to review:
```python
# review_target.py — can you spot the security issues?
import pickle
import subprocess

SECRET_KEY = "hardcoded_password_123"  # Security issue: hardcoded secret

def load_data(filepath):
    with open(filepath, "rb") as f:
        return pickle.load(f)  # Security issue: arbitrary code execution

def process_input(user_input):
    result = eval(user_input)  # Security issue: code injection
    return result

def run_command(cmd):
    return subprocess.run(cmd, shell=True, capture_output=True)  # Security issue: command injection

def get_user(user_id):
    query = f"SELECT * FROM users WHERE id = {user_id}"  # Security issue: SQL injection
    return query
```
Hardcoded secret, pickle deserialization (arbitrary code execution), eval injection, command injection via shell=True, and SQL injection. The agent will find all five.
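For reference, the "recommended fix" column the agent produces should roughly match these remediations: read secrets from the environment, use `json` instead of `pickle`, `ast.literal_eval` instead of `eval`, an argument list with `shell=False`, and parameterized queries. A sketch of one possible fixed version (illustrative, not the agent's output):

```python
# review_target_fixed.py — one way to address the five findings
import ast
import json
import os
import shlex
import subprocess

SECRET_KEY = os.environ.get("SECRET_KEY", "")  # Fix: read the secret from the environment

def load_data(filepath):
    with open(filepath, "r") as f:
        return json.load(f)  # Fix: json cannot execute code during deserialization

def process_input(user_input):
    return ast.literal_eval(user_input)  # Fix: parses Python literals only, never runs code

def run_command(cmd):
    # Fix: shell=False with an argument list prevents shell injection
    return subprocess.run(shlex.split(cmd), shell=False, capture_output=True)

def get_user(user_id):
    # Fix: parameterized query, the DB driver escapes user_id (sqlite3-style placeholder)
    return ("SELECT * FROM users WHERE id = ?", (user_id,))
```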
Now create the reviewer script. Note the use of permission_mode="plan" —
this makes the agent read-only. It can analyse files but cannot modify them.
```python
# security_reviewer.py
import asyncio

import harness

REVIEW_PROMPT = """Review the file review_target.py for security vulnerabilities.
For each issue found, report:
1. Line number
2. Severity (Critical/High/Medium/Low)
3. Issue description
4. Recommended fix
Format as a markdown table."""

async def main():
    findings = []
    async for msg in harness.run(
        REVIEW_PROMPT,
        provider="anthropic",
        model="claude-sonnet-4-20250514",
        permission_mode="plan",  # Read-only — don't modify files
    ):
        if isinstance(msg, harness.TextMessage) and not msg.is_partial:
            findings.append(msg.text)
        elif isinstance(msg, harness.Result):
            print(f"Review complete. Cost: ${msg.total_cost:.4f}")
            print("\n".join(findings))

asyncio.run(main())
```
```bash
uv run security_reviewer.py
```
permission_mode="plan" blocks all write and bash tool calls at the
engine level — the agent physically cannot modify files, run commands, or exfiltrate
data, regardless of what the prompt says.
This makes it safe to run the reviewer on production code. Even a prompt-injected payload hidden in a source file can't cause the agent to execute anything harmful.
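Plan mode pairs well with an audit trail: even though blocked tools never execute, the agent's `ToolUse` events still stream, so you can log exactly what it attempted. A minimal sketch with a stand-in event stream in place of `harness.run()` (the tool names and arguments here are invented for illustration):

```python
import asyncio
import json
from dataclasses import dataclass

@dataclass
class ToolUse:
    name: str
    args: dict

async def fake_run(prompt):
    # Stand-in for harness.run(): yields tool-call events as the agent works.
    yield ToolUse(name="read_file", args={"path": "review_target.py"})
    yield ToolUse(name="grep", args={"pattern": "eval("})

async def audit(prompt):
    log = []
    async for msg in fake_run(prompt):
        if isinstance(msg, ToolUse):
            # One JSON line per tool call: easy to grep or ship to a log store.
            log.append(json.dumps({"tool": msg.name, "args": msg.args}))
    return log

log = asyncio.run(audit("Review review_target.py"))
print("\n".join(log))
```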
Tutorial 3 covers all four permission modes in depth, including custom allow/deny rules and the 4-tier precedence engine.
Provider & Model Selection
Pass provider and model kwargs to harness.run()
to target any supported provider. The same SDK call works across all providers.
```python
# Use different providers: the same call works everywhere
async for msg in harness.run(
    "Analyze this code for bugs",
    provider="anthropic",
    model="claude-sonnet-4-20250514",
):
    ...

async for msg in harness.run(
    "Analyze this code for bugs",
    provider="openai",
    model="gpt-4o",
):
    ...

async for msg in harness.run(
    "Analyze this code for bugs",
    provider="google",
    model="gemini-2.5-pro",
):
    ...

# Runs fully local — no API key, no cost
async for msg in harness.run(
    "Analyze this code for bugs",
    provider="ollama",
    model="llama3.1",
):
    ...
```
Capturing Cost Data
Every session ends with a Result message containing complete cost and
usage statistics. Use this to track spend, audit sessions, or enforce budgets.
```python
async for msg in harness.run("Analyze this codebase"):
    if isinstance(msg, harness.Result):
        print(f"Session: {msg.session_id}")
        print(f"Turns: {msg.turns}")
        print(f"Tool calls: {msg.tool_calls}")
        print(f"Tokens: {msg.total_tokens:,}")
        print(f"Cost: ${msg.total_cost:.4f}")
```
| Result field | Type | Description |
|---|---|---|
| `session_id` | `str` | Unique identifier for this agent run |
| `text` | `str` | Final agent response text |
| `turns` | `int` | Number of agent conversation turns |
| `tool_calls` | `int` | Total number of tool calls made |
| `total_tokens` | `int` | Combined input + output token count |
| `total_cost` | `float` | Estimated cost in USD |
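Until you adopt the built-in controls, a simple budget guard can be layered on top of the `Result` stream: accumulate `total_cost` across sessions and stop before the next run once a cap is hit. A sketch with a stand-in `Result` and generator (the `BUDGET_USD` cap and per-session costs are invented for illustration):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Result:
    session_id: str
    total_cost: float

BUDGET_USD = 0.05  # assumed spending cap for this example

async def fake_run(task):
    # Stand-in for harness.run(): each session ends with a Result.
    yield Result(session_id=task, total_cost=0.02)

async def run_within_budget(tasks):
    spent = 0.0
    completed = []
    for task in tasks:
        if spent >= BUDGET_USD:
            break  # hard stop: don't start a session once the cap is reached
        async for msg in fake_run(task):
            if isinstance(msg, Result):
                spent += msg.total_cost
                completed.append(msg.session_id)
    return completed, spent

done, spent = asyncio.run(run_within_budget(["a", "b", "c", "d"]))
print(f"Completed {len(done)} sessions for ${spent:.4f}")
```

Note this checks the budget between sessions, so a single run can still overshoot the cap; the built-in controls covered in Tutorial 4 enforce limits mid-session.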
Tutorial 4 (Budget Controls) shows how to set hard spending limits, receive mid-session warnings, and abort runs that exceed your budget threshold.
Next Steps
You can now stream agent events, handle every message type, and build custom Python tools on top of Harness. The next tutorial covers one of the most important production concerns: what the agent is allowed to do.
Learn the 4-mode permission system, custom allow/deny rules, and the 4-tier precedence engine — the most sophisticated permission system available in any AI coding agent.