The Python SDK — Building AI-Powered Tools

Go beyond the CLI. Build custom AI-powered Python tools with full async streaming in under 50 lines of code.

Beginner 15 min read
1

Why Use the SDK?

The CLI is great for interactive use, but real-world applications need programmatic access. The Harness SDK gives you a fully async streaming Python interface — so you can integrate AI agent capabilities directly into scripts, pipelines, and services.

What the SDK adds

Many AI coding tools are designed mainly for interactive CLI or IDE use. Harness gives you full async streaming: build custom AI tools in Python, integrate with your pipelines, and react to every event in real time.

With the SDK you can:

  • React to individual text chunks as they stream
  • Intercept and log every tool call the agent makes
  • Capture precise cost and token data per session
  • Switch providers and models programmatically
  • Wire in custom approval logic before any tool executes
2

Your First SDK Script

Create this file and run it. Because the script skips partial chunks, it prints the agent's complete response once the turn finishes, then a cost summary at the end.

python
# hello_harness.py
import asyncio
import harness

async def main():
    async for message in harness.run("What is 2 + 2?"):
        if isinstance(message, harness.TextMessage) and not message.is_partial:
            print(message.text)
        elif isinstance(message, harness.Result):
            print(f"\nDone! Tokens: {message.total_tokens}, Cost: ${message.total_cost:.4f}")

asyncio.run(main())
bash
uv run hello_harness.py
Expected output

2 + 2 = 4

Done! Tokens: 142, Cost: $0.0002

harness.run() yields every streaming chunk as a TextMessage with is_partial=True. The final, complete message has is_partial=False.

Checking not message.is_partial lets you skip partial chunks and only process complete assistant turns. For a live typing effect, iterate partial chunks instead.
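The live typing effect mentioned above can be sketched as follows. This is a minimal illustration, not the SDK itself: TextMessage and fake_run are stand-ins defined here so the example runs without Harness installed, and it assumes each partial chunk carries only the new text (a delta) rather than the accumulated text so far.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class TextMessage:
    """Stand-in for harness.TextMessage (assumed shape: text + is_partial)."""
    text: str
    is_partial: bool


async def fake_run(prompt: str) -> AsyncIterator[TextMessage]:
    """Hypothetical stand-in for harness.run(): partial chunks, then the full turn."""
    chunks = ["2 + 2", " = ", "4"]
    for chunk in chunks:
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield TextMessage(text=chunk, is_partial=True)
    yield TextMessage(text="".join(chunks), is_partial=False)


async def type_out(stream: AsyncIterator[TextMessage]) -> str:
    """Print partial chunks as they arrive and return the text that was shown."""
    shown = []
    async for message in stream:
        if message.is_partial:
            print(message.text, end="", flush=True)
            shown.append(message.text)
        else:
            print()  # the complete turn was already typed out, so just end the line
    return "".join(shown)


typed = asyncio.run(type_out(fake_run("What is 2 + 2?")))
```

With the real SDK, the same loop would iterate over harness.run(...) and check for harness.TextMessage, provided partial chunks behave as assumed here.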

3

Message Types

harness.run() is an async generator that yields a union of message types. Each type represents a different event in the agent's execution lifecycle.

python
Message = TextMessage | ToolUse | ToolResult | Result | CompactionEvent | SystemEvent
  • TextMessage: Streaming text from the agent. is_partial=True for in-flight chunks, False for the complete turn.
  • ToolUse: The agent wants to call a tool. Contains name (tool name) and args (dict of arguments).
  • ToolResult: The result from a tool execution. Contains content and is_error to distinguish failures.
  • Result: Final message when the agent is done. Contains text, turns, tool_calls, total_tokens, total_cost, session_id.
  • CompactionEvent: Emitted when the agent compresses its context window to free up token space for long-running sessions.
  • SystemEvent: Lifecycle events such as session start, model switch, provider change, and other internal state transitions.
Type-safe event handling

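Every message type is a Python dataclass. Use isinstance() checks or Python 3.10+ match statements for type-safe handling. Python does not check a match statement for exhaustiveness at runtime, so add a catch-all case _ branch if unhandled message types should not pass silently.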
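As a rough sketch of isinstance()-based dispatch, the example below defines three stand-in dataclasses whose fields follow the descriptions above. The class definitions and the describe helper are illustrative assumptions; the real harness types may carry additional fields.

```python
from dataclasses import dataclass, field
from typing import Union


# Stand-ins modelled on the field descriptions above, not the real SDK classes.
@dataclass
class TextMessage:
    text: str
    is_partial: bool = False


@dataclass
class ToolUse:
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class ToolResult:
    content: str
    is_error: bool = False


Message = Union[TextMessage, ToolUse, ToolResult]


def describe(msg: Message) -> str:
    """Return a one-line summary for each event type."""
    if isinstance(msg, TextMessage):
        kind = "chunk" if msg.is_partial else "turn"
        return f"text {kind}: {msg.text}"
    if isinstance(msg, ToolUse):
        return f"tool call: {msg.name}({msg.args})"
    if isinstance(msg, ToolResult):
        status = "error" if msg.is_error else "ok"
        return f"tool result [{status}]: {msg.content}"
    # Fail loudly on anything unexpected instead of dropping it silently.
    raise TypeError(f"unhandled message type: {type(msg).__name__}")
```

Raising on unknown types is one design choice; a long-running service might prefer to log and continue so that new event types do not interrupt a session.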

4

Pattern Matching

Python 3.10+ structural pattern matching is the cleanest way to handle the message union. Each case extracts fields directly from the matched type.

python
import asyncio
import harness

async def main():
    async for msg in harness.run("Refactor utils.py"):
        match msg:
            # Complete assistant turns only; partial chunks match no case and are skipped
            case harness.TextMessage(text=t, is_partial=False):
                print(f"Agent: {t}")
            case harness.ToolUse(name=name, args=args):
                print(f"Tool call: {name}({args})")
            # Only failed tool results are reported
            case harness.ToolResult(content=c, is_error=True):
                print(f"Error: {c}")
            case harness.Result() as r:
                print(f"Done in {r.turns} turns, {r.tool_calls} tool calls")
                print(f"Cost: ${r.total_cost:.4f}")

asyncio.run(main())
Try it

Replace "Refactor utils.py" with any task and point it at a real file in your project. Watch the ToolUse events fire as the agent reads and edits files.

5

Build an AI Security Reviewer

Let's build something real: a security code reviewer that scans a file for vulnerabilities and outputs a structured report. First, create the file to review:

python
# review_target.py — can you spot the security issues?

import pickle
import subprocess

SECRET_KEY = "hardcoded_password_123"  # Security issue: hardcoded secret

def load_data(filepath):
    with open(filepath, "rb") as f:
        return pickle.load(f)  # Security issue: arbitrary code execution

def process_input(user_input):
    result = eval(user_input)  # Security issue: code injection
    return result

def run_command(cmd):
    return subprocess.run(cmd, shell=True, capture_output=True)  # Security issue: command injection

def get_user(user_id):
    query = f"SELECT * FROM users WHERE id = {user_id}"  # Security issue: SQL injection
    return query
5 security issues in 20 lines

Hardcoded secret, pickle deserialization (arbitrary code execution), eval injection, command injection via shell=True, and SQL injection. A capable model should flag all five.

Now create the reviewer script. Note the use of permission_mode="plan" — this makes the agent read-only. It can analyse files but cannot modify them.

python
# security_reviewer.py
import asyncio
import harness

REVIEW_PROMPT = """Review the file review_target.py for security vulnerabilities.
For each issue found, report:
1. Line number
2. Severity (Critical/High/Medium/Low)
3. Issue description
4. Recommended fix

Format as a markdown table."""

async def main():
    findings = []
    async for msg in harness.run(
        REVIEW_PROMPT,
        provider="anthropic",
        model="claude-sonnet-4-20250514",
        permission_mode="plan",  # Read-only — don't modify files
    ):
        if isinstance(msg, harness.TextMessage) and not msg.is_partial:
            findings.append(msg.text)
        elif isinstance(msg, harness.Result):
            print(f"Review complete. Cost: ${msg.total_cost:.4f}")

    print("\n".join(findings))

asyncio.run(main())
bash
uv run security_reviewer.py

permission_mode="plan" blocks all write and bash tool calls at the engine level, so the agent cannot modify files, run commands, or exfiltrate data through those tools, regardless of what the prompt says.

This makes it safe to run the reviewer on production code. Even a prompt-injected payload hidden in a source file can't cause the agent to execute anything harmful.

Tutorial 3 covers all four permission modes in depth, including custom allow/deny rules and the 4-tier precedence engine.
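Because the review prompt asks for a markdown table, the collected findings text can be turned into structured rows for further processing. The parser below is a minimal sketch: parse_markdown_table is a helper written for this example, the sample text is illustrative rather than real agent output, and cells containing escaped pipe characters are not handled.

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Extract rows from the first pipe-delimited table found in text."""
    header = None
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # prose around the table is ignored
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first table line names the columns
            continue
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue  # separator row such as |---|:---:|
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


# Illustrative sample only; a real review will differ in wording and severity.
sample = """Here is the review:

| Line | Severity | Issue | Fix |
|------|----------|-------|-----|
| 6 | High | Hardcoded secret | Load from environment |
| 10 | Critical | pickle.load on untrusted data | Use a safe format such as JSON |
"""

findings = parse_markdown_table(sample)
```

In security_reviewer.py, the same function could be applied to "\n".join(findings) after the loop, for example to fail a CI job when any row has a Critical severity. Model output is not guaranteed to follow the requested format, so treat an empty result as a signal to inspect the raw text.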

6

Provider & Model Selection

Pass provider and model kwargs to harness.run() to target any supported provider. The same SDK call works across all providers.

python
# Use different providers
async for msg in harness.run("task", provider="openai", model="gpt-4o"):
    ...

async for msg in harness.run("task", provider="google", model="gemini-2.5-pro"):
    ...

async for msg in harness.run("task", provider="ollama", model="llama3.1"):
    ...
python
async for msg in harness.run(
    "Analyze this code for bugs",
    provider="anthropic",
    model="claude-sonnet-4-20250514",
):
    ...
python
async for msg in harness.run(
    "Analyze this code for bugs",
    provider="openai",
    model="gpt-4o",
):
    ...
python
async for msg in harness.run(
    "Analyze this code for bugs",
    provider="google",
    model="gemini-2.5-pro",
):
    ...
python
# Runs fully local — no API key, no cost
async for msg in harness.run(
    "Analyze this code for bugs",
    provider="ollama",
    model="llama3.1",
):
    ...
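Since provider and model are ordinary keyword arguments, they can come from data rather than being hard-coded. The sketch below loops over a list of (provider, model) pairs and keeps the final text from each. Result and fake_run are stand-ins so the example runs without the SDK or API keys; the run parameter is where harness.run would be passed in real use.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Result:
    """Stand-in for harness.Result with only the field used here."""
    text: str


async def fake_run(prompt: str, provider: str, model: str):
    """Hypothetical stand-in for harness.run() that echoes its target."""
    yield Result(text=f"[{provider}/{model}] ok")


# Provider and model names are the ones used in the examples above.
TARGETS = [
    ("anthropic", "claude-sonnet-4-20250514"),
    ("openai", "gpt-4o"),
    ("ollama", "llama3.1"),
]


async def compare(prompt: str, run=fake_run) -> dict:
    """Send the same prompt to each target and collect the final text per provider."""
    answers = {}
    for provider, model in TARGETS:
        async for msg in run(prompt, provider=provider, model=model):
            if isinstance(msg, Result):
                answers[provider] = msg.text
    return answers


answers = asyncio.run(compare("Analyze this code for bugs"))
```

The targets run one after another here, which keeps the output order predictable. Running them concurrently with asyncio.gather is possible but makes rate limits and cost harder to reason about.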
7

Capturing Cost Data

Every session ends with a Result message containing complete cost and usage statistics. Use this to track spend, audit sessions, or enforce budgets.

python
import asyncio
import harness

async def main():
    async for msg in harness.run("Analyze this codebase"):
        if isinstance(msg, harness.Result):
            print(f"Session: {msg.session_id}")
            print(f"Turns: {msg.turns}")
            print(f"Tool calls: {msg.tool_calls}")
            print(f"Tokens: {msg.total_tokens:,}")
            print(f"Cost: ${msg.total_cost:.4f}")

asyncio.run(main())
Result field    Type     Description
session_id      str      Unique identifier for this agent run
text            str      Final agent response text
turns           int      Number of agent conversation turns
tool_calls      int      Total number of tool calls made
total_tokens    int      Combined input + output token count
total_cost      float    Estimated cost in USD
Budget enforcement

Tutorial 4 (Budget Controls) shows how to set hard spending limits, receive mid-session warnings, and abort runs that exceed your budget threshold.
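Separately from the built-in limits, the total_cost field is enough to track spend on the client side across several runs. The sketch below is an assumption-laden illustration: SpendTracker, run_batch, and fake_run are names invented for this example, and the costs are made-up values chosen to keep the arithmetic exact.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Result:
    """Stand-in for harness.Result with the cost-related fields."""
    session_id: str
    total_tokens: int
    total_cost: float


@dataclass
class SpendTracker:
    """Accumulates cost across sessions and compares it to a budget in USD."""
    budget_usd: float
    spent_usd: float = 0.0
    sessions: list = field(default_factory=list)

    def record(self, result: Result) -> None:
        self.spent_usd += result.total_cost
        self.sessions.append(result.session_id)

    @property
    def over_budget(self) -> bool:
        return self.spent_usd > self.budget_usd


async def fake_run(prompt: str, cost: float):
    """Hypothetical stand-in for harness.run() that reports a fixed cost."""
    yield Result(session_id=f"session-{prompt}", total_tokens=1000, total_cost=cost)


async def run_batch(tasks, tracker: SpendTracker) -> list:
    """Run tasks in order and stop starting new ones once the budget is exceeded."""
    completed = []
    for prompt, cost in tasks:
        if tracker.over_budget:
            break
        async for msg in fake_run(prompt, cost):
            if isinstance(msg, Result):
                tracker.record(msg)
        completed.append(prompt)
    return completed


tracker = SpendTracker(budget_usd=0.5)
completed = asyncio.run(run_batch([("a", 0.25), ("b", 0.5), ("c", 0.25)], tracker))
```

The check happens between runs, so a single run can still overshoot the budget, as task "b" does here. Stopping a run partway through would need the mid-session controls that the budget tutorial describes.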

8

Next Steps

You can now stream agent events, handle every message type, and build custom Python tools on top of Harness. The next tutorial covers one of the most important production concerns: what the agent is allowed to do.

Tutorial 3: Permission Modes & Safety Controls

Learn the 4-mode permission system, custom allow/deny rules, and the 4-tier precedence engine.