Enterprise Production Deployment
The definitive guide to deploying Harness with all 14 enterprise features enabled simultaneously. This capstone tutorial brings together multi-provider routing, policy-as-code, audit logging, OpenTelemetry observability, Docker sandboxing, and cloud-native deployment into a single, production-ready configuration.
1. The Enterprise Checklist
Before deploying any AI coding agent in a regulated or enterprise environment, security, compliance, and operations teams ask the same 14 questions. Harness is the only coding agent that answers yes to all of them. The table below is the definitive comparison.
14-Feature Enterprise Comparison
Every feature that matters for production-grade coding agent deployment
| # | Enterprise Feature | Harness | Claude Code | Cursor | Aider | OpenHands | SWE-Agent |
|---|---|---|---|---|---|---|---|
| 1 | Multi-provider (5+) | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ |
| 2 | Full async streaming SDK | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ |
| 3 | 4-mode permission system | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| 4 | Token / cost budgets | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| 5 | Policy-as-code engine | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| 6 | SHA-256 audit hash chain | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| 7 | PII scanning | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| 8 | Dual sandbox modes | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ |
| 9 | Sub-agent parallelism | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| 10 | Model router + fallback | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| 11 | Native CI/CD integration | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| 12 | OpenTelemetry observability | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| 13 | Hooks system | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| 14 | Interactive REPL | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| | Total Score | 14 / 14 | 3 / 14 | 0 / 14 | 1 / 14 | 3 / 14 | 1 / 14 |
Claude Code and OpenHands tie for second place at 3/14. Claude Code has sub-agent parallelism, hooks, and an interactive REPL, but lacks budgets, policy-as-code, audit logging, PII scanning, observability, sandbox modes, and CI/CD integration. This tutorial shows you how to deploy all 14 features together.
2. Complete Configuration
The following configuration file enables every enterprise feature. Copy it to
.harness/config.toml in your project root (or ~/.harness/config.toml
for a user-level default) and fill in your provider API keys via environment variables.
# .harness/config.toml — Complete Enterprise Configuration
# === Provider Selection ===
# Provider and model are set via CLI flags (-p, -m),
# environment variables, or the router configuration below.
# Default provider: anthropic
# Default model: determined by provider
# === Router & Budget ===
[router]
strategy = "cost_optimized"
fallback_chain = ["anthropic", "openai", "google"]
max_cost_per_session = 5.00
max_tokens_per_session = 1000000
simple_task_model = "claude-haiku-4-5-20251001"
# === Permissions ===
[permissions]
mode = "accept_edits"
# === Policy ===
[policy]
policy_paths = [".harness/policy.yml", "~/.harness/policy.yml"]
simulation_mode = false
# === Audit ===
[audit]
enabled = true
scan_pii = true
retention_days = 365
retention_max_size_mb = 1000
log_tool_args = true
# === Sandbox ===
[sandbox]
enabled = true
mode = "docker"
max_memory_mb = 1024
max_cpu_seconds = 60
network_access = false
docker_image = "python:3.12-slim"
allowed_paths = ["/workspace"]
blocked_commands = ["rm -rf /", "curl", "wget", "nc"]
# === OpenTelemetry ===
# OpenTelemetry is configured via environment variables:
# export OTEL_SERVICE_NAME="harness-agent"
# export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
# export OTEL_TRACES_EXPORTER="otlp"
# export OTEL_METRICS_EXPORTER="otlp"
Harness merges configuration in this order: environment variables override
.harness/config.toml (project), which overrides
~/.harness/config.toml (user). API keys should always be set via environment
variables, never committed to config files.
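The precedence described above can be sketched as a small lookup helper. This is an illustration of the merge order, not Harness's actual loader; the environment-variable name is hypothetical.

```python
import os

def resolve_setting(key: str, env_var: str,
                    project_cfg: dict, user_cfg: dict, default=None):
    """Return a setting using the precedence described above:
    environment variable > project config > user config > default."""
    if env_var in os.environ:
        return os.environ[env_var]
    if key in project_cfg:
        return project_cfg[key]
    if key in user_cfg:
        return user_cfg[key]
    return default

# Example: the project-level config overrides the user-level default
user_cfg = {"permission_mode": "ask"}
project_cfg = {"permission_mode": "accept_edits"}
print(resolve_setting("permission_mode", "HARNESS_PERMISSION_MODE",
                      project_cfg, user_cfg))  # accept_edits
```

An exported `HARNESS_PERMISSION_MODE` (hypothetical name) would win over both files.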
3. Architecture Overview
The diagram below shows how all 14 enterprise features interact at runtime. Every task flows through the Model Router and Policy Engine before reaching the Agent Loop, then exits through the Sandbox and Audit Logger to OpenTelemetry.
No tool call can bypass the Policy Engine or Budget Tracker. Even sub-agents spawned during parallel execution inherit the parent session's policy and budget constraints. The audit hash chain covers every event from session start to completion.
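A toy sketch of that flow, with hypothetical names rather than Harness's internal API: every call passes the policy check and budget check, every attempt is audited, and a sub-agent shares the parent's budget and deny-list by reference so its spending counts against the same cap.

```python
def gate_tool_call(tool: str, policy_deny: set, budget: dict, audit_log: list) -> bool:
    """Toy gate: policy check, then budget check, then an audit record."""
    allowed = tool not in policy_deny and budget["cost_spent"] < budget["max_cost"]
    audit_log.append({"tool": tool, "allowed": allowed})
    return allowed

# Parent session state
deny, budget, log = {"WebFetch"}, {"cost_spent": 0.0, "max_cost": 5.0}, []

# A sub-agent receives the SAME policy and budget objects (shared by
# reference), so its spending draws down the parent's budget.
sub_budget = budget
sub_budget["cost_spent"] += 4.99
assert gate_tool_call("Read", deny, budget, log) is True   # still under cap
budget["cost_spent"] += 0.02                               # crosses the cap
assert gate_tool_call("Read", deny, budget, log) is False  # budget exhausted
assert gate_tool_call("WebFetch", deny, budget, log) is False  # policy deny
assert len(log) == 3  # every attempt is audited, allowed or not
```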
4. OpenTelemetry Observability
Harness emits OpenTelemetry traces and metrics for every agent session, tool call, and model request. Start the monitoring stack with a single Docker Compose command, then explore traces in Jaeger and dashboards in Grafana.
Telemetry Configuration
# OpenTelemetry is configured via environment variables:
export OTEL_SERVICE_NAME="harness-agent"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_TRACES_EXPORTER="otlp"
export OTEL_METRICS_EXPORTER="otlp"
Monitoring Stack
version: "3.8"
services:
  jaeger:
    image: jaegertracing/all-in-one:1.52
    ports:
      - "16686:16686"  # Jaeger UI
      - "4317:4317"    # OTLP gRPC
    environment:
      - COLLECTOR_OTLP_ENABLED=true
  prometheus:
    image: prom/prometheus:v2.48.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:10.2.0
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    depends_on:
      - prometheus
      - jaeger
docker compose -f docker-compose.monitoring.yml up -d
Prometheus Scrape Config
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "harness"
    static_configs:
      - targets: ["host.docker.internal:9464"]
After starting the stack, access Jaeger at
http://localhost:16686 to see distributed traces, Prometheus
at http://localhost:9090 for raw metrics, and Grafana at
http://localhost:3000 (admin / admin) for dashboards.
Key metrics emitted by Harness to the configured OTLP endpoint:
| Metric | Type | Description |
|---|---|---|
| harness_tokens | Counter | Total tokens consumed this session |
| harness_tool_calls | Counter | Total tool calls made this session |
| harness_cost | Gauge | Cumulative session cost in USD |
| harness_provider_latency | Histogram | Provider response latency distribution |
| harness_context_utilization | Gauge | Fraction of context window consumed |
| harness_audit_chain_valid | Gauge | 1 = chain intact, 0 = integrity broken |
5. Security Hardening
Follow these five steps to harden a Harness deployment before exposing it to production workloads. Each step takes under two minutes.
Set restrictive file permissions
Lock down the config directory so only your user account can read API keys and credentials.
chmod 600 ~/.harness/config.toml # Owner read/write only
chmod 600 ~/.harness/credentials # Protect API keys
chmod 700 ~/.harness/ # Protect directory
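To verify the lockdown, a short check (plain Python, nothing Harness-specific) can confirm that no group or other permission bits remain set:

```python
import os
import stat
import tempfile

def is_private(path: str) -> bool:
    """True if the path has no group/other permission bits (e.g. 600 or 700)."""
    return (os.stat(path).st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0

# Demo on a throwaway file (POSIX permission semantics assumed)
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o644)
print(is_private(path))  # False: group/other can read
os.chmod(path, 0o600)
print(is_private(path))  # True
os.remove(path)
```

Run the same check against `~/.harness/config.toml` and `~/.harness/` after applying the chmod commands above.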
Use environment variables for secrets
Never put API keys in config files. Use environment variables that are injected at runtime by your secrets manager or CI/CD system.
# Use env vars instead of config file for secrets
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
# Never commit API keys to git!
Enable Docker sandbox with network disabled
Docker mode provides filesystem isolation that the process sandbox cannot. Disabling network access prevents exfiltration even if the agent is tricked into running malicious commands.
[sandbox]
enabled = true
mode = "docker"
network_access = false
Enable PII scanning in the audit logger
PII scanning checks every tool argument and result for names, emails, SSNs, credit card numbers, and phone numbers before writing them to the audit log.
[audit]
scan_pii = true
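As an illustration of what such a scan involves, here is a minimal regex-based sketch for the PII classes listed above. These patterns are deliberately simple; a production scanner (including Harness's own) would use stricter validation, such as Luhn checks for card numbers.

```python
import re

# Illustrative patterns only — not Harness's actual scanner
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[-. ]?){3}\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-. ]?\d{3}[-. ]?\d{4}\b"),
}

def scan_pii(text: str) -> list[str]:
    """Return the PII categories detected in `text`."""
    return [kind for kind, pat in PII_PATTERNS.items() if pat.search(text)]

print(scan_pii("contact alice@example.com, SSN 123-45-6789"))  # ['email', 'ssn']
```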
Add a network exfiltration policy rule
Even with network_access = false in Docker, add an explicit policy rule as a defense-in-depth layer against any future sandbox escape.
# .harness/policy.yml
version: 1
rules:
  - tool: Bash
    decision: deny
    conditions:
      - type: command_matches
        pattern: "curl.*|wget.*|nc .*"
    description: "Block network exfiltration"
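You can check the deny pattern against sample commands with plain `re` before deploying it, assuming `command_matches` performs an unanchored regex search:

```python
import re

# The rule's pattern, copied from .harness/policy.yml above
pattern = re.compile(r"curl.*|wget.*|nc .*")

# These commands would all be denied...
for cmd in ["curl http://evil.example", "wget -q payload.sh", "nc -l 4444"]:
    assert pattern.search(cmd)

# ...while ordinary commands pass through to normal permission handling
assert not pattern.search("ls -la src/")
assert not pattern.search("git status")
```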
In production, always use Docker sandbox mode with network disabled. The process sandbox
(mode = "process") is lighter-weight and faster to start, but it does not
provide filesystem isolation — the agent can still read files outside your working
directory if not restricted by policy rules.
6. Multi-Team Policy Setup
Large organizations need three levels of policy: organization-wide rules that apply everywhere, team-level customizations, and project-specific overrides. Harness evaluates these in reverse order (project first) so more specific rules always take precedence.
Organization Policy
version: 1
defaults:
  decision: ask
rules:
  - tool: Bash
    decision: deny
    conditions:
      - type: command_matches
        pattern: "rm -rf /.*"
    description: "Org: Never delete root paths"
  - tool: Write
    decision: deny
    conditions:
      - type: path_matches
        pattern: "\\.(env|pem|key)$"
    description: "Org: Protect secrets"
Team Policy
version: 1
inherit_from: "~/.harness/policy.yml"
rules:
  - tool: Bash
    decision: deny
    conditions:
      - type: command_matches
        pattern: "docker push.*"
    description: "Team: No manual docker pushes"
Project Policy
version: 1
inherit_from: "../team-policy.yml"
rules:
  - tool: Read
    decision: allow
    description: "Project: Allow all reads"
  - tool: Edit
    decision: allow
    conditions:
      - type: path_matches
        pattern: "src/.*\\.py$"
    description: "Project: Allow editing Python source"
Use simulation mode to verify the full inheritance chain without enforcing anything.
Set simulation_mode = true in your config and run a test task. Then check
the audit log to confirm every tool call was evaluated against all three policy files.
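The project-first, first-match evaluation order can be sketched as follows. The structure is illustrative, not the real engine; it just shows why more specific rules take precedence.

```python
def evaluate(tool: str, chains: list[list[dict]], default: str = "ask") -> str:
    """Toy first-match evaluation: project rules first, then team,
    then org, falling back to the configured default decision."""
    for rules in chains:  # chains ordered [project, team, org]
        for rule in rules:
            if rule["tool"] == tool:
                return rule["decision"]
    return default

org     = [{"tool": "Write", "decision": "deny"}]
team    = [{"tool": "Bash",  "decision": "deny"}]
project = [{"tool": "Read",  "decision": "allow"}]

assert evaluate("Read",  [project, team, org]) == "allow"  # project rule wins
assert evaluate("Bash",  [project, team, org]) == "deny"   # team rule applies
assert evaluate("Write", [project, team, org]) == "deny"   # org rule applies
assert evaluate("Edit",  [project, team, org]) == "ask"    # default decision
```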
7. Monitoring & Alerting
Add these Prometheus alert rules to get notified about cost overruns, exhausted budgets,
audit chain integrity failures, and high error rates. Save them to alerts.yml
and reference them from your Prometheus configuration.
groups:
  - name: harness
    rules:
      - alert: HighCostSession
        expr: harness_cost > 5
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Harness session cost exceeds $5"
      - alert: HighErrorRate
        # Ratio of error tool calls to all tool calls over 5m
        expr: rate(harness_tool_calls{status="error"}[5m]) / rate(harness_tool_calls[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Harness tool error rate above 10%"
      - alert: HighLatency
        # Histograms are queried via their _bucket series
        expr: histogram_quantile(0.99, rate(harness_provider_latency_bucket[5m])) > 30000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Harness P99 provider latency above 30s"
      - alert: AuditChainBroken
        expr: harness_audit_chain_valid == 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Audit log chain integrity check failed"
A broken audit hash chain means one or more audit log entries have been tampered with or corrupted. This alert should page your on-call team immediately. Preserve the log file and open an incident before running any further agent tasks.
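To make the integrity property concrete, here is a minimal sketch of verifying a SHA-256 hash chain over JSONL entries. The field names (`prev_hash`, `event`) are assumptions for illustration; Harness's actual audit schema may differ.

```python
import hashlib
import json

def append_entry(lines: list[str], event: dict) -> None:
    """Append an entry whose prev_hash commits to the previous raw line."""
    prev = hashlib.sha256(lines[-1].encode()).hexdigest() if lines else "0" * 64
    lines.append(json.dumps({"prev_hash": prev, **event}, sort_keys=True))

def verify_chain(lines: list[str]) -> bool:
    """Recompute every link; any tampered entry breaks all later links."""
    prev = "0" * 64  # genesis value
    for raw in lines:
        if json.loads(raw)["prev_hash"] != prev:
            return False
        prev = hashlib.sha256(raw.encode()).hexdigest()
    return True

log: list[str] = []
append_entry(log, {"event": "session_start"})
append_entry(log, {"event": "tool_call", "tool": "Read"})
print(verify_chain(log))  # True

# Tampering with any earlier entry invalidates the chain
log[0] = log[0].replace("session_start", "session_begin")
print(verify_chain(log))  # False
```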
Create a Grafana dashboard with these panels for a complete operational view:
- Session Cost (USD) — `harness_cost` as a time-series
- Token Usage — `harness_tokens` as a counter time-series
- Context Utilization — `harness_context_utilization` as a gauge with warning threshold at 0.8
- Tool Calls/min — `rate(harness_tool_calls[1m])`
- Error Rate — `rate(harness_tool_calls{status="error"}[5m]) / rate(harness_tool_calls[5m])`
- Provider Latency P99 — `histogram_quantile(0.99, harness_provider_latency_bucket)`
- Audit Chain Status — `harness_audit_chain_valid` as a stat panel (green=1, red=0)
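The rate-based panels divide the per-second increase of monotonic counters. A quick sketch of that arithmetic, using hypothetical sample values five minutes apart:

```python
def counter_rate(earlier: float, later: float, window_s: float) -> float:
    """Per-second increase of a monotonic counter over the window,
    analogous to PromQL's rate() (counter resets ignored for brevity)."""
    return (later - earlier) / window_s

# Hypothetical counter samples taken 5 minutes (300 s) apart:
errors = counter_rate(40, 70, 300)     # harness_tool_calls{status="error"}
total  = counter_rate(900, 1200, 300)  # harness_tool_calls (all statuses)
print(f"error rate: {errors / total:.0%}")  # error rate: 10%
```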
8. Production Script
Use this script as the entry point for any production automation. It handles budget guarding, structured logging, and clean error exit codes — ready to use directly in a Kubernetes Job, GitHub Actions step, or cron task.
# production_agent.py — Enterprise-grade Harness deployment
import asyncio
import logging
import sys
from pathlib import Path

import harness
from harness.audit.logger import AuditLogger
from harness.providers.budget import TokenBudgetTracker, BudgetExhaustedError

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("harness-prod")


async def run_production_task(prompt: str) -> dict:
    """Run a task with full enterprise controls."""
    # Budget guard
    # Note: This budget tracker monitors cost AFTER each run completes.
    # To enforce budgets mid-run, configure the [router] section in
    # .harness/config.toml with max_cost_per_session — the engine
    # checks the budget internally between turns.
    budget = TokenBudgetTracker(max_tokens=500_000, max_cost=5.00)

    result_data = {
        "success": False,
        "output": "",
        "tokens": 0,
        "cost": 0.0,
        "error": None,
    }

    try:
        async for msg in harness.run(
            prompt,
            provider="anthropic",
            model="claude-sonnet-4-20250514",
            permission_mode="accept_edits",
            sandbox_mode="docker",
            max_turns=50,
        ):
            match msg:
                case harness.TextMessage(text=t, is_partial=False):
                    logger.info(f"Agent: {t[:100]}...")
                case harness.ToolUse(name=name, args=args):
                    logger.info(f"Tool: {name}")
                case harness.Result() as r:
                    # Record final usage
                    budget.record_usage(
                        # Approximation: total_tokens doesn't split input/output
                        input_tokens=r.total_tokens // 2,
                        output_tokens=r.total_tokens // 2,
                        cost=r.total_cost,
                    )
                    # Check if we should stop future runs
                    budget.check_budget()

                    result_data.update({
                        "success": True,
                        "output": r.text,
                        "tokens": r.total_tokens,
                        "cost": r.total_cost,
                    })

                    snap = budget.snapshot()
                    logger.info(
                        f"Complete: {r.turns} turns, {r.tool_calls} tools, "
                        f"${r.total_cost:.4f}, {r.total_tokens:,} tokens"
                    )
                    logger.info(
                        f"Budget: ${snap.cost_remaining:.2f} remaining, "
                        f"{snap.tokens_remaining:,} tokens remaining"
                    )
    except BudgetExhaustedError as e:
        logger.error(f"Budget exhausted: {e}")
        result_data["error"] = str(e)
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
        result_data["error"] = str(e)

    return result_data


async def main():
    if len(sys.argv) < 2:
        print("Usage: python production_agent.py 'Your task here'")
        sys.exit(1)

    prompt = sys.argv[1]
    logger.info(f"Starting production task: {prompt[:50]}...")

    result = await run_production_task(prompt)

    if result["success"]:
        logger.info(f"Task completed successfully. Cost: ${result['cost']:.4f}")
    else:
        logger.error(f"Task failed: {result['error']}")
        sys.exit(1)


if __name__ == "__main__":
    asyncio.run(main())
uv run python production_agent.py "Audit all TODO comments and open GitHub issues for each"
9. Capacity Planning
Use the table below to estimate monthly costs before enabling Harness for your team. The cost-optimized router automatically routes simple tasks to Haiku and complex tasks to Sonnet, so actual costs should come in at or below these estimates.
| Use Case | Tasks / Day | Avg Tokens / Task | Model | Daily Cost | Monthly Cost |
|---|---|---|---|---|---|
| PR Reviews | 10 | 5,000 | Sonnet 4 | $0.90 | $27 |
| Issue Triage | 20 | 2,000 | Haiku 4.5 | $0.12 | $3.60 |
| Code Generation | 5 | 50,000 | Sonnet 4 | $4.50 | $135 |
| Security Scans | 3 | 30,000 | Opus 4 | $13.50 | $405 |
| Total | 38 | — | — | $19.02 | $570.60 |
With strategy = "cost_optimized", simple tasks use Haiku
($0.80 / 1M input tokens) and complex tasks use Sonnet ($3 / 1M input tokens).
This typically reduces total costs by 40–60% compared to always using the
best available model.
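As a sanity check on the table's totals (the monthly column assumes a 30-day month):

```python
# Per-use-case daily costs from the capacity table above (USD)
daily_costs = {
    "pr_reviews": 0.90,
    "issue_triage": 0.12,
    "code_generation": 4.50,
    "security_scans": 13.50,
}

daily_total = sum(daily_costs.values())
monthly_total = daily_total * 30  # 30-day month, as in the table

print(f"Daily: ${daily_total:.2f}, Monthly: ${monthly_total:.2f}")
# Daily: $19.02, Monthly: $570.60
```

Swap in your own task volumes and per-task costs to re-derive the table for your team.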
Session Budget Recommendation
[router]
max_cost_per_session = 10.00 # Individual session cap
# Monthly budget tracked externally via audit logs + Prometheus
The audit log is in JSONL format. Aggregate this month's costs with a short Python snippet:
# Sum total cost from audit log entries this month
python3 -c "
import json
from pathlib import Path
from datetime import datetime

month = datetime.now().strftime('%Y-%m')
total = sum(
    e.get('cost', 0)
    for e in (json.loads(l) for l in open(Path('~/.harness/audit.jsonl').expanduser()))
    if e.get('timestamp', '').startswith(month)
)
print(f'Monthly cost: \${total:.4f}')
"
10. Cloud Deployment
Deploy Harness as a containerized workload on your cloud provider of choice. The example below targets AWS EKS; the same manifest pattern applies to other managed Kubernetes services with provider-specific registry and secret commands.
Deploy to Amazon Elastic Kubernetes Service. API keys are stored in Secrets Manager and injected via a Kubernetes Secret.
# AWS EKS deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: harness-agent
spec:
  replicas: 1
  selector:              # required by apps/v1
    matchLabels:
      app: harness-agent
  template:
    metadata:
      labels:
        app: harness-agent
    spec:
      containers:
        - name: harness
          image: your-ecr-repo/harness-agent:latest
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: harness-secrets
                  key: anthropic-api-key
          resources:
            requests:           # adjust to your workload
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1"
# ECR + EKS setup
aws ecr create-repository --repository-name harness-agent
docker build -t harness-agent .
docker tag harness-agent:latest <account>.dkr.ecr.<region>.amazonaws.com/harness-agent:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/harness-agent:latest
kubectl apply -f k8s/deployment.yml
Always set both requests and limits for memory and CPU.
An unconstrained Harness container running a large code generation task with sub-agents
can consume several gigabytes of RAM during peak parallel execution.
11. Congratulations
You've completed the entire tutorial series
You now know how to deploy Harness with all 14 enterprise features in production
Over 10 tutorials you went from installing Harness to running a fully governed, observable, cost-controlled enterprise AI coding agent. Here is a summary of everything covered:
uv tool install harness-agent
harness --version