Tutorial 10 — Capstone

Enterprise Production Deployment

The definitive guide to deploying Harness with all 14 enterprise features enabled simultaneously. This capstone tutorial brings together multi-provider routing, policy-as-code, audit logging, OpenTelemetry observability, Docker sandboxing, and cloud-native deployment into a single, production-ready configuration.

1. The Enterprise Checklist

Before deploying any AI coding agent in a regulated or enterprise environment, security, compliance, and operations teams ask the same 14 questions. Harness is the only coding agent that answers yes to all of them. The table below is the definitive comparison.

14-Feature Enterprise Comparison

Every feature that matters for production-grade coding agent deployment

#    Enterprise Feature              Harness
1    Multi-provider (5+)             ✓
2    Full async streaming SDK        ✓
3    4-mode permission system        ✓
4    Token / cost budgets            ✓
5    Policy-as-code engine           ✓
6    SHA-256 audit hash chain        ✓
7    PII scanning                    ✓
8    Dual sandbox modes              ✓
9    Sub-agent parallelism           ✓
10   Model router + fallback         ✓
11   Native CI/CD integration        ✓
12   OpenTelemetry observability     ✓
13   Hooks system                    ✓
14   Interactive REPL                ✓

Total Score: Harness 14/14 · Claude Code 3/14 · Cursor 0/14 · Aider 1/14 · OpenHands 3/14 · SWE-Agent 1/14

Harness is the only coding agent that checks every enterprise box.

Claude Code scores 3/14 — the second-closest competitor. It has sub-agent parallelism, hooks, and an interactive REPL, but lacks budgets, policy-as-code, audit logging, PII scanning, observability, sandbox modes, and CI/CD integration. This tutorial shows you how to deploy all 14 features together.

2. Complete Configuration

The following configuration file enables every enterprise feature. Copy it to .harness/config.toml in your project root (or ~/.harness/config.toml for a user-level default) and fill in your provider API keys via environment variables.

TOML .harness/config.toml
# .harness/config.toml — Complete Enterprise Configuration

# === Provider Selection ===
# Provider and model are set via CLI flags (-p, -m),
# environment variables, or the router configuration below.
# Default provider: anthropic
# Default model: determined by provider

# === Router & Budget ===
[router]
strategy = "cost_optimized"
fallback_chain = ["anthropic", "openai", "google"]
max_cost_per_session = 5.00
max_tokens_per_session = 1000000
simple_task_model = "claude-haiku-4-5-20251001"

# === Permissions ===
[permissions]
mode = "accept_edits"

# === Policy ===
[policy]
policy_paths = [".harness/policy.yml", "~/.harness/policy.yml"]
simulation_mode = false

# === Audit ===
[audit]
enabled = true
scan_pii = true
retention_days = 365
retention_max_size_mb = 1000
log_tool_args = true

# === Sandbox ===
[sandbox]
enabled = true
mode = "docker"
max_memory_mb = 1024
max_cpu_seconds = 60
network_access = false
docker_image = "python:3.12-slim"
allowed_paths = ["/workspace"]
blocked_commands = ["rm -rf /", "curl", "wget", "nc"]

# === OpenTelemetry ===
# OpenTelemetry is configured via environment variables:
# export OTEL_SERVICE_NAME="harness-agent"
# export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
# export OTEL_TRACES_EXPORTER="otlp"
# export OTEL_METRICS_EXPORTER="otlp"
Configuration Precedence

Harness merges configuration in this order: environment variables override .harness/config.toml (project), which overrides ~/.harness/config.toml (user). API keys should always be set via environment variables, never committed to config files.
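The precedence rule can be illustrated with a small sketch. This is a simplified model of layered merging, not the actual Harness config loader:

```python
# precedence_sketch.py — illustrates the merge order described above.
# Simplified model for illustration, not the actual Harness loader.

def merge_config(user: dict, project: dict, env: dict) -> dict:
    """Merge config layers: env overrides project, project overrides user."""
    merged: dict = {}
    for layer in (user, project, env):  # lowest to highest precedence
        merged.update(layer)
    return merged

user = {"permission_mode": "ask", "sandbox": "process"}    # ~/.harness/config.toml
project = {"sandbox": "docker"}                            # .harness/config.toml
env = {"permission_mode": "accept_edits"}                  # environment variables

print(merge_config(user, project, env))
# {'permission_mode': 'accept_edits', 'sandbox': 'docker'}
```

The project file wins over the user file for `sandbox`, and the environment variable wins over both for `permission_mode`.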

3. Architecture Overview

The diagram below shows how all 14 enterprise features interact at runtime. Every task flows through the Model Router and Policy Engine before reaching the Agent Loop, then exits through the Sandbox and Audit Logger to OpenTelemetry.

Every Request is Governed End-to-End

No tool call can bypass the Policy Engine or Budget Tracker. Even sub-agents spawned during parallel execution inherit the parent session's policy and budget constraints. The audit hash chain covers every event from session start to completion.
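The governance flow above can be sketched as a conceptual model. Class and method names here are illustrative only, not the Harness API; the point is the ordering: policy check, then budget check, then audit, with sub-agents inheriting the parent's constraints:

```python
# governance_sketch.py — conceptual model of the governed request flow.
# Names are hypothetical, for illustration; this is not the Harness API.

class PolicyDenied(Exception):
    pass

class BudgetExhausted(Exception):
    pass

class GovernedSession:
    """Every tool call passes the policy and budget checks, in that order."""

    def __init__(self, policy_denylist: set[str], max_calls: int):
        self.policy_denylist = policy_denylist
        self.max_calls = max_calls
        self.calls = 0
        self.audit_log: list[str] = []

    def run_tool(self, name: str) -> str:
        if name in self.policy_denylist:          # 1. Policy Engine
            self.audit_log.append(f"deny:{name}")
            raise PolicyDenied(name)
        if self.calls >= self.max_calls:          # 2. Budget Tracker
            raise BudgetExhausted(name)
        self.calls += 1
        self.audit_log.append(f"allow:{name}")    # 3. Audit Logger
        return f"ran {name}"

    def spawn_subagent(self) -> "GovernedSession":
        """Sub-agents inherit the parent's policy and remaining budget."""
        child = GovernedSession(self.policy_denylist,
                                self.max_calls - self.calls)
        child.audit_log = self.audit_log          # one shared audit trail
        return child
```

A sub-agent spawned mid-session sees the same denylist and only whatever budget the parent has left, so parallelism cannot multiply spend past the session cap.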

4. OpenTelemetry Observability

Harness emits OpenTelemetry traces and metrics for every agent session, tool call, and model request. Start the monitoring stack with a single Docker Compose command, then explore traces in Jaeger and dashboards in Grafana.

Telemetry Configuration

Shell Environment Variables
# OpenTelemetry is configured via environment variables:
export OTEL_SERVICE_NAME="harness-agent"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_TRACES_EXPORTER="otlp"
export OTEL_METRICS_EXPORTER="otlp"

Monitoring Stack

YAML docker-compose.monitoring.yml
version: "3.8"
services:
  jaeger:
    image: jaegertracing/all-in-one:1.52
    ports:
      - "16686:16686"  # Jaeger UI
      - "4317:4317"    # OTLP gRPC
    environment:
      - COLLECTOR_OTLP_ENABLED=true

  prometheus:
    image: prom/prometheus:v2.48.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:10.2.0
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    depends_on:
      - prometheus
      - jaeger
Bash
docker compose -f docker-compose.monitoring.yml up -d

Prometheus Scrape Config

YAML prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "harness"
    static_configs:
      - targets: ["host.docker.internal:9464"]
Access the Stack

After starting the stack, access Jaeger at http://localhost:16686 to see distributed traces, Prometheus at http://localhost:9090 for raw metrics, and Grafana at http://localhost:3000 (admin / admin) for dashboards.

Key metrics emitted by Harness to the configured OTLP endpoint:

Metric                         Type       Description
harness_tokens                 Counter    Total tokens consumed this session
harness_tool_calls             Counter    Total tool calls made this session
harness_cost                   Gauge      Cumulative session cost in USD
harness_provider_latency       Histogram  Provider response latency distribution
harness_context_utilization    Gauge      Fraction of context window consumed
harness_audit_chain_valid      Gauge      1 = chain intact, 0 = integrity broken
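These metric names are useful starting points for ad-hoc queries in the Prometheus UI. The queries below use standard PromQL against the table above; any label names are assumptions about how Harness labels its series:

```promql
# Token burn rate over the last 5 minutes
rate(harness_tokens[5m])

# P95 provider latency (uses the histogram's _bucket series)
histogram_quantile(0.95, rate(harness_provider_latency_bucket[5m]))

# Sessions approaching the context window limit
harness_context_utilization > 0.8
```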

5. Security Hardening

Follow these five steps to harden a Harness deployment before exposing it to production workloads. Each step takes under two minutes.

Set restrictive file permissions

Lock down the config directory so only your user account can read API keys and credentials.

Bash
chmod 600 ~/.harness/config.toml   # Owner read/write only
chmod 600 ~/.harness/credentials   # Protect API keys
chmod 700 ~/.harness/              # Protect directory

Use environment variables for secrets

Never put API keys in config files. Use environment variables that are injected at runtime by your secrets manager or CI/CD system.

Bash
# Use env vars instead of config file for secrets
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
# Never commit API keys to git!

Enable Docker sandbox with network disabled

Docker mode provides filesystem isolation that the process sandbox cannot. Disabling network access prevents exfiltration even if the agent is tricked into running malicious commands.

TOML .harness/config.toml
[sandbox]
enabled = true
mode = "docker"
network_access = false

Enable PII scanning in the audit logger

PII scanning checks every tool argument and result for names, emails, SSNs, credit card numbers, and phone numbers before writing them to the audit log.

TOML .harness/config.toml
[audit]
scan_pii = true

Add a network exfiltration policy rule

Even with network_access = false in Docker, add an explicit policy rule as a defence-in-depth layer for any future sandbox escape scenarios.

YAML .harness/policy.yml
# .harness/policy.yml
version: 1
rules:
  - tool: Bash
    decision: deny
    conditions:
      - type: command_matches
        pattern: "curl.*|wget.*|nc .*"
    description: "Block network exfiltration"
Docker vs Process Sandbox

In production, always use Docker sandbox mode with network disabled. The process sandbox (mode = "process") is lighter-weight and faster to start, but it does not provide filesystem isolation — the agent can still read files outside your working directory if not restricted by policy rules.
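If you must run the process sandbox (for example on CI runners without Docker), a policy rule can partially compensate for the missing filesystem isolation. The sketch below uses the same rule schema as the other examples in this tutorial; the path pattern is illustrative and should be adapted to your environment:

```yaml
# .harness/policy.yml — compensating rule for process-sandbox deployments
version: 1
rules:
  - tool: Read
    decision: deny
    conditions:
      - type: path_matches
        pattern: "^(/etc/|/home/[^/]+/\\.ssh/)"
    description: "Process sandbox: block reads of sensitive paths outside the workspace"
```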

6. Multi-Team Policy Setup

Large organizations need three levels of policy: organization-wide rules that apply everywhere, team-level customizations, and project-specific overrides. Harness evaluates these in reverse order (project first) so more specific rules always take precedence.

Organization Level
~/.harness/policy.yml
Global rules for every project on this machine / CI runner
Team Level
~/team-policy.yml  |  shared repo .harness/policy.yml
Shared across all projects in a team repository; inherits org rules
Project Level
.harness/policy.yml  —  in the project root
Project-specific rules; inherits team rules via inherit_from

Organization Policy

YAML ~/.harness/policy.yml
version: 1
defaults:
  decision: ask
rules:
  - tool: Bash
    decision: deny
    conditions:
      - type: command_matches
        pattern: "rm -rf /.*"
    description: "Org: Never delete root paths"
  - tool: Write
    decision: deny
    conditions:
      - type: path_matches
        pattern: "\\.(env|pem|key)$"
    description: "Org: Protect secrets"

Team Policy

YAML team-policy.yml
version: 1
inherit_from: "~/.harness/policy.yml"
rules:
  - tool: Bash
    decision: deny
    conditions:
      - type: command_matches
        pattern: "docker push.*"
    description: "Team: No manual docker pushes"

Project Policy

YAML .harness/policy.yml
version: 1
inherit_from: "../team-policy.yml"
rules:
  - tool: Read
    decision: allow
    description: "Project: Allow all reads"
  - tool: Edit
    decision: allow
    conditions:
      - type: path_matches
        pattern: "src/.*\\.py$"
    description: "Project: Allow editing Python source"
Try It: Test your policy chain

Use simulation mode to verify the full inheritance chain without enforcing anything. Set simulation_mode = true in your config and run a test task. Then check the audit log to confirm every tool call was evaluated against all three policy files.
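One way to confirm this from the audit log is to tally policy decisions per entry. The JSONL field names below ("event", "decision", "rule") are assumptions about the audit schema for illustration; adjust them to match your actual log entries:

```python
# count_policy_decisions.py — sketch of the simulation-mode check described
# above. Field names ("event", "decision", "rule") are assumed, not the
# documented Harness audit schema.
import json
from collections import Counter

def count_decisions(jsonl_lines: list[str]) -> Counter:
    """Tally policy decisions recorded in audit log lines."""
    counts: Counter = Counter()
    for line in jsonl_lines:
        entry = json.loads(line)
        if entry.get("event") == "policy_decision":
            counts[entry.get("decision", "unknown")] += 1
    return counts

sample = [
    '{"event": "policy_decision", "decision": "allow", "rule": "Project: Allow all reads"}',
    '{"event": "policy_decision", "decision": "deny", "rule": "Org: Protect secrets"}',
    '{"event": "tool_call", "tool": "Read"}',
]
print(count_decisions(sample))  # Counter({'allow': 1, 'deny': 1})
```

If simulation mode is working, every tool call in the test task should produce at least one policy-decision entry.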

7. Monitoring & Alerting

Add these Prometheus alert rules to get notified about cost overruns, exhausted budgets, audit chain integrity failures, and high error rates. Save them to alerts.yml and reference them from your Prometheus configuration.

YAML alerts.yml
groups:
  - name: harness
    rules:
      - alert: HighCostSession
        expr: harness_cost > 5
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Harness session cost exceeds $5"

      - alert: HighErrorRate
        expr: rate(harness_tool_calls{status="error"}[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Harness tool error rate above 10%"

      - alert: HighLatency
        expr: harness_provider_latency > 30000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Harness provider latency above 30s"

      - alert: AuditChainBroken
        expr: harness_audit_chain_valid == 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Audit log chain integrity check failed"
Treat AuditChainBroken as P0

A broken audit hash chain means one or more audit log entries have been tampered with or corrupted. This alert should page your on-call team immediately. Preserve the log file and open an incident before running any further agent tasks.
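To see why tampering is detectable, consider how a SHA-256 hash chain works: each entry's hash covers both the entry and the previous hash, so changing any historical entry invalidates every hash after it. The sketch below illustrates the general mechanism; the field layout and hashing scheme are assumptions for illustration, and Harness's internal format may differ:

```python
# verify_chain_sketch.py — illustrates how a SHA-256 hash chain detects
# tampering. The hashing scheme shown is an illustrative assumption,
# not Harness's documented audit format.
import hashlib
import json

def entry_hash(entry: dict, prev_hash: str) -> str:
    """Hash an entry together with the previous entry's hash."""
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

def build_chain(entries: list[dict]) -> list[str]:
    hashes, prev = [], "0" * 64  # genesis hash
    for e in entries:
        prev = entry_hash(e, prev)
        hashes.append(prev)
    return hashes

def chain_valid(entries: list[dict], hashes: list[str]) -> bool:
    """Recompute the chain and compare against the stored hashes."""
    return build_chain(entries) == hashes

log = [{"event": "session_start"}, {"event": "tool_call", "tool": "Read"}]
hashes = build_chain(log)
assert chain_valid(log, hashes)

log[1]["tool"] = "Bash"              # tamper with a recorded entry
assert not chain_valid(log, hashes)  # the chain no longer verifies
```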

Create a Grafana dashboard with these panels for a complete operational view:

  • Session Cost (USD): harness_cost as a time-series
  • Token Usage: harness_tokens as a counter time-series
  • Context Utilization: harness_context_utilization as a gauge with warning threshold at 0.8
  • Tool Calls/min: rate(harness_tool_calls[1m])
  • Error Rate: rate(harness_tool_calls{status="error"}[5m]) / rate(harness_tool_calls[5m])
  • Provider Latency P99: histogram_quantile(0.99, harness_provider_latency_bucket)
  • Audit Chain Status: harness_audit_chain_valid as a stat panel (green = 1, red = 0)

8. Production Script

Use this script as the entry point for any production automation. It handles budget guarding, structured logging, and clean error exit codes — ready to use directly in a Kubernetes Job, GitHub Actions step, or cron task.

Python production_agent.py
# production_agent.py — Enterprise-grade Harness deployment
import asyncio
import logging
import sys
from pathlib import Path

import harness
from harness.audit.logger import AuditLogger
from harness.providers.budget import TokenBudgetTracker, BudgetExhaustedError

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("harness-prod")

async def run_production_task(prompt: str) -> dict:
    """Run a task with full enterprise controls."""

    # Budget guard
    # Note: This budget tracker monitors cost AFTER each run completes.
    # To enforce budgets mid-run, configure the [router] section in
    # .harness/config.toml with max_cost_per_session — the engine
    # checks the budget internally between turns.
    budget = TokenBudgetTracker(max_tokens=500_000, max_cost=5.00)

    result_data = {
        "success": False,
        "output": "",
        "tokens": 0,
        "cost": 0.0,
        "error": None,
    }

    try:
        async for msg in harness.run(
            prompt,
            provider="anthropic",
            model="claude-sonnet-4-20250514",
            permission_mode="accept_edits",
            sandbox_mode="docker",
            max_turns=50,
        ):
            match msg:
                case harness.TextMessage(text=t, is_partial=False):
                    logger.info(f"Agent: {t[:100]}...")

                case harness.ToolUse(name=name, args=args):
                    logger.info(f"Tool: {name}")

                case harness.Result() as r:
                    # Record final usage
                    budget.record_usage(
                        # Approximation: total_tokens doesn't split input/output
                        input_tokens=r.total_tokens // 2,
                        output_tokens=r.total_tokens // 2,
                        cost=r.total_cost,
                    )
                    # Check if we should stop future runs
                    budget.check_budget()

                    result_data.update({
                        "success": True,
                        "output": r.text,
                        "tokens": r.total_tokens,
                        "cost": r.total_cost,
                    })

                    snap = budget.snapshot()
                    logger.info(
                        f"Complete: {r.turns} turns, {r.tool_calls} tools, "
                        f"${r.total_cost:.4f}, {r.total_tokens:,} tokens"
                    )
                    logger.info(
                        f"Budget: ${snap.cost_remaining:.2f} remaining, "
                        f"{snap.tokens_remaining:,} tokens remaining"
                    )

    except BudgetExhaustedError as e:
        logger.error(f"Budget exhausted: {e}")
        result_data["error"] = str(e)
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
        result_data["error"] = str(e)

    return result_data

async def main():
    if len(sys.argv) < 2:
        print("Usage: python production_agent.py 'Your task here'")
        sys.exit(1)

    prompt = sys.argv[1]
    logger.info(f"Starting production task: {prompt[:50]}...")

    result = await run_production_task(prompt)

    if result["success"]:
        logger.info(f"Task completed successfully. Cost: ${result['cost']:.4f}")
    else:
        logger.error(f"Task failed: {result['error']}")
        sys.exit(1)

if __name__ == "__main__":
    asyncio.run(main())
Bash
uv run python production_agent.py "Audit all TODO comments and open GitHub issues for each"
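The same script drops into CI. Below is a sketch of a GitHub Actions job that runs it; the workflow layout, the uv setup action version, and the secret name are assumptions to adapt for your repository:

```yaml
# .github/workflows/agent-task.yml — illustrative sketch; adapt names/triggers
jobs:
  agent-task:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - name: Run governed agent task
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          uv run python production_agent.py \
            "Audit all TODO comments and open GitHub issues for each"
```

Because the script exits non-zero on failure or budget exhaustion, the job fails cleanly and no follow-up steps run.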

9. Capacity Planning

Use the table below to estimate monthly costs before enabling Harness for your team. The cost-optimized router automatically routes simple tasks to Haiku and complex tasks to Sonnet, so your actual costs will be at or below these estimates.

Use Case          Tasks/Day   Avg Tokens/Task   Model       Daily Cost   Monthly Cost
PR Reviews        10          5,000             Sonnet 4    $0.90        $27
Issue Triage      20          2,000             Haiku 4.5   $0.12        $3.60
Code Generation   5           50,000            Sonnet 4    $4.50        $135
Security Scans    3           30,000            Opus 4      $13.50       $405
Total             38                                        $19.02       $570.60
Cost-Optimized Routing Saves 40–60%

With strategy = "cost_optimized", simple tasks use Haiku ($0.80 / 1M input tokens) and complex tasks use Sonnet ($3 / 1M input tokens). This typically reduces total costs by 40–60% compared to always using the best available model.
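The arithmetic behind the capacity table reduces to one formula: daily cost = tasks per day × tokens per task × blended rate per token. A sketch of an estimator follows; the blended $/1M-token rates are illustrative assumptions (input and output pricing averaged), not list prices:

```python
# cost_estimator.py — reproduces the capacity-planning arithmetic above.
# Blended per-1M-token rates are illustrative assumptions, not list prices.

BLENDED_RATE_PER_M = {
    "haiku": 3.0,    # assumed blended input+output rate, USD per 1M tokens
    "sonnet": 18.0,  # assumed blended input+output rate, USD per 1M tokens
}

def daily_cost(tasks_per_day: int, tokens_per_task: int, model: str) -> float:
    """Estimated daily spend for one recurring use case."""
    tokens = tasks_per_day * tokens_per_task
    return tokens / 1_000_000 * BLENDED_RATE_PER_M[model]

def monthly_cost(tasks_per_day: int, tokens_per_task: int, model: str) -> float:
    return daily_cost(tasks_per_day, tokens_per_task, model) * 30

# PR reviews row from the table: 10 tasks x 5,000 tokens on Sonnet
print(f"${daily_cost(10, 5_000, 'sonnet'):.2f}/day")   # $0.90/day
print(f"${monthly_cost(10, 5_000, 'sonnet'):.2f}/mo")  # $27.00/mo
```

Swap in your negotiated pricing and real task volumes to size a realistic `max_cost_per_session` and monthly budget.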

Session Budget Recommendation

TOML .harness/config.toml
[router]
max_cost_per_session = 10.00       # Individual session cap
# Monthly budget tracked externally via audit logs + Prometheus

The audit log is JSONL format. Aggregate monthly costs with a single Python one-liner:

Bash
# Sum total cost from audit log entries this month
python3 -c "
import json, sys
from pathlib import Path
from datetime import datetime
month = datetime.now().strftime('%Y-%m')
total = sum(
    e.get('cost', 0) for e in
    (json.loads(l) for l in open(Path('~/.harness/audit.jsonl').expanduser()))
    if e.get('timestamp', '').startswith(month)
)
print(f'Monthly cost: \${total:.4f}')
"

10. Cloud Deployment

Deploy Harness as a containerized workload on your cloud provider of choice. The example below targets Amazon Elastic Kubernetes Service (EKS): API keys are stored in Secrets Manager and injected via a Kubernetes Secret. The same manifest pattern carries over to other managed Kubernetes services.

YAML k8s/deployment.yml
# AWS EKS deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: harness-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: harness-agent
  template:
    metadata:
      labels:
        app: harness-agent
    spec:
      containers:
        - name: harness
          image: your-ecr-repo/harness-agent:latest
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: harness-secrets
                  key: anthropic-api-key
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1"
Bash
# ECR + EKS setup
aws ecr create-repository --repository-name harness-agent
docker build -t harness-agent .
docker tag harness-agent:latest <account>.dkr.ecr.<region>.amazonaws.com/harness-agent:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/harness-agent:latest
kubectl apply -f k8s/deployment.yml
Kubernetes Resource Limits

Always set both requests and limits for memory and CPU. An unconstrained Harness container running a large code generation task with sub-agents can consume several gigabytes of RAM during peak parallel execution.

11. Congratulations

You've completed the entire tutorial series

You now know how to deploy Harness with all 14 enterprise features in production

Over 10 tutorials you went from installing Harness to running a fully governed, observable, cost-controlled enterprise AI coding agent. It all started with just two commands:

Bash
uv tool install harness-agent
harness --version