Definition: Engineering practice where every time an agent makes a mistake, a solution is designed to ensure the agent never makes that mistake again. Concept popularized by Mitchell Hashimoto in Ghostty development.
— Source: NERVICO, Product Development Consultancy
Harness Engineering
Definition
Harness Engineering is the engineering practice where, every time an AI agent makes a mistake, the engineer takes the time to design a solution that ensures the agent never makes that specific mistake again. Instead of simply correcting the error manually, infrastructure (guardrails, tests, validations, constraints) is built to prevent its recurrence.

The concept was popularized by Mitchell Hashimoto (co-founder of HashiCorp) during the development of Ghostty, his terminal emulator project, where he documented his experience working intensively with AI agents.

Harness metaphor: like a climber's safety harness, the engineering "harness" protects the agent from falls, allowing it to work at greater heights (complexity) without risk.

Core philosophy: don't fix the error, fix the system that allowed the error.
Why It Matters
Exponential continuous improvement: each error corrected through harness engineering permanently increases the agent's capability. In 3 months of working with agents, Hashimoto built a harness so robust that agents could handle tasks that initially required continuous supervision.

Agent scalability: without harness engineering, each new agent makes the same mistakes. With harness engineering, each agent inherits the accumulated protections, dramatically reducing training time and error rate.

Compound ROI: the initial investment in a harness (2-4 hours per error) pays dividends every time the agent handles a similar task. Instead of supervising 100 future tasks, you invest once and automate.

Mindset shift: harness engineering changes the engineer's role from "code writer" to "system designer". You don't write the code; you design the constraints within which the agent can work safely.
Real Examples
Ghostty Development (Mitchell Hashimoto)
Context: Mitchell Hashimoto built Ghostty, a modern terminal emulator, using AI agents intensively with harness engineering.

Typical errors and the harnesses created for them:

Error 1: Agent modifies critical files without tests
- Harness: Pre-commit hook that blocks commits below 80% test coverage
- Result: Agent is forced to write tests before changes

Error 2: Agent introduces breaking changes in the public API
- Harness: API contract tests with automatic semantic versioning
- Result: CI/CD fails if the API changes without a version bump

Error 3: Agent generates code that doesn't compile
- Harness: GitHub Actions runs the build on every push
- Result: Agent receives immediate feedback and self-corrects

Progress over 3 months:
- Month 1: Continuous supervision, 40% of commits require correction
- Month 2: Reduced supervision, 15% of commits require correction
- Month 3: Occasional supervision, 3% of commits require correction
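The first harness above, a pre-commit coverage gate, can be sketched in a few lines. This is an illustrative stand-in, not code from the Ghostty repository; the `CoverageSummary` shape loosely mirrors the "total" entry of a Jest coverage-summary.json file:

```typescript
// Illustrative pre-commit coverage gate (hypothetical, not from Ghostty).
// `CoverageSummary` loosely mirrors Jest's coverage-summary.json "total" entry.
interface CoverageSummary {
  lines: { pct: number };
  branches: { pct: number };
}

// Returns true only when both line and branch coverage meet the bar,
// so a pre-commit hook can block the commit otherwise.
function coverageGate(summary: CoverageSummary, minPct = 80): boolean {
  return summary.lines.pct >= minPct && summary.branches.pct >= minPct;
}

// 85% lines but only 78% branches: the commit is blocked.
console.log(coverageGate({ lines: { pct: 85 }, branches: { pct: 78 } })); // → false
```

In a real setup, a pre-commit hook would read the coverage report produced by the test runner and exit non-zero when the gate returns false.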
E-commerce API Development
Context: An e-commerce startup using Devin to build REST APIs.

Recurring error: the agent didn't validate input correctly.

Implemented harness:

```typescript
import { z } from 'zod';

// Mandatory Zod schema for all endpoints
const productSchema = z.object({
  name: z.string().min(3).max(100),
  price: z.number().positive(),
  stock: z.number().int().nonnegative(),
});

// Middleware that rejects requests whose endpoint has not attached a
// validation schema (`app` is the Express application)
app.use((req, res, next) => {
  if (!req.validationSchema) {
    throw new Error('Endpoint must define validation schema');
  }
  next();
});
```

Result: after implementing the harness, 0 input validation vulnerabilities in 6 months vs 12-15 per month previously.
Fintech Payment Processing
Context: An agent implementing payment logic with Stripe.

Critical error: the agent processed refunds without verifying payment state.

Implemented harness:
- Explicit state machine for the payment lifecycle
- Property-based testing (QuickCheck-style)
- Mandatory sandbox for Stripe API calls in development
- Human approval required for any code touching production Stripe keys

Result: zero payment bugs in production in 8 months post-harness vs 3 incidents in 2 months pre-harness.
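The state-machine harness can be sketched as follows. All state names and the `transition` helper are illustrative assumptions, not Stripe's actual API; the point is that an out-of-order refund becomes impossible rather than merely discouraged:

```typescript
// Illustrative payment state machine; state names are hypothetical, not Stripe's.
type PaymentState = "created" | "authorized" | "captured" | "refunded" | "failed";

// The only legal transitions out of each state.
const allowed: Record<PaymentState, PaymentState[]> = {
  created: ["authorized", "failed"],
  authorized: ["captured", "failed"],
  captured: ["refunded"],
  refunded: [],
  failed: [],
};

function transition(from: PaymentState, to: PaymentState): PaymentState {
  if (!allowed[from].includes(to)) {
    // The harness: refunding an unverified payment throws instead of executing.
    throw new Error(`Illegal payment transition: ${from} -> ${to}`);
  }
  return to;
}

transition("captured", "refunded");   // legal: refund a captured payment
// transition("created", "refunded"); // throws: the original bug, now impossible
```

The agent can still write buggy refund logic, but the harness converts a silent money-losing bug into an immediate, loud failure.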
How to Implement Harness Engineering
1. Identify Error Pattern
When the agent makes an error, ask yourself:
- Is this error repeatable?
- Could it occur in similar contexts?
- What type of error is it? (logic, security, performance, tests)
2. Design the Harness
Harness options by error type:

Security errors:
- Linters with custom rules (ESLint, Semgrep)
- SAST tools in CI/CD (SonarQube, Snyk)
- Secret scanning (GitGuardian)

Logic errors:
- Property-based testing
- Contract testing between services
- Mutation testing to validate test quality

Performance errors:
- Performance budgets in CI/CD
- Lighthouse CI with thresholds
- Automatic load testing

API design errors:
- OpenAPI schema validation
- Breaking change detection
- Enforced API versioning
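Property-based testing, listed above for logic errors, can be approximated even without a framework like fast-check or Hypothesis: generate many random inputs and assert an invariant that must always hold. A minimal sketch, where `applyDiscount` is a hypothetical function under test:

```typescript
// Hypothetical function under test: applies a percentage discount.
function applyDiscount(price: number, pct: number): number {
  return price * (1 - pct / 100);
}

// Hand-rolled property-based check, a stand-in for fast-check/Hypothesis:
// random inputs, one invariant that must hold for all of them.
function checkDiscountProperty(runs = 1000): void {
  for (let i = 0; i < runs; i++) {
    const price = Math.random() * 1000;
    const pct = Math.random() * 100;
    const result = applyDiscount(price, pct);
    // Invariant: a discount never yields a negative price or one above the original.
    if (result < 0 || result > price) {
      throw new Error(`Property violated for price=${price}, pct=${pct}`);
    }
  }
}

checkDiscountProperty(); // silent when the invariant holds for every run
```

Unlike example-based tests, the agent cannot satisfy this check by hard-coding the handful of cases it saw in the prompt.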
3. Automate the Harness
The harness must execute automatically:
- Pre-commit hooks (local code)
- CI/CD pipelines (remote code)
- Deployment gates (production)
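A deployment gate can be as simple as a script that aggregates harness results and refuses to proceed if any check failed. A minimal sketch; the `Check` shape and the check names are assumptions, not a real CI API:

```typescript
// Illustrative deployment gate: each entry would come from a real harness
// (linter, contract tests, secret scan) in an actual CI pipeline.
type Check = { name: string; passed: boolean };

function deploymentGate(checks: Check[]): void {
  const failed = checks.filter((c) => !c.passed).map((c) => c.name);
  if (failed.length > 0) {
    // Blocking here means a failing harness can never reach production.
    throw new Error(`Deployment blocked by failing harnesses: ${failed.join(", ")}`);
  }
}

deploymentGate([
  { name: "lint", passed: true },
  { name: "contract-tests", passed: true },
]); // passes silently; the deploy may proceed
```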
4. Iterate and Refine
Monitor harness effectiveness:
- Is agent still making similar errors?
- Does harness generate false positives?
- Does it need to be stricter or more flexible?
Relationship with Slam Dunk Tasks
Harness engineering enables Slam Dunk Tasks: once you've built sufficient harnesses around a type of task, you can delegate that task completely to the agent with confidence it won't fail.

Typical progression:
- New task → 100% supervision
- Harness built → 50% supervision
- Harness refined → 10% supervision
- Task becomes Slam Dunk → 0% supervision
Tools and Technologies
Linters and Validators:
- ESLint / Prettier (JavaScript/TypeScript)
- Ruff / Black (Python)
- Clippy (Rust)
- Custom rules via AST parsing

Testing Frameworks:
- Jest / Vitest (unit tests)
- Playwright (E2E tests)
- Hypothesis / QuickCheck (property-based)

CI/CD Harnesses:
- GitHub Actions with custom actions
- Pre-commit framework
- Husky (git hooks)
- Danger (PR automation)

Security Harnesses:
- Semgrep (SAST)
- Snyk / Dependabot (dependencies)
- GitGuardian (secrets)
- OWASP ZAP (DAST)
Related Terms
- Slam Dunk Tasks - Tasks that agents can execute with high confidence
- Agentic Coding - Development where agents execute code autonomously
- Agent-Ops - Role that designs and maintains harnesses for agents
- Auto-Healing - Systems that self-repair when detecting problems
Challenges and Considerations
Over-engineering the harness: not every error needs a harness. If an error occurs once and is trivial to fix, don't build complex infrastructure around it.

Harness maintenance: harnesses require maintenance. When the codebase evolves, the harnesses must evolve too. Budget 10-15% of engineering time for harness maintenance.

Balance with speed: strict harnesses can slow initial development. For MVPs, consider minimal harnesses (security + critical bugs) and expand later.

False sense of security: harness engineering reduces errors but doesn't eliminate them. Maintain code reviews for critical architectural decisions.
Additional Resources
- Mitchell Hashimoto: My AI Adoption Journey
- Agentic Engineering in Action — Zed’s Blog
- The Agent Harness — Michael Livs
- AI Agents in Production: The Harness Dissected
Last updated: February 2026
Category: AI Development
Popularized by: Mitchell Hashimoto (HashiCorp, Ghostty)
Related to: Slam Dunk Tasks, Agentic Coding, Agent-Ops, Continuous Improvement
Keywords: harness engineering, mitchell hashimoto, agent improvement, agentic coding, ghostty, ai agent frameworks, agent guardrails