Definition: Tasks where an AI agent can execute with high confidence and minimal supervision because they have been validated multiple times. Concept from Mitchell Hashimoto's framework for effective agent delegation.
— Source: NERVICO, Product Development Consultancy
Slam Dunk Tasks
Definition
Slam Dunk Tasks are tasks an AI agent can execute with high confidence and minimal supervision because they have been validated multiple times and the agent has consistently demonstrated its ability to complete them correctly. The term comes from basketball, where a slam dunk is a near-certain shot, and was popularized by Mitchell Hashimoto in his framework for working with AI agents. The philosophy is simple: once you know an agent can handle a task reliably, delegate it completely while you work on something more interesting or complex.

Criteria for a Slam Dunk:
- Agent has successfully completed the task 5+ times
- Harnesses exist that prevent common errors
- Task has clear, measurable success criteria
- Doesn’t require complex architectural judgment
- Failures are automatically detectable

Core principle: "Outsource the Slam Dunks" - delegate what you know works.
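The five criteria above can be sketched as a simple boolean gate. This is an illustrative assumption, not part of Hashimoto's framework; the field and function names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    """Hypothetical record of an agent's history with one task type."""
    successful_runs: int              # how many times the agent completed it
    has_harness: bool                 # harnesses prevent common errors
    measurable_success: bool          # clear, objective success criteria
    needs_architectural_judgment: bool
    failures_auto_detectable: bool

def meets_slam_dunk_criteria(t: TaskRecord) -> bool:
    """True only when all five criteria from the list above hold."""
    return (t.successful_runs >= 5
            and t.has_harness
            and t.measurable_success
            and not t.needs_architectural_judgment
            and t.failures_auto_detectable)
```

A task that fails any single criterion stays under supervision; the gate is deliberately conjunctive.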
Why It Matters
Time multiplier: By delegating Slam Dunks to agents, engineers can focus on high-value work (architecture, strategic decisions, complex features) while agents handle repetitive or well-defined tasks.

Reduced context switching: Instead of alternating between complex and simple tasks, you stay in flow state on difficult problems while agents execute simple tasks in parallel.

Personal scalability: Mitchell Hashimoto reports a 3-5× increase in personal output after identifying and delegating his Slam Dunks. Not because the agent is faster, but because he can work continuously on high-leverage tasks while agents handle the rest.

Path to full autonomy: Slam Dunks are the first step toward multi-agent orchestration. If you can identify 5-10 types of Slam Dunk tasks, you can build a team of specialized agents that operate with minimal supervision.
Examples of Slam Dunk Tasks
Development
Writing unit tests:
- After 3-5 features, agent learns testing pattern
- Harness: Minimum 80% test coverage in CI/CD
- Supervision: Code review only if coverage drops

Implementing CRUD endpoints:
- Agent knows standard REST pattern
- Harness: OpenAPI schema validation + integration tests
- Supervision: Spot check every 5-10 endpoints

Database migrations:
- Agent generates migrations following naming conventions
- Harness: Dry-run in staging + rollback tests
- Supervision: Manual review only for complex schema changes

Refactoring for consistency:
- Agent applies established patterns in codebase
- Harness: Linters + existing test suite
- Supervision: Diff review at end of batch
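A harness like "linters + existing test suite" can be sketched as a small gate script that runs each check before accepting an agent's batch of changes. The specific commands (`ruff`, `pytest`) are assumptions; substitute your project's own linter and test runner.

```python
import subprocess

# Hypothetical harness: each entry is (name, command). Swap in your
# project's actual linter and test commands.
CHECKS = [
    ("lint", ["ruff", "check", "."]),
    ("tests", ["pytest", "-q"]),
]

def run_harness(checks=CHECKS):
    """Run every check; return (all_passed, names_of_failed_checks)."""
    failures = []
    for name, cmd in checks:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            failures.append(name)
    return (len(failures) == 0, failures)
```

In practice this would run in CI or a pre-commit hook; the agent's output is only merged when `run_harness()` reports a clean pass.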
DevOps
Infrastructure as Code updates:
- Agent modifies Terraform following existing modules
- Harness: terraform plan + cost estimation
- Supervision: Human approval only if cost increase >10%

CI/CD pipeline maintenance:
- Agent updates GitHub Actions when dependencies change
- Harness: Test pipeline in branch before merge
- Supervision: Monitor first run in main

Log analysis and alerting:
- Agent identifies patterns in logs and proposes alerts
- Harness: Alert testing in staging
- Supervision: Review alert messages for clarity
Documentation
API documentation:
- Agent generates OpenAPI docs from code
- Harness: Schema validation + example testing
- Supervision: Spot check clarity for external users

Code comments:
- Agent adds JSDoc/docstrings following conventions
- Harness: Linter checks format
- Supervision: None (low risk)

README updates:
- Agent keeps README synchronized with changes
- Harness: Markdown linting
- Supervision: Quick read before release
How to Identify Your Slam Dunks
Evaluation Framework
For each type of task, ask yourself:

1. Repeatability
- Does task follow a consistent pattern?
- Are there clear examples in the codebase?
- Are success criteria objective?

2. Risk Level
- What happens if agent makes a mistake?
- Are errors automatically detectable?
- How much does it cost to fix an error?

3. Track Record
- Has agent completed this task before?
- How many times? With what success rate?
- Do previous errors have harnesses?

4. Complexity
- Does it require architectural decisions?
- Does it involve complex trade-offs?
- Does it need deep domain expertise?
Scoring System
| Criterion | Weight | Score |
|---|---|---|
| Repeatability | 30% | 0-10 |
| Low Risk | 30% | 0-10 |
| Track Record | 25% | 0-10 |
| Low Complexity | 15% | 0-10 |
| Total Score | 100% | 0-10 |
Slam Dunk threshold: Score ≥7.5/10
Progression: From Supervision to Slam Dunk
Stage 1: New Task (100% supervision)
- Agent attempts task for first time
- Engineer supervises continuously
- Identifies errors and builds harnesses
Stage 2: Learning (50% supervision)
- Agent has completed task 2-3 times
- Basic harnesses in place
- Engineer does spot checks
Stage 3: Reliable (10% supervision)
- Agent has completed task 5+ times
- Comprehensive harnesses
- Engineer only reviews final output
Stage 4: Slam Dunk (0% supervision)
- Agent completely autonomous
- Harnesses automate validation
- Engineer only intervenes if a harness fails

Typical timeline: 2-6 weeks depending on task complexity
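The four-stage progression above can be modeled as a simple lookup keyed on completion count and harness maturity. The thresholds mirror the stages as described; the function and the harness labels are my own sketch.

```python
# Harness maturity labels (assumed): "none", "basic", "comprehensive", "automated".
def supervision_stage(completions: int, harness: str) -> tuple:
    """Return (stage_name, supervision_fraction) per the four stages above."""
    if completions == 0 or harness == "none":
        return ("new", 1.0)            # Stage 1: 100% supervision
    if completions < 5 or harness == "basic":
        return ("learning", 0.5)       # Stage 2: spot checks
    if harness == "comprehensive":
        return ("reliable", 0.1)       # Stage 3: review final output only
    return ("slam_dunk", 0.0)          # Stage 4: harnesses automate validation
```

The key design point is that promotion requires both a track record (completion count) and harness maturity; either one alone keeps the task at a lower stage.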
Real Cases
Ghostty - Mitchell Hashimoto
Slam Dunks identified in 3 months:
- Writing Zig unit tests
- Progression: 3 weeks → Slam Dunk
- Success rate: 97% (agent alone)
- Time saved: ~15 hours/week
- Terminal escape sequence parsing
- Progression: 6 weeks → Slam Dunk
- Success rate: 92% (agent alone)
- Time saved: ~8 hours/week
- Documentation updates
- Progression: 1 week → Slam Dunk
- Success rate: 99% (agent alone)
- Time saved: ~3 hours/week

Total time savings: ~26 hours/week = 3.25 days/week dedicated to architecture and complex features
E-commerce Startup
Slam Dunks after 2 months with Devin:
- CRUD endpoints: 15+ successful → Slam Dunk
- Stripe integration patterns: 8 successful → Slam Dunk
- React component scaffolding: 20+ successful → Slam Dunk
- Database migrations: 12 successful → Slam Dunk

Result: Engineer can ship 2-3 features/week vs 0.5-1 feature/week previously.
Anti-Patterns: What’s NOT Slam Dunk
System architecture: Requires judgment, trade-offs, experience. Agent can propose; human decides.

Security-critical code: Auth, payments, PII handling. Always requires human review.

Product decisions: What features to build, prioritization, UX decisions. Agent informs; human decides.

Critical performance optimization: Requires profiling, understanding bottlenecks, trade-offs. Agent helps; human leads.

Legacy code without tests: High risk, difficult to validate correctness. Build harnesses first.
Tools and Frameworks
Task Management:
- Linear / Jira with “slam-dunk-candidate” labels
- Tracking success rate by task type
- Automatic pattern identification

Harness Infrastructure:
- Pre-commit hooks
- CI/CD with validations
- Monitoring and alerting

Agent Delegation:
- Claude Code / Cursor for individual tasks
- Devin for end-to-end tasks
- Custom orchestration for multi-task
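"Tracking success rate by task type" can be as simple as a counter per task type that flags Slam Dunk candidates once a task clears a run count and success rate. This sketch is not tied to Linear, Jira, or any specific tool; the class, thresholds, and method names are assumptions.

```python
from collections import defaultdict

class TaskTracker:
    """Hypothetical per-task-type success tracker for spotting candidates."""

    def __init__(self):
        # task_type -> [successes, total_runs]
        self.runs = defaultdict(lambda: [0, 0])

    def record(self, task_type: str, success: bool) -> None:
        stats = self.runs[task_type]
        stats[0] += int(success)
        stats[1] += 1

    def success_rate(self, task_type: str) -> float:
        successes, total = self.runs[task_type]
        return successes / total if total else 0.0

    def slam_dunk_candidates(self, min_runs: int = 5, min_rate: float = 0.9):
        """Task types meeting the 5+ runs criterion with a high success rate."""
        return [t for t, (s, n) in self.runs.items()
                if n >= min_runs and s / n >= min_rate]
```

Feeding this from CI results or agent run logs gives the "slam-dunk-candidate" labels mentioned above a data-driven basis instead of gut feel.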
Related Terms
- Harness Engineering - Practice that enables Slam Dunks
- Agentic Coding - Paradigm where agents execute code autonomously
- Agent-Ops - Role that identifies and optimizes Slam Dunks
- Multi-Agent Orchestration - Scaling Slam Dunks to multiple agents
Additional Resources
- Mitchell Hashimoto: My AI Adoption Journey
- Outsource the Slam Dunks — Zed Blog
- Notes on Agentic Engineering in Action
Last updated: February 2026
Category: AI Development
Popularized by: Mitchell Hashimoto (HashiCorp, Ghostty)
Related to: Harness Engineering, Agentic Coding, Agent Delegation, Autonomous Tasks
Keywords: slam dunk tasks, mitchell hashimoto, agent delegation, high-confidence tasks, agentic coding, autonomous ai agents, task automation