Definition: Tasks where an AI agent can execute with high confidence and minimal supervision because they have been validated multiple times. Concept from Mitchell Hashimoto's framework for effective agent delegation.
— Source: NERVICO, Product Development Consultancy
Slam Dunk Tasks
Definition
Slam Dunk Tasks are tasks an AI agent can execute with high confidence and minimal supervision because they have been validated multiple times and the agent has consistently demonstrated its ability to complete them correctly. The term comes from basketball, where a slam dunk is a near-certain shot, and was popularized by Mitchell Hashimoto in his framework for working with AI agents. The philosophy is simple: once you know an agent can handle a task reliably, delegate it completely while you work on something more interesting or complex.

Criteria for a Slam Dunk:
- Agent has successfully completed the task 5+ times
- Harnesses exist that prevent common errors
- Task has clear, measurable success criteria
- Doesn’t require complex architectural judgment
- Failures are automatically detectable

Core principle: "Outsource the Slam Dunks" - delegate what you know works.
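The five criteria above can be sketched as a simple boolean gate. This is an illustrative assumption, not part of Hashimoto's framework; the field and function names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    """Hypothetical record of an agent's history with one task type."""
    successful_runs: int              # how many times the agent completed it
    has_harness: bool                 # harnesses prevent common errors
    measurable_success: bool          # clear, objective success criteria
    needs_architectural_judgment: bool
    failures_auto_detectable: bool

def meets_slam_dunk_criteria(t: TaskRecord) -> bool:
    """True only when all five criteria from the list above hold."""
    return (t.successful_runs >= 5
            and t.has_harness
            and t.measurable_success
            and not t.needs_architectural_judgment
            and t.failures_auto_detectable)
```

A task that fails any single criterion stays under supervision; the gate is deliberately conjunctive.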
Why It Matters
Time multiplier: By delegating Slam Dunks to agents, engineers can focus on high-value work (architecture, strategic decisions, complex features) while agents handle repetitive or well-defined tasks.

Reduced context switching: Instead of alternating between complex and simple tasks, you stay in flow state on difficult problems while agents execute simple tasks in parallel.

Personal scalability: Mitchell Hashimoto reports a 3-5× increase in personal output after identifying and delegating his Slam Dunks. Not because the agent is faster, but because he can work continuously on high-leverage tasks while agents handle the rest.

Path to full autonomy: Slam Dunks are the first step toward multi-agent orchestration. If you can identify 5-10 types of Slam Dunk tasks, you can build a team of specialized agents that operate with minimal supervision.
Examples of Slam Dunk Tasks
Development
Writing unit tests:
- After 3-5 features, agent learns testing pattern
- Harness: Minimum 80% test coverage in CI/CD
- Supervision: Code review only if coverage drops

Implementing CRUD endpoints:
- Agent knows standard REST pattern
- Harness: OpenAPI schema validation + integration tests
- Supervision: Spot check every 5-10 endpoints

Database migrations:
- Agent generates migrations following naming conventions
- Harness: Dry-run in staging + rollback tests
- Supervision: Manual review only for complex schema changes

Refactoring for consistency:
- Agent applies established patterns in codebase
- Harness: Linters + existing test suite
- Supervision: Diff review at end of batch
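A harness like "linters + existing test suite" can be sketched as a small gate script that runs each check before accepting an agent's batch of changes. The specific commands (`ruff`, `pytest`) are assumptions; substitute your project's own linter and test runner.

```python
import subprocess

# Hypothetical harness: each entry is (name, command). Swap in your
# project's actual linter and test commands.
CHECKS = [
    ("lint", ["ruff", "check", "."]),
    ("tests", ["pytest", "-q"]),
]

def run_harness(checks=CHECKS):
    """Run every check; return (all_passed, names_of_failed_checks)."""
    failures = []
    for name, cmd in checks:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            failures.append(name)
    return (len(failures) == 0, failures)
```

In practice this would run in CI or a pre-commit hook; the agent's output is only merged when `run_harness()` reports a clean pass.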
DevOps
Infrastructure as Code updates:
- Agent modifies Terraform following existing modules
- Harness: terraform plan + cost estimation
- Supervision: Human approval only if cost increase >10%

CI/CD pipeline maintenance:
- Agent updates GitHub Actions when dependencies change
- Harness: Test pipeline in branch before merge
- Supervision: Monitor first run in main

Log analysis and alerting:
- Agent identifies patterns in logs and proposes alerts
- Harness: Alert testing in staging
- Supervision: Review alert messages for clarity
Documentation
API documentation:
- Agent generates OpenAPI docs from code
- Harness: Schema validation + example testing
- Supervision: Spot check clarity for external users

Code comments:
- Agent adds JSDoc/docstrings following conventions
- Harness: Linter checks format
- Supervision: None (low risk)

README updates:
- Agent keeps README synchronized with changes
- Harness: Markdown linting
- Supervision: Quick read before release
How to Identify Your Slam Dunks
Evaluation Framework
For each type of task, ask yourself:

1. Repeatability
- Does task follow a consistent pattern?
- Are there clear examples in the codebase?
- Are success criteria objective?

2. Risk Level
- What happens if agent makes a mistake?
- Are errors automatically detectable?
- How much does it cost to fix an error?

3. Track Record
- Has agent completed this task before?
- How many times? With what success rate?
- Do previous errors have harnesses?

4. Complexity
- Does it require architectural decisions?
- Does it involve complex trade-offs?
- Does it need deep domain expertise?
Scoring System
| Criterion | Weight | Score |
|---|---|---|
| Repeatability | 30% | 0-10 |
| Low Risk | 30% | 0-10 |
| Track Record | 25% | 0-10 |
| Low Complexity | 15% | 0-10 |
| Total Score | 100% | 0-10 |
Slam Dunk threshold: Score ≥7.5/10
Progression: From Supervision to Slam Dunk
Stage 1: New Task (100% supervision)
- Agent attempts task for first time
- Engineer supervises continuously
- Identifies errors and builds harnesses
Stage 2: Learning (50% supervision)
- Agent has completed task 2-3 times
- Basic harnesses in place
- Engineer does spot checks
Stage 3: Reliable (10% supervision)
- Agent has completed task 5+ times
- Comprehensive harnesses
- Engineer only reviews final output
Stage 4: Slam Dunk (0% supervision)
- Agent completely autonomous
- Harnesses automate validation
- Engineer only intervenes if a harness fails

Typical timeline: 2-6 weeks depending on task complexity
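The four-stage progression above can be modeled as a simple lookup keyed on completion count and harness maturity. The thresholds mirror the stages as described; the function and the harness labels are my own sketch.

```python
# Harness maturity labels (assumed): "none", "basic", "comprehensive", "automated".
def supervision_stage(completions: int, harness: str) -> tuple:
    """Return (stage_name, supervision_fraction) per the four stages above."""
    if completions == 0 or harness == "none":
        return ("new", 1.0)            # Stage 1: 100% supervision
    if completions < 5 or harness == "basic":
        return ("learning", 0.5)       # Stage 2: spot checks
    if harness == "comprehensive":
        return ("reliable", 0.1)       # Stage 3: review final output only
    return ("slam_dunk", 0.0)          # Stage 4: harnesses automate validation
```

The key design point is that promotion requires both a track record (completion count) and harness maturity; either one alone keeps the task at a lower stage.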
Real Cases
Ghostty - Mitchell Hashimoto
Slam Dunks identified in 3 months:
- Writing Zig unit tests
- Progression: 3 weeks → Slam Dunk
- Success rate: 97% (agent alone)
- Time saved: ~15 hours/week
- Terminal escape sequence parsing
- Progression: 6 weeks → Slam Dunk
- Success rate: 92% (agent alone)
- Time saved: ~8 hours/week
- Documentation updates
- Progression: 1 week → Slam Dunk
- Success rate: 99% (agent alone)
- Time saved: ~3 hours/week

Total time savings: ~26 hours/week = 3.25 days/week dedicated to architecture and complex features
E-commerce Startup
Slam Dunks after 2 months with Devin:
- CRUD endpoints: 15+ successful → Slam Dunk
- Stripe integration patterns: 8 successful → Slam Dunk
- React component scaffolding: 20+ successful → Slam Dunk
- Database migrations: 12 successful → Slam Dunk

Result: Engineer can ship 2-3 features/week vs 0.5-1 feature/week previously.
Anti-Patterns: What’s NOT Slam Dunk
System architecture: Requires judgment, trade-offs, experience. Agent can propose; human decides.

Security-critical code: Auth, payments, PII handling. Always requires human review.

Product decisions: What features to build, prioritization, UX decisions. Agent informs; human decides.

Critical performance optimization: Requires profiling, understanding bottlenecks, trade-offs. Agent helps; human leads.

Legacy code without tests: High risk, difficult to validate correctness. Build harnesses first.
Tools and Frameworks
Task Management:
- Linear / Jira with “slam-dunk-candidate” labels
- Tracking success rate by task type
- Automatic pattern identification

Harness Infrastructure:
- Pre-commit hooks
- CI/CD with validations
- Monitoring and alerting

Agent Delegation:
- Claude Code / Cursor for individual tasks
- Devin for end-to-end tasks
- Custom orchestration for multi-task
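"Tracking success rate by task type" can be as simple as a counter per task type that flags Slam Dunk candidates once a task clears a run count and success rate. This sketch is not tied to Linear, Jira, or any specific tool; the class, thresholds, and method names are assumptions.

```python
from collections import defaultdict

class TaskTracker:
    """Hypothetical per-task-type success tracker for spotting candidates."""

    def __init__(self):
        # task_type -> [successes, total_runs]
        self.runs = defaultdict(lambda: [0, 0])

    def record(self, task_type: str, success: bool) -> None:
        stats = self.runs[task_type]
        stats[0] += int(success)
        stats[1] += 1

    def success_rate(self, task_type: str) -> float:
        successes, total = self.runs[task_type]
        return successes / total if total else 0.0

    def slam_dunk_candidates(self, min_runs: int = 5, min_rate: float = 0.9):
        """Task types meeting the 5+ runs criterion with a high success rate."""
        return [t for t, (s, n) in self.runs.items()
                if n >= min_runs and s / n >= min_rate]
```

Feeding this from CI results or agent run logs gives the "slam-dunk-candidate" labels mentioned above a data-driven basis instead of gut feel.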
Related Terms
- Harness Engineering - Practice that enables Slam Dunks
- Agentic Coding - Paradigm where agents execute code autonomously
- Agent-Ops - Role that identifies and optimizes Slam Dunks
- Multi-Agent Orchestration - Scaling Slam Dunks to multiple agents
Additional Resources
- Mitchell Hashimoto: My AI Adoption Journey
- Outsource the Slam Dunks — Zed Blog
- Notes on Agentic Engineering in Action
Last updated: February 2026
Category: AI Development
Popularized by: Mitchell Hashimoto (HashiCorp, Ghostty)
Related to: Harness Engineering, Agentic Coding, Agent Delegation, Autonomous Tasks
Keywords: slam dunk tasks, mitchell hashimoto, agent delegation, high-confidence tasks, agentic coding, autonomous ai agents, task automation