AI Agent Evaluations
Performance results of AI coding agents on Next.js code generation and migration tasks, measuring success rate and execution time.
Last run date: February 18, 2026
Agent Performance Results
| Model | Agent | Total Evals | Success Rate | Success Rate with AGENTS.md* |
|---|---|---|---|---|
| GPT 5.3 Codex (xhigh) | Codex | 20 | 90% | 100% |
| Claude Opus 4.6 | Claude Code | 20 | 75% | 100% |
| Gemini 3.0 Pro Preview | OpenCode | 20 | 75% | 90% |
| Gemini 3.0 Pro Preview | Gemini CLI | 20 | 70% | 90% |
| Cursor Composer 1.5 | Cursor | 20 | 65% | 95% |
| Claude Sonnet 4.5 | Claude Code | 20 | 60% | 85% |
| GPT 5.2 Codex (xhigh) | Codex | 20 | 55% | 85% |
* AGENTS.md provides bundled Next.js documentation for AI coding agents. The column shows the success rate when agents had access to this documentation; its gain over the Success Rate column reflects the additional evals that passed.
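
Because every row uses 20 evals, each percentage corresponds to a whole number of passed evals, so the two rate columns can be converted into a count of additional evals passed with AGENTS.md. The following is a minimal sketch of that arithmetic, not part of the evaluation harness: the `EvalRow` shape and the `passedCount` and `additionalPasses` helpers are hypothetical names introduced for illustration, and the sketch assumes the published percentages are exact fractions of the total.

```typescript
// Hypothetical row shape for illustration; the field names are assumptions,
// not taken from the evaluation repository.
interface EvalRow {
  model: string;
  agent: string;
  totalEvals: number;
  successRate: number; // percent, without AGENTS.md
  successRateWithDocs: number; // percent, with AGENTS.md
}

// Values copied from the results table above.
const rows: EvalRow[] = [
  { model: "GPT 5.3 Codex (xhigh)", agent: "Codex", totalEvals: 20, successRate: 90, successRateWithDocs: 100 },
  { model: "Claude Opus 4.6", agent: "Claude Code", totalEvals: 20, successRate: 75, successRateWithDocs: 100 },
  { model: "Gemini 3.0 Pro Preview", agent: "OpenCode", totalEvals: 20, successRate: 75, successRateWithDocs: 90 },
  { model: "Gemini 3.0 Pro Preview", agent: "Gemini CLI", totalEvals: 20, successRate: 70, successRateWithDocs: 90 },
  { model: "Cursor Composer 1.5", agent: "Cursor", totalEvals: 20, successRate: 65, successRateWithDocs: 95 },
  { model: "Claude Sonnet 4.5", agent: "Claude Code", totalEvals: 20, successRate: 60, successRateWithDocs: 85 },
  { model: "GPT 5.2 Codex (xhigh)", agent: "Codex", totalEvals: 20, successRate: 55, successRateWithDocs: 85 },
];

// Convert a percentage of a known eval count into a number of passed evals.
// Rounding guards against floating-point noise; it assumes the published
// percentage maps to a whole number of evals.
function passedCount(totalEvals: number, ratePercent: number): number {
  return Math.round((totalEvals * ratePercent) / 100);
}

// Additional evals that passed once the agent had access to AGENTS.md.
function additionalPasses(row: EvalRow): number {
  return (
    passedCount(row.totalEvals, row.successRateWithDocs) -
    passedCount(row.totalEvals, row.successRate)
  );
}

for (const row of rows) {
  const base = passedCount(row.totalEvals, row.successRate);
  const withDocs = passedCount(row.totalEvals, row.successRateWithDocs);
  console.log(
    `${row.model} (${row.agent}): ${base}/${row.totalEvals} -> ` +
      `${withDocs}/${row.totalEvals} (+${additionalPasses(row)})`
  );
}
```

Under these assumptions, a move from 90% to 100% on 20 evals corresponds to 2 additional passing evals, and a move from 55% to 85% corresponds to 6.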