AI Agent Evaluations
Performance results of AI coding agents on Next.js code generation and migration tasks, measuring success rate and execution time.
Last run date: April 6, 2026
Agent Performance Results
| Model | Agent | Total Evals | Success Rate | Success Rate with AGENTS.md* |
|---|---|---|---|---|
| GPT 5.3 Codex (xhigh) | Codex | 24 | 83% | 96% |
| GPT 5.4 (xhigh) | Codex | 24 | 83% | 92% |
| Claude Opus 4.6 | Claude Code | 24 | 75% | 100% |
| Cursor Composer 2.0 | Cursor | 24 | 75% | 96% |
| Gemini 3.1 Pro Preview | Gemini CLI | 24 | 75% | 96% |
| GLM 5.1 | OpenCode | 24 | 71% | 96% |
| Cursor Composer 1.5 | Cursor | 24 | 67% | 88% |
| Gemini 3.0 Pro Preview | Gemini CLI | 24 | 67% | 88% |
| Claude Sonnet 4.6 | Claude Code | 24 | 58% | 100% |
| GPT 5.2 Codex (xhigh) | Codex | 24 | 58% | 83% |
| Claude Sonnet 4.5 | Claude Code | 24 | 50% | 88% |
| MiniMax M2.7 | OpenCode | 24 | 50% | 63% |
| Kimi K2.5 | OpenCode | 24 | 21% | 58% |
* AGENTS.md bundles Next.js documentation for AI coding agents. This column shows each agent's success rate when it had access to that documentation.
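Since every agent runs the same set of 24 evals, each percentage in the table corresponds to a whole number of passing evals. A minimal sketch of that mapping (the helper name is illustrative, not part of the benchmark harness):

```python
def success_rate(passed: int, total: int = 24) -> int:
    """Pass rate as a whole-number percentage of the 24 evals."""
    return round(passed / total * 100)

# 20 of 24 evals passing yields the 83% reported for the top models,
# and 23 of 24 yields 96%.
print(success_rate(20))  # 83
print(success_rate(23))  # 96
```

The exact rounding convention used by the benchmark is an assumption here; standard rounding reproduces the table's figures for these examples.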