AI Agent Evaluations

Performance results of AI coding agents on Next.js code generation and migration tasks, measuring success rate and execution time.

Last run date: April 6, 2026

Agent Performance Results

| Model | Agent | Total Evals | Success Rate | Success Rate with AGENTS.md\* |
| --- | --- | --- | --- | --- |
| GPT 5.3 Codex (xhigh) | Codex | 24 | 83% | 96% |
| GPT 5.4 (xhigh) | Codex | 24 | 83% | 92% |
| Claude Opus 4.6 | Claude Code | 24 | 75% | 100% |
| Cursor Composer 2.0 | Cursor | 24 | 75% | 96% |
| Gemini 3.1 Pro Preview | Gemini CLI | 24 | 75% | 96% |
| GLM 5.1 | OpenCode | 24 | 71% | 96% |
| Cursor Composer 1.5 | Cursor | 24 | 67% | 88% |
| Gemini 3.0 Pro Preview | Gemini CLI | 24 | 67% | 88% |
| Claude Sonnet 4.6 | Claude Code | 24 | 58% | 100% |
| GPT 5.2 Codex (xhigh) | Codex | 24 | 58% | 83% |
| Claude Sonnet 4.5 | Claude Code | 24 | 50% | 88% |
| MiniMax M2.7 | OpenCode | 24 | 50% | 63% |
| Kimi K2.5 | OpenCode | 24 | 21% | 58% |

\* AGENTS.md provides bundled Next.js documentation for AI coding agents. This column shows each agent's success rate when it had access to that documentation.
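Since every agent ran the same 24 evals, the rounded percentages above can be mapped back to approximate pass counts. A minimal sketch (the model names and rates are copied from the table; the arithmetic simply inverts the percentage rounding):

```python
# Recover approximate pass counts (out of 24 evals) from the rounded
# success rates in the table: passes ~= round(rate * 24 / 100).
TOTAL_EVALS = 24

# (baseline success rate %, success rate % with AGENTS.md), from the table
results = {
    "GPT 5.3 Codex (xhigh)": (83, 96),
    "Claude Opus 4.6": (75, 100),
    "Kimi K2.5": (21, 58),
}

for model, (base_pct, docs_pct) in results.items():
    base_passes = round(base_pct * TOTAL_EVALS / 100)
    docs_passes = round(docs_pct * TOTAL_EVALS / 100)
    print(f"{model}: {base_passes}/{TOTAL_EVALS} -> {docs_passes}/{TOTAL_EVALS}")
```

For example, GPT 5.3 Codex's 83% baseline corresponds to 20 of 24 evals passing, and its 96% with AGENTS.md corresponds to 23 of 24.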