AI Agent Evaluations

Performance results of AI coding agents on Next.js code generation and migration tasks, measuring success rate and execution time.


Last run date: February 18, 2026

Agent Performance Results

Model	Agent	Total Evals	Success Rate	Success Rate with AGENTS.md*
GPT 5.3 Codex (xhigh)	Codex	20	90%	100%
Gemini 3.1 Pro Preview	Gemini CLI	20	80%	100%
Claude Opus 4.6	Claude Code	20	75%	100%
Claude Sonnet 4.6	Claude Code	20	70%	100%
Gemini 3.0 Pro Preview	Gemini CLI	20	70%	90%
Cursor Composer 1.5	Cursor	20	65%	95%
Claude Sonnet 4.5	Claude Code	20	60%	85%
GPT 5.2 Codex (xhigh)	Codex	20	55%	85%

* AGENTS.md provides bundled Next.js documentation for AI coding agents. The column shows the success rate when agents had access to this documentation; the gain over the Success Rate column reflects the additional evals that passed.
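
To make the percentages easier to read as eval counts, the sketch below converts each row back into a number of passed evals and derives how many more evals passed with AGENTS.md. It assumes every percentage is an exact fraction of that row's 20 evals, which holds for the values shown since each is a multiple of 5%. The helper names `passed_count` and `additional_passes` are hypothetical and are not part of the evaluation harness.

```python
# Rows copied from the results table:
# (model, total evals, success rate %, success rate % with AGENTS.md)
RESULTS = [
    ("GPT 5.3 Codex (xhigh)", 20, 90, 100),
    ("Gemini 3.1 Pro Preview", 20, 80, 100),
    ("Claude Opus 4.6", 20, 75, 100),
    ("Claude Sonnet 4.6", 20, 70, 100),
    ("Gemini 3.0 Pro Preview", 20, 70, 90),
    ("Cursor Composer 1.5", 20, 65, 95),
    ("Claude Sonnet 4.5", 20, 60, 85),
    ("GPT 5.2 Codex (xhigh)", 20, 55, 85),
]


def passed_count(total: int, percent: int) -> int:
    """Number of passed evals implied by a whole-number percentage.

    Assumption: the percentage is an exact fraction of `total`,
    so integer arithmetic is safe and avoids float rounding.
    """
    return total * percent // 100


def additional_passes(total: int, base_percent: int, docs_percent: int) -> int:
    """Extra evals that passed once AGENTS.md was available."""
    return passed_count(total, docs_percent) - passed_count(total, base_percent)


if __name__ == "__main__":
    for model, total, base, with_docs in RESULTS:
        base_passed = passed_count(total, base)
        docs_passed = passed_count(total, with_docs)
        gained = additional_passes(total, base, with_docs)
        print(f"{model}: {base_passed}/{total} -> {docs_passed}/{total} (+{gained})")
```

Under these assumptions, GPT 5.3 Codex (xhigh) goes from 18 to 20 passed evals, while GPT 5.2 Codex (xhigh) goes from 11 to 17, the same six-eval gain as Cursor Composer 1.5 (13 to 19).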