AI Agent Evaluations

Performance results of AI coding agents on Next.js code generation and migration tasks, measuring success rate and execution time.

Last run date: February 18, 2026

Agent Performance Results

| Model | Agent | Total Evals | Success Rate | Success Rate with AGENTS.md* |
|---|---|---|---|---|
| GPT 5.3 Codex (xhigh) | Codex | 20 | 90% | 100% |
| Claude Opus 4.6 | Claude Code | 20 | 75% | 100% |
| Gemini 3.0 Pro Preview | OpenCode | 20 | 75% | 90% |
| Gemini 3.0 Pro Preview | Gemini CLI | 20 | 70% | 90% |
| Cursor Composer 1.5 | Cursor | 20 | 65% | 95% |
| Claude Sonnet 4.5 | Claude Code | 20 | 60% | 85% |
| GPT 5.2 Codex (xhigh) | Codex | 20 | 55% | 85% |

* AGENTS.md provides bundled Next.js documentation for AI coding agents. The column shows the success rate when agents had access to this documentation, which includes the additional evals that passed as a result.
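
The percentages above can be converted back into eval counts to see how many additional evals passed with AGENTS.md. The following is a minimal sketch, assuming each eval is a simple pass/fail and all evals are weighted equally (the page does not state this explicitly). The function names `passedEvals` and `additionalPasses` are hypothetical and not part of the evaluation suite.

```typescript
// Hypothetical helpers: only the 20-eval total and the percentages
// come from the results above. Assumes equally weighted pass/fail evals.

/** Number of evals passed, given a total and a success rate in percent. */
function passedEvals(totalEvals: number, successRatePct: number): number {
  // Round to guard against floating-point noise in the division.
  return Math.round((totalEvals * successRatePct) / 100);
}

/** Extra evals that passed once AGENTS.md was available. */
function additionalPasses(
  totalEvals: number,
  basePct: number,
  withDocsPct: number
): number {
  return passedEvals(totalEvals, withDocsPct) - passedEvals(totalEvals, basePct);
}

// GPT 5.3 Codex (xhigh) row: 20 evals, 90% without and 100% with AGENTS.md.
const extra = additionalPasses(20, 90, 100);
console.log(extra); // 2
```

Under the same assumption, the Claude Opus 4.6 row (75% to 100%) corresponds to 5 additional evals out of 20.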