GPT-5.5 Surprises by Outperforming Claude Fable 5
OpenAI's GPT-5.5 has topped the new Agents' Last Exam benchmark, edging out Anthropic's Claude Fable 5. The new evaluation shifts the focus of AI performance standards from isolated challenges to real-world professional tasks.

The Rise of Agents' Last Exam
The University of California, Berkeley has introduced a benchmark called Agents' Last Exam (ALE), designed to evaluate how well AI systems perform economically valuable tasks. OpenAI's GPT-5.5 achieved a 24.0% pass rate, ahead of Anthropic's Claude Fable 5 at 22.0%, a gap of two percentage points. The benchmark departs from traditional AI assessments by focusing on real-world applications rather than isolated coding challenges.
ALE's framework requires AI models to demonstrate capabilities across five functional layers: Brain, Eyes, Body, Hands, and Feet. As a result, models cannot rely on static question-answering alone; they must carry out complex, multi-step interactions. By minimizing reliance on subjective grading, ALE aims to reflect an AI's practical abilities across professional workflows more accurately.
Key Features of ALE
- 1,490 task instances, with plans to expand to 5,000.
- Focus on real-world tasks relevant to U.S. federal standards.
- Strict evaluation criteria to eliminate loopholes in grading.
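
To make the idea of a pass rate under strict grading concrete, here is a minimal Python sketch. It is not ALE's actual scoring code, which the article does not describe. The `TaskResult` type, the per-task check counts, and the all-or-nothing rule are assumptions chosen for illustration, and the four sample results are made up.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record of one graded task (not an ALE data format)."""
    task_id: str
    checks_passed: int
    checks_total: int


def strict_pass(result: TaskResult) -> bool:
    # Assumed all-or-nothing rule: a task counts as passed only if every
    # automated check succeeded. Partial credit is never awarded.
    return result.checks_total > 0 and result.checks_passed == result.checks_total


def pass_rate(results: list[TaskResult]) -> float:
    # Percentage of tasks that pass under the strict rule above.
    if not results:
        return 0.0
    passed = sum(strict_pass(r) for r in results)
    return 100.0 * passed / len(results)


# Made-up sample data, for illustration only.
results = [
    TaskResult("t1", 3, 3),  # all checks pass
    TaskResult("t2", 2, 3),  # one check fails, so the task fails
    TaskResult("t3", 0, 1),  # fails
    TaskResult("t4", 5, 5),  # all checks pass
]

print(f"{pass_rate(results):.1f}%")  # prints 50.0%
```

Under a rule like this, a task that satisfies most of its checks still counts as a failure, which is one way a benchmark can keep partial solutions from inflating scores.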