# LLM Benchmarks

Latest news and articles about LLM Benchmarks

OpenAI’s GPT-5.5: The Dawn of the Autonomous Agent and the Peril of Confident Hallucination

OpenAI has released GPT-5.5, a model focusing on autonomous 'agentic' capabilities and tool coordination. While it dominates technical benchmarks and offers improved token efficiency, its 86% hallucination rate poses a major risk for autonomous deployment.

NeTe, April 24, 2026, 06:28

#OpenAI #GPT-5.5 #Autonomous Agents

Technology

Claude Opus 4.7: Anthropic Trades Conversational Charm for Industrial Reliability

Anthropic has released Claude Opus 4.7, an update that prioritizes coding and autonomous agent reliability over conversational intuition. While the model sets new records in software engineering benchmarks, it shows regression in web research and introduces a more rigid, literal interaction style.

NeTe, April 17, 2026, 05:29

#Anthropic #Claude Opus 4.7 #Artificial Intelligence