“Anthropic's Opus 4.5 model achieved a score of 80.9% on a tool-use benchmark like SWE-bench or ToolBench, which the company considers to be saturated.” (Sonic AI)