High scores on reasoning benchmarks like "Humanity's Last Exam" do not necessarily mean an AI..., Sonic AI
“High scores on reasoning benchmarks like ‘Humanity's Last Exam’ do not necessarily mean an AI model will be capable of performing real-world science.”