Misha Laskin argues that performance on benchmarks like the "Humanities Last Exam" has at best a ..., Sonic AI
“Misha Laskin argues that performance on benchmarks like the "Humanities Last Exam" has at best a weak correlation to meaningful end-user value and is likely not a good measure of real-world utility.”