Sonic AI
“Major AI labs have historically focused on general benchmarks like MMLU and HumanEval, which often do not correlate with performance on product-specific quality dimensions.”