“The MMLU (Massive Multitask Language Understanding) benchmark dataset has become saturated, with top AI models achieving scores well above 90%.”