“The "Humanity's Last Exam" benchmark, which collects challenging, closed-answer research problems from professors, is designed to mark the end of the utility of exam-like evaluations for AI models.”

Dan HendrycksAI / ML

Loading full analysis…