Recent OpenAI models have solved several reliable evaluation benchmarks, indicating a need for ne..., Sonic AI
“Recent OpenAI models have solved several reliable evaluation benchmarks, indicating a need for new, more challenging evals to track progress at the frontier.”