“Public LLM benchmarks can be gamed, and performance on them often reflects how much a model was trained on the benchmark itself rather than its general utility.”
Sonic AI