There is a colloquial sense that Chinese AI models perform better on benchmark scores than on tru..., Sonic AI
“There is a colloquial sense that Chinese AI models perform better on benchmark scores than on truly novel, held-out problems, suggesting potential benchmark optimization.”