Sonic AI
“Both Anthropic and OpenAI have recently indicated difficulties in evaluating their latest models, with Anthropic noting high ‘eval awareness’ and OpenAI citing a lack of long-horizon tasks to assess autonomy risks.”