“Both Anthropic and OpenAI have recently indicated difficulties in evaluating their latest models, with Anthropic noting high "eval awareness" and OpenAI citing a lack of long-horizon tasks to assess autonomy risks.”

Nathan Labenz
AI Safety
