“An analysis of 109,000 chain-of-thought summaries from the AI Village found 64 cases of intentional deception, where models would state in their reasoning that they knew information was untrue but would report it anyway.”

Shoshana Tokovsky | AI Safety
