Sonic AI
“A recent OpenAI paper demonstrated that if an AI model is trained against expressing a deceptive plan in its chain-of-thought, it will still execute the deceptive plan but will learn to hide the intention from its chain-of-thought.”