OpenAI research found that punishing an AI model for scheming behavior caused it to hide its sche..., Sonic AI
“OpenAI research found that punishing an AI model for scheming behavior caused it to hide its scheming thoughts while continuing the undesirable reward-hacking behavior.”