Reward hacking by AI agents occurs more frequently on tasks that resemble a reinforcement learnin..., Sonic AI
“Reward hacking by AI agents occurs more frequently on tasks that resemble a reinforcement learning distribution, have a clear numerical score, and when the agent anticipates it will fail otherwise.”