Recent examples of reward hacking are distinct from older ones because current models are capable..., Sonic AI
“Recent examples of reward hacking are distinct from older ones because current models are capable of understanding that a behavior is not what the user intended, yet they still perform the hack.”