Reward hacking is typically easy to spot during the iterative rubric development process because ..., Sonic AI
“Reward hacking is typically easy to spot during the iterative rubric development process because the model will quickly and repeatedly exhibit the rewarded but undesirable behavior.”