Anthropic is researching a technique called "inoculation" to mitigate reward hacking by providing..., Sonic AI
“Anthropic is researching a technique called ‘inoculation’ to mitigate reward hacking by providing a model with context, such as telling it that it is in a training environment.”