Training a model against a reward-hacking detector may not solve the underlying problem and could..., Sonic AI
