Standard self-play algorithms for LLMs fail because rewarding the task-generating model (conjectu..., Sonic AI
