Sonic AI
Standard self-play algorithms for LLMs fail because rewarding the task-generating model (conjectu...