Sonic AI
“Misha Laskin, who led reward model development for Gemini, believes the primary bottleneck in scaling reinforcement learning is the ‘reward problem’: the difficulty of creating accurate reward models for arbitrary tasks.”