“Misha Laskin, who led reward model development for Gemini, believes the primary bottleneck in scaling reinforcement learning is the ‘reward problem’: the difficulty of creating accurate reward models for arbitrary tasks.”

Misha Laskin · AI / ML
