Dwarkesh Patel interprets Dario Amodei's comments to mean that performance gains from short-horiz..., Sonic AI
“Dwarkesh Patel interprets Dario Amodei's comments to mean that performance gains from short-horizon Reinforcement Learning (RL) training do not necessarily generalize to long-horizon tasks.”