“Reinforcement learning (RL) budgets at some large AI labs are now equal in size to their pre-training budgets.”