“Within the training process for large models, reinforcement learning (RL) is beginning to require more compute than the initial pre-training phase.”