“Other companies have published findings showing that when models are trained on math contest problems like those from AIME, their performance scales log-linearly with training duration. Anthropic has observed the same trend across a wide variety of RL tasks, and it resembles the scaling seen in pre-training.”
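
To make "log-linearly" concrete: one common reading, assumed here, is that the score grows roughly as a + b * ln(t), where t is training duration, so each doubling of training adds about the same fixed increment. The sketch below fits that form by ordinary least squares. The function name `fit_log_linear` and all of the data are hypothetical and synthetic; they are not taken from any published result.

```python
import math

def fit_log_linear(steps, scores):
    """Least-squares fit of score = a + b * ln(steps); returns (a, b)."""
    xs = [math.log(s) for s in steps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic, illustrative data only (an assumption, not measured results):
# the score starts at 0.20 and rises by 0.05 every time training doubles.
steps = [100, 200, 400, 800, 1600]
scores = [0.20 + 0.05 * math.log2(s / 100) for s in steps]

a, b = fit_log_linear(steps, scores)

# Under a log-linear trend, the gain per doubling is constant: b * ln(2).
gain_per_doubling = b * math.log(2)
print(f"gain per doubling of training: {gain_per_doubling:.3f}")
```

On real measurements the fit would not be exact, and a score bounded at 100% cannot follow this form indefinitely, so the straight line in log space is best read as a description of the observed range.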