“Early versions of ChatGPT were not mathematically strong because their initial reward functions were not optimized for mathematical correctness.”