“The best-performing large language model on the Putnam exam, DeepSeek by Mass Arena, scored 103 out of a total 120 points.”