“The reinforcement learning technique RLVR (Reinforcement Learning with Verifiable Rewards) was the biggest post-training idea of 2025.”