“A research paper from Tsinghua University suggested that a base model can match the performance of a reasoning-tuned model if given enough attempts, implying that RL primarily narrows the probability space rather than creating new capabilities.”

RICK VISCOMI · LLMs
