The performance ceiling for models trained with reinforcement learning (RL) is higher than for th..., Sonic AI
“The performance ceiling for models trained with reinforcement learning (RL) is higher than for those trained with supervised fine-tuning (SFT), even when the SFT data is high-quality human data.”