“Reinforcement Learning (RL) fine-tuning is less likely than Supervised Fine-Tuning (SFT) to cause catastrophic forgetting.”