“For continual learning, on-policy self-distillation (OPSD) is superior to supervised fine-tuning (SFT).”