Self-distillation, where a model is distilled into a new model of the same size, can significantl..., Sonic AI
“Self-distillation, where a model is distilled into a new model of the same size, can significantly improve loss and outperform the asymptote of a heavily regularized single model.”