Model distillation is highly effective because soft labels (logits) contain significantly more in..., Sonic AI
“Model distillation is highly effective because soft labels (logits) contain significantly more information per sample than one-hot labels due to their higher entropy.”
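
The entropy comparison in the quote can be checked numerically. The sketch below is only illustrative and rests on several assumptions: the teacher logits are made-up numbers, and the soft labels are taken to be the temperature-scaled softmax of those logits (the usual construction in distillation, since raw logits are not themselves a probability distribution). The helper names `softmax` and `entropy_bits` are hypothetical. Shannon entropy is used here as a rough proxy for how much a single target can convey; it does not by itself establish how much a student actually learns.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities with a temperature-scaled softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical teacher logits for a 4-class problem (assumed values).
teacher_logits = [4.0, 2.5, 0.5, -1.0]

one_hot = [1.0, 0.0, 0.0, 0.0]          # hard label for class 0
soft_t1 = softmax(teacher_logits, 1.0)  # soft label at temperature 1
soft_t4 = softmax(teacher_logits, 4.0)  # softer label at temperature 4

h_one_hot = entropy_bits(one_hot)
h_t1 = entropy_bits(soft_t1)
h_t4 = entropy_bits(soft_t4)

print(f"one-hot entropy:      {h_one_hot:.3f} bits")
print(f"soft label (T=1):     {h_t1:.3f} bits")
print(f"soft label (T=4):     {h_t4:.3f} bits")
print(f"upper bound log2(4):  {math.log2(4):.3f} bits")
```

Under these assumptions the one-hot target has zero entropy, while the soft targets have positive entropy that grows with temperature and is capped at log2 of the number of classes. The extra probability mass on the non-target classes is where a teacher's view of class similarity would show up.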