“The cost to distill a large language model is estimated to be approximately 2% of its original pre-training cost.”
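To make the proportion concrete, here is a minimal arithmetic sketch. It assumes the roughly 2% figure quoted above applies as a flat ratio. The function name `estimate_distillation_cost` and the example pre-training budget are hypothetical, chosen only for illustration and not taken from the source.

```python
# Assumption: the ~2% estimate from the quote is treated as a fixed ratio.
DISTILLATION_RATIO = 0.02


def estimate_distillation_cost(pretraining_cost: float,
                               ratio: float = DISTILLATION_RATIO) -> float:
    """Return the estimated distillation cost for a given pre-training cost.

    Both the helper name and the default ratio are illustrative assumptions.
    """
    if pretraining_cost < 0:
        raise ValueError("pretraining_cost must be non-negative")
    return pretraining_cost * ratio


if __name__ == "__main__":
    # Hypothetical pre-training budget, not a figure from the source.
    example_pretraining_cost = 100_000_000.0
    estimate = estimate_distillation_cost(example_pretraining_cost)
    print(f"Estimated distillation cost: {estimate:,.0f}")
```

Under these assumptions the estimate scales linearly with the pre-training cost, so any change to the assumed ratio shifts the result proportionally.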