“A joint scaling recipe combining aggressive regularization and ensembling can achieve a 5x data efficiency win over standard pre-training methods.”
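The quote does not say how the ensemble members are combined. The sketch below makes one common choice: it averages the next-token probability distributions of K independently trained models. Under that choice, the ensemble's negative log-likelihood can never exceed the members' average negative log-likelihood, because -log is convex (Jensen's inequality). The helper names `softmax`, `ensemble_probs`, and `nll` are hypothetical illustrations, not the recipe's actual code. The regularization half of the recipe is not modeled.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probs(member_logits: Sequence[Sequence[float]]) -> List[float]:
    """Assumed combination rule: average the members' predictive distributions."""
    dists = [softmax(logits) for logits in member_logits]
    k = len(dists)
    vocab = len(dists[0])
    return [sum(d[i] for d in dists) / k for i in range(vocab)]


def nll(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target token."""
    return -math.log(probs[target])


if __name__ == "__main__":
    # Two hypothetical members that disagree about the next token (target is 0).
    members = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
    ens = nll(ensemble_probs(members), 0)
    mean_member = sum(nll(softmax(m), 0) for m in members) / len(members)
    print(f"ensemble NLL {ens:.3f}, mean member NLL {mean_member:.3f}")
    # ensemble NLL ≈ 0.806, mean member NLL ≈ 1.240
```

The toy numbers only show the direction of the effect. They say nothing about the size of the 5x figure, which comes from the quoted source's own experiments.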