Sonic AI
“In a continued pre-training scenario on math data, data-efficiency techniques like aggressive epoching and ensembling matched the performance of training on 73 billion tokens while using only 4 billion tokens, a 17x data efficiency gain.”
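
The data efficiency gain in the quote is a ratio of token budgets: the tokens a baseline needs to reach a given performance, divided by the tokens the data-efficient recipe needs to match it. A minimal sketch of that arithmetic follows, assuming the rounded token counts as quoted. The function name `data_efficiency_gain` is hypothetical and does not come from the source.

```python
def data_efficiency_gain(baseline_tokens: float, efficient_tokens: float) -> float:
    """Return baseline tokens divided by the tokens the efficient recipe used."""
    if efficient_tokens <= 0:
        raise ValueError("efficient_tokens must be positive")
    return baseline_tokens / efficient_tokens


# Rounded figures taken from the quote (assumption: exact counts are not given).
gain = data_efficiency_gain(73e9, 4e9)
print(f"{gain:.2f}x")
```

With these rounded inputs the ratio lands near 18x, slightly above the quoted 17x. The difference presumably comes from rounding in the reported token counts, which the quote does not spell out.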