Sonic AI
“Experiments on Olmo checkpoints revealed that the 1-billion-parameter model trained on 3 trillion tokens performs worse after fine-tuning than a version trained on fewer tokens.”