Sonic AI
“Training a 10 trillion parameter model on 100 trillion tokens would require one-fourth as many Rubin-based systems as Blackwell-based systems to complete within one month.”
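One way to unpack the claim is the common rule of thumb that training a dense transformer costs roughly 6·N·D floating-point operations, where N is the parameter count and D is the token count. The sketch below makes several assumptions that do not come from the quote: it uses that rule, treats a month as 30 days, and assumes perfect linear scaling at a constant sustained per-system throughput. The throughput figures (`BLACKWELL_SYSTEM`, `RUBIN_SYSTEM`) are hypothetical placeholders, not published Blackwell or Rubin specifications. The sketch shows that, for a fixed job and deadline, the number of systems scales inversely with per-system throughput. Needing one-fourth as many systems therefore corresponds to roughly 4x the effective training throughput per system.

```python
import math

# Assumption: dense-transformer training cost ~ 6 * N * D FLOPs (rule of thumb).
def training_flops(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

def systems_needed(total_flops: float, flops_per_system: float, seconds: float) -> int:
    """Smallest whole number of systems that finishes total_flops within `seconds`,
    assuming perfect scaling at a constant sustained per-system throughput."""
    return math.ceil(total_flops / (flops_per_system * seconds))

N = 10e12                # 10 trillion parameters
D = 100e12               # 100 trillion tokens
MONTH = 30 * 24 * 3600   # assumption: one month = 30 days, in seconds

total = training_flops(N, D)  # about 6e27 FLOPs

# Hypothetical sustained throughput per system in FLOP/s (placeholders only).
BLACKWELL_SYSTEM = 1.0e18
RUBIN_SYSTEM = 4.0 * BLACKWELL_SYSTEM  # assumption: 4x effective throughput per system

n_blackwell = systems_needed(total, BLACKWELL_SYSTEM, MONTH)
n_rubin = systems_needed(total, RUBIN_SYSTEM, MONTH)

print(f"total FLOPs: {total:.1e}")
print(f"Blackwell-based systems: {n_blackwell}")
print(f"Rubin-based systems:     {n_rubin}")
print(f"ratio: {n_rubin / n_blackwell:.3f}")
```

With these placeholder numbers the sketch reports 2315 Blackwell-based and 579 Rubin-based systems. The ratio comes out slightly above 0.25 only because each count is rounded up to a whole system. Changing the placeholder throughput changes both counts, but the ratio between them still tracks the assumed throughput ratio.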