According to Rainer Pope, a model with 100 billion active parameters would be considered optimall..., Sonic AI
“According to Rainer Pope, a model with 100 billion active parameters would be considered optimally trained under Chinchilla scaling laws with approximately 2 trillion training tokens.”
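
The 2 trillion figure can be checked with simple arithmetic. The sketch below assumes the common rule-of-thumb reading of the Chinchilla result, roughly 20 training tokens per parameter; that ratio is not stated in the quote. It also uses the standard approximation of about 6 FLOPs per parameter per token for training compute. The function and constant names are illustrative only.

```python
# Assumption (not in the quote): Chinchilla-optimal training uses roughly
# 20 tokens per parameter. The published fits vary around this value.
TOKENS_PER_PARAM = 20

# Assumption: training compute is approximately 6 * N * D FLOPs,
# where N is the active parameter count and D is the token count.
FLOPS_PER_PARAM_PER_TOKEN = 6


def chinchilla_optimal_tokens(active_params, tokens_per_param=TOKENS_PER_PARAM):
    """Estimate the compute-optimal number of training tokens."""
    return active_params * tokens_per_param


def approx_training_flops(active_params, tokens):
    """Estimate total training FLOPs with the 6 * N * D approximation."""
    return FLOPS_PER_PARAM_PER_TOKEN * active_params * tokens


active_params = 100e9  # 100 billion active parameters, as in the quote
tokens = chinchilla_optimal_tokens(active_params)
flops = approx_training_flops(active_params, tokens)

print(f"Estimated optimal tokens: {tokens:.2e}")
print(f"Estimated training FLOPs: {flops:.2e}")
```

Under these assumptions, 100 billion parameters times 20 tokens per parameter gives 2 trillion tokens, which matches the quoted figure, and the implied training budget is on the order of 10^24 FLOPs.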