According to the "Unified Scaling Laws for Routed Language Models" paper, a sparse model with 64 ..., Sonic AI
“According to the "Unified Scaling Laws for Routed Language Models" paper, a sparse model with 64 experts and 370 million active parameters achieves the same quality as a dense 1.3 billion parameter model.”
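
As a rough illustration of what that claim implies, the two parameter counts in the quote can be turned into a single ratio. This is only a sketch: the variable and function names are made up for this example, and the calculation restates the quoted figures without reproducing anything from the paper itself.

```python
# Figures taken directly from the quote above; all names here are illustrative.
ACTIVE_PARAMS_SPARSE = 370e6  # active parameters in the 64-expert sparse model
DENSE_PARAMS = 1.3e9          # dense model reported to reach the same quality


def dense_equivalent_ratio(dense_params: float, active_params: float) -> float:
    """Return how many dense parameters correspond to one active sparse parameter."""
    return dense_params / active_params


ratio = dense_equivalent_ratio(DENSE_PARAMS, ACTIVE_PARAMS_SPARSE)
print(f"{ratio:.2f}x")  # prints "3.51x"
```

Under the quoted numbers, the dense model needs roughly 3.5 times as many parameters as the sparse model has active. The quote does not give the sparse model's total parameter count across all 64 experts, so the ratio says nothing about memory footprint.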