“A DeepSeek paper reports a strategy of maximizing expert parallelism up to the size of the scale-up domain while using minimal pipeline parallelism.”
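
The quoted strategy can be sketched as a small planning routine. This is a minimal illustration, not the paper's method: the function `plan_parallelism`, its parameters, and the simplified layout `world_size = EP * PP * DP` are all assumptions made for the example, and `min_pipeline_stages` is a hypothetical stand-in for whatever memory limits force on the pipeline depth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelPlan:
    expert_parallel: int    # EP degree
    pipeline_parallel: int  # PP degree
    data_parallel: int      # DP degree (whatever devices remain)


def plan_parallelism(world_size, scale_up_domain, num_experts,
                     min_pipeline_stages=1):
    """Hypothetical planner: maximize EP within the scale-up domain,
    then use the smallest pipeline depth that tiles the cluster.

    Assumes a simplified layout where world_size == EP * PP * DP.
    """
    # Largest EP degree that fits inside the scale-up domain, splits the
    # experts evenly, and divides the total device count.
    ep = max(
        d for d in range(1, scale_up_domain + 1)
        if num_experts % d == 0 and world_size % d == 0
    )

    max_pp = world_size // ep
    if not 1 <= min_pipeline_stages <= max_pp:
        raise ValueError("min_pipeline_stages does not fit in the cluster")

    # Smallest PP degree at or above the required minimum that still
    # divides the devices left over after expert parallelism.
    pp = next(
        p for p in range(min_pipeline_stages, max_pp + 1)
        if world_size % (ep * p) == 0
    )

    dp = world_size // (ep * pp)
    return ParallelPlan(ep, pp, dp)
```

Under these assumptions, `plan_parallelism(world_size=64, scale_up_domain=8, num_experts=16)` yields EP 8, PP 1, and DP 8: expert parallelism fills the scale-up domain, and pipeline parallelism stays at its minimum. The numbers are illustrative only and are not taken from the paper.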