Increasing the number of pipeline stages reduces the per-GPU memory footprint for model weights, ..., Sonic AI
“Increasing the number of pipeline stages reduces the per-GPU memory footprint for model weights, but it does not reduce the per-GPU memory footprint for the KV cache.”
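
One way to read this claim, offered as a rough sketch and not necessarily the author's own reasoning: each stage stores fewer layers, but a deeper pipeline usually keeps more micro-batches in flight to stay busy, so the KV cache held on each GPU does not shrink. The function name, its parameters, and the sizes used below are hypothetical, and the rule of one in-flight micro-batch per stage is an assumption.

```python
def per_gpu_memory_gib(num_layers, weight_gib_per_layer,
                       kv_gib_per_layer_per_batch, stages,
                       batches_in_flight=None):
    """Estimate (weights, kv_cache) memory in GiB held by one GPU.

    Hypothetical helper for illustration only; real serving systems
    also spend memory on activations, fragmentation, and runtime overhead.
    """
    if stages < 1 or num_layers % stages != 0:
        raise ValueError("stages must be >= 1 and divide num_layers evenly")

    # Each pipeline stage owns an equal slice of the layers.
    layers_per_gpu = num_layers // stages

    # Assumption: to keep every stage busy, the scheduler keeps one
    # micro-batch in flight per stage unless told otherwise.
    if batches_in_flight is None:
        batches_in_flight = stages

    weights = layers_per_gpu * weight_gib_per_layer
    # A GPU caches keys/values only for its own layers, but for every
    # micro-batch that is currently in flight.
    kv_cache = layers_per_gpu * kv_gib_per_layer_per_batch * batches_in_flight
    return weights, kv_cache


if __name__ == "__main__":
    # Made-up model: 32 layers, 0.5 GiB of weights per layer,
    # 0.25 GiB of KV cache per layer per micro-batch.
    for stages in (1, 2, 4, 8):
        w, kv = per_gpu_memory_gib(32, 0.5, 0.25, stages)
        print(f"{stages} stages: weights {w:.1f} GiB, KV cache {kv:.1f} GiB")
```

With these made-up numbers, per-GPU weight memory falls from 16 GiB to 2 GiB as the pipeline grows from 1 to 8 stages, while the per-GPU KV cache stays at 8 GiB. If `batches_in_flight` is held fixed instead, the per-GPU KV cache shrinks along with the weights, so the result depends on the scheduling assumption.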