“Ilya Sutskever has previously stated that pipeline parallelism is a technique that should be avoided when training large models.”