“Building high-performance and flexible data pipelines is easier for LLM teams because they can leverage mature storage formats like Parquet and Iceberg and data processing engines like Spark, whereas equivalent tools for multimodal robotics data do not exist.”