Base10's rapid growth in AI inference was fueled by a strategic pivot in 2022, in which the company killed three of its four products to focus exclusively on the emerging market for serving large models.
The CEO argues that serving generic open-source models via shared endpoints is a commodity, while the defensible, high-value market lies in providing dedicated, single-tenant capacity for custom workloads with specific performance and compliance needs.
LLM inference presents complex challenges at both the infrastructure level (scaling across thousands of GPUs) and the runtime level (optimizing for speed and throughput); the key latency metrics are time-to-first-token (TTFT, how quickly the first token comes back) and time-per-output-token (TPOT, how fast subsequent tokens stream).
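To make the two metrics concrete, here is a minimal sketch of how TTFT and TPOT can be measured against any token-streaming source. The `fake_stream` generator is a hypothetical stand-in for a real streaming inference endpoint, not part of any product described here.

```python
import time

def measure_latency(token_stream):
    """Measure time-to-first-token (TTFT) and mean time-per-output-token (TPOT)
    for any iterable that yields tokens as they are generated."""
    start = time.perf_counter()
    first_token_at = None
    token_times = []

    for _token in token_stream:
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now
        token_times.append(now)

    ttft = first_token_at - start
    # TPOT averages the gap between successive tokens after the first one.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    tpot = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, tpot

def fake_stream(n_tokens=20):
    """Toy stand-in for a streaming endpoint: ~200 ms prefill, then ~30 ms per token."""
    time.sleep(0.2)            # simulated prefill / first-token delay
    for i in range(n_tokens):
        yield f"tok{i}"
        time.sleep(0.03)       # simulated per-token decode latency

if __name__ == "__main__":
    ttft, tpot = measure_latency(fake_stream())
    print(f"TTFT: {ttft * 1000:.0f} ms, TPOT: {tpot * 1000:.1f} ms/token")
```

In practice the same timing loop would wrap a real streaming API response; the point is simply that TTFT reflects prefill cost while TPOT reflects decode throughput.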
NVIDIA's dominance is reinforced by its mature CUDA software ecosystem, which makes it difficult for hardware competitors to gain traction despite the rapid pace of new model releases.
Concerns Raised
The commoditization of serving generic, open-source models via shared endpoints.
The difficulty for competing hardware vendors to challenge NVIDIA's entrenched CUDA ecosystem.
The prediction that the next major breakthrough in foundation model capabilities may be further away than anticipated.
Opportunities Identified
Providing dedicated, single-tenant inference capacity for customers with custom models and specific compliance/performance needs.
Capitalizing on the trend of customers moving from expensive closed-source models to more cost-effective and controllable open-source alternatives.
Building a differentiated business by optimizing the three pillars of inference: infrastructure, performance, and developer experience.