Baseten, an AI inference cloud provider, is experiencing hyper-growth (30x year-over-year revenue growth, 400% net dollar retention) by serving the rapidly expanding AI application layer, highlighting the immense demand for specialized inference.
The market is facing a severe and underestimated GPU compute crunch, making access to capacity a primary strategic asset.
Procuring new, high-end chips like the NVIDIA B200 requires multi-year contracts and significant upfront capital.
A major shift is underway towards custom, post-trained models, with 95% of tokens on Baseten's platform being for modified models.
This is driven by the need for specialized capabilities and cost optimization.
The enterprise AI market remains largely untapped (an estimated 99% of potential inference volume is not yet served), representing a colossal future opportunity.
Companies that fail to integrate AI into their workflows face an existential threat.
Concerns Raised
Severe and persistent GPU compute scarcity is the primary bottleneck for the entire industry.
Running high-SLA inference at scale is operationally complex, a challenge compounded by the many unreliable or 'grifty' capacity suppliers in the market.
Companies that fail to integrate AI into their core products and workflows face an existential risk of being left behind.
Opportunities Identified
The enterprise AI market is almost entirely untapped, representing a massive long-term growth opportunity.
Specializing open-source models through post-training offers a path to superior performance and significantly lower costs compared to closed-source APIs.
The AI inference software layer is incredibly sticky, leading to high customer retention and expansion.
The increasing demand for compute creates opportunities for companies with strong operational execution and access to capital.