
May 12, 2026

Where is enterprise data infrastructure spend consolidating versus fragmenting as LLMs become a primary query interface?

22 episodes · 14 podcasts · Jun 4, 2025 – May 7, 2026

As Large Language Models (LLMs) become a primary enterprise interface, data infrastructure spending is diverging: consolidating at the foundational data and compute layers while fragmenting in specialized tooling. The core enterprise strategy is to consolidate proprietary information into unified data platforms such as Snowflake, Databricks, or Palantir [5, 11]. This creates a defensible moat, since the foundational LLMs themselves are widely seen as a commodity whose performance has converged and whose price is the main differentiator [1, 3, 12]. Consequently, enterprises are expected to adopt a multi-model approach, using gateways to route requests among various LLMs based on cost and specific task requirements. This dynamic concentrates value and spending in the proprietary data layer, considered one of the most durable components of the new AI stack, while commoditizing the model layer itself [20, 24].
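The gateway pattern described here can be sketched in a few lines. The model names, per-token prices, and routing rule below are illustrative assumptions for the example, not real vendors or pricing:

```python
# Minimal sketch of a multi-model LLM gateway: route each request to the
# cheapest registered model that can handle the task. All names and prices
# are hypothetical.

from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_1k_tokens: float   # hypothetical price, USD per 1k tokens
    capabilities: set           # task types this model handles well

class LLMGateway:
    """Routes each request to the cheapest capable model."""

    def __init__(self, models):
        self.models = models

    def route(self, task: str) -> str:
        candidates = [m for m in self.models if task in m.capabilities]
        if not candidates:
            raise ValueError(f"no model registered for task {task!r}")
        return min(candidates, key=lambda m: m.cost_per_1k_tokens).name

gateway = LLMGateway([
    Model("frontier-large", 15.00, {"reasoning", "coding", "summarization"}),
    Model("mid-tier",        1.50, {"coding", "summarization"}),
    Model("small-cheap",     0.10, {"summarization"}),
])
```

Under this toy policy, a summarization request routes to "small-cheap" while a reasoning request falls through to "frontier-large"; a production gateway would add quotas, fallbacks, and latency-aware routing on top of the same shape.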

At the lowest level of the stack, spending is consolidating heavily into the hands of hyperscalers like Google and AWS, who are the primary beneficiaries of an unprecedented "CapEx arms race". These cloud providers capture value by selling massive amounts of compute to LLM companies and then reselling AI services and infrastructure to enterprise customers, a symbiotic relationship that solidifies their role as a central chokepoint [4, 14]. This investment in compute infrastructure is consuming the majority of their free cash flow and is fueled by the intense, fluid competition between foundation models [4, 15]. While this represents a clear consolidation of capital, some infrastructure-focused SaaS companies are also benefiting by providing essential "plumbing" for AI applications and agents.


Between the consolidated data platforms and the hyperscaler compute layer, a fragmented ecosystem of specialized infrastructure is emerging to handle new AI-specific workloads. The need for Retrieval-Augmented Generation (RAG) persists, as large context windows are not a panacea for connecting private datasets, owing to cost, scale, and poor recall [10, 27]. This has created a vibrant, fragmented market for vector databases, with new architectures built on object storage emerging to make petabyte-scale vector search economically viable [19, 27]. Similarly, specialized high-performance databases like ClickHouse are positioning themselves as "picks and shovels" for AI leaders, winning adoption through developer-centric, product-led growth rather than traditional enterprise sales [9, 30]. This fragmentation reflects the need for new tools that solve specific production problems such as incremental indexing and filtered search.
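The two production problems named above, incremental indexing and filtered search, can be illustrated with a toy in-memory vector store. This is brute-force cosine similarity, not the approximate indexes a real vector database would use, and all document names and metadata are invented; only the interface pattern is the point:

```python
# Toy vector store illustrating incremental indexing (documents are
# searchable immediately after add) and metadata-filtered search
# (the filter predicate is applied before ranking). Illustrative only.

import math

class VectorStore:
    def __init__(self):
        self.items = []  # list of (vector, metadata) pairs

    def add(self, vector, metadata):
        """Incremental indexing: a new document is searchable immediately."""
        self.items.append((vector, metadata))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query, k=3, where=None):
        """Filtered search: restrict by metadata predicate, then rank."""
        pool = [(v, m) for v, m in self.items if where is None or where(m)]
        ranked = sorted(pool, key=lambda it: self._cosine(query, it[0]), reverse=True)
        return [m for _, m in ranked[:k]]

store = VectorStore()
store.add([1.0, 0.0], {"doc": "pricing-faq", "team": "sales"})
store.add([0.9, 0.1], {"doc": "discount-policy", "team": "sales"})
store.add([0.0, 1.0], {"doc": "oncall-runbook", "team": "eng"})

hits = store.search([1.0, 0.0], k=1, where=lambda m: m["team"] == "sales")
```

At petabyte scale the same operations get hard: pre-filtering shrinks the candidate set before the expensive similarity pass, which is exactly where object-storage-backed architectures and approximate indexes diverge in their trade-offs.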

This restructuring of the data stack is driven by a fundamental shift: AI agents are becoming the primary consumers of infrastructure, already surpassing human developers in traffic volume for some services [6, 25]. This forces infrastructure providers to optimize for programmatic, agent-driven consumption [9, 25]. The paradigm reinforces the trend toward data consolidation, since it is more efficient for an agent to query a single unified data warehouse than to connect to dozens of disparate systems. And while the unit cost per token is decreasing, the rise in complex, agent-driven queries is causing aggregate enterprise spending on AI to increase. The entire infrastructure landscape is therefore reorienting around a future where machine-to-machine interaction defines demand, driving both consolidation in core data repositories and fragmentation in the specialized tools that enable agentic workflows.

What the sources say

Points of agreement

  • Foundational Large Language Models (LLMs) are becoming a commodity, with competition shifting to price and enterprise-specific features.
  • The most durable and defensible value in the AI stack is moving to the data layer, where companies can leverage proprietary information.
  • Enterprises are consolidating data into single platforms like Snowflake or Databricks to provide a unified source for AI agents to access.
  • AI agents, not humans, are rapidly becoming the primary consumers of cloud infrastructure, driving a need for new, agent-optimized services.

Points of disagreement

  • There is disagreement on where value will ultimately accrue, with some arguing for foundational model providers, others for application-layer companies, and still others for the underlying hyperscalers.
  • While some sources see a 'SaaSpocalypse' for legacy software firms, others see a re-acceleration for infrastructure-focused SaaS companies that provide essential 'plumbing' for AI.
  • One perspective is that enterprises will consolidate around a few high-ROI use cases, while another suggests they will manage a portfolio of multiple LLMs via a gateway pattern.
  • Views on data value differ, with some claiming high-quality training data is the key bottleneck, others arguing spending on it is overhyped, and another predicting the value of human-generated data will soar due to synthetic data.

Sources

BG2 Pod · Dec 23, 2025

AI Enterprise - Databricks & Glean | BG2 Guest Interview

This source argues that LLMs are a commodity and the defensible value in the AI stack is moving to the application and proprietary data layers.

SaaStr · May 7, 2026

Anthropic's Raise & What It Means for Potential IPO? Mag7: Google & Amazon Up, Meta & Microsoft Down

This source describes the massive CapEx arms race in AI, where hyperscalers are the primary beneficiaries by selling compute to LLM companies.

Sourcery · Jan 23, 2026

Snowflake vs Databricks: The AI Data War | CEO of $SNOW

This source details the strategy of becoming the indispensable central data platform where enterprises manage data to build their own proprietary AI applications.

Unsupervised Learning · Apr 23, 2026

Has AI Infra Stabilized, FM Vibe Shift, & What's Next for Coding Agents

This source identifies the trend of AI agents becoming the primary consumers of cloud infrastructure, surpassing human developers in traffic.

Unsupervised Learning · Jul 22, 2025

The Infrastructure Company Powering the Top AI Apps

This source explains the persistent need for specialized data infrastructure like vector databases, as large LLM context windows are insufficient for enterprise needs.

Decoder · Mar 30, 2026

Okta's CEO is betting big on AI agent identity | Decoder

This source provides expert testimony that major companies are succeeding by allowing enterprises to consolidate data into a single data warehouse for AI agents.

