Google announced the 8th generation of its custom AI chips (TPUs), introducing two specialized versions for the first time: the 8T for training and the 8I for inference, with performance gains of up to 10x over the previous generation.
The development highlights Google's long-term strategy of vertical integration, where co-designing hardware, software, and models (like Gemini) creates a competitive advantage in efficiency and performance.
The design of the new TPUs was heavily influenced by internal collaboration with teams like DeepMind, anticipating the rise of AI agents and the need for low-latency inference, a trend Google began planning for two years ago.
The speaker emphasizes that the primary challenge in AI supercomputing is not raw chip speed but systems-level reliability at scale: maximizing "goodput" (effective training progress per unit of wall-clock time) by mitigating constant chip failures and silent data corruption.
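The goodput point above can be made concrete with a simple back-of-the-envelope model. This is a hedged sketch, not anything from the talk: it assumes independent chip failures, checkpoint/restart recovery, and illustrative numbers (a per-chip MTBF of ~50,000 hours, hourly checkpoints, a 15-minute restart cost).

```python
# Hedged sketch: a toy model of "goodput" (fraction of wall-clock time
# that produces useful training progress) under chip failures, assuming
# checkpoint/restart recovery. All numbers are illustrative assumptions.

def goodput_fraction(num_chips, mtbf_hours_per_chip,
                     checkpoint_interval_hours, restart_overhead_hours):
    """First-order estimate of useful-work fraction.

    Assumes failures are independent, so fleet-level mean time between
    failures shrinks linearly with chip count, and each failure discards
    on average half a checkpoint interval plus a fixed restart cost.
    """
    fleet_mtbf = mtbf_hours_per_chip / num_chips
    lost_per_failure = checkpoint_interval_hours / 2 + restart_overhead_hours
    return max(0.0, 1 - lost_per_failure / fleet_mtbf)

# One chip failing every ~5.7 years looks reliable in isolation...
print(round(goodput_fraction(1, 50_000, 1.0, 0.25), 3))       # → 1.0
# ...but 50,000 of them fail roughly hourly, and goodput collapses.
print(round(goodput_fraction(50_000, 50_000, 1.0, 0.25), 3))  # → 0.25
```

The collapse at scale is why the systems-level work (fast failure detection, cheap checkpointing, detecting silent data corruption before it propagates) dominates raw chip speed as the engineering problem.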
Concerns Raised
Reliability of large-scale AI systems due to the high failure rate of individual chips.
The risk of silent data corruption, where a chip produces incorrect results without failing completely.
The slowing performance improvement of general-purpose CPUs (approx. 5% per year), which necessitates costly specialization.
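The ~5%-per-year figure in the last concern can be compounded to show why it forces specialization. A minimal arithmetic sketch (only the 5%/yr and 10x figures come from the summary; the comparison framing is mine):

```python
import math

# Hedged arithmetic: compounding ~5%/yr general-purpose CPU gains,
# versus a single up-to-10x generational jump from specialized silicon.

decade_gain = 1.05 ** 10                      # ~1.63x after ten years
years_to_10x = math.log(10) / math.log(1.05)  # ~47 years at 5%/yr

print(round(decade_gain, 2))    # → 1.63
print(round(years_to_10x, 1))   # → 47.2
```

At that pace, one specialized-chip generation delivers what general-purpose CPUs would need roughly half a century to match, which is the economic case for custom silicon despite its cost.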
Opportunities Identified
Leveraging massive computational gains to accelerate scientific breakthroughs and enterprise workflows, potentially compressing 10 years of research into one.
Enabling the next wave of AI applications, the "agentic era," with specialized, low-latency inference hardware.
A predicted resurgence of general-purpose CPUs to orchestrate and manage complex AI agent workflows.
Continued gains from vertical integration and custom silicon, as demonstrated by customer Citadel achieving 2-4x efficiency improvements and 30% cost reduction.