NVIDIA's dominance in AI hardware (currently ~90% of workloads) is being challenged, particularly in the inference market, with a shift towards multi-silicon workloads expected in the next few years.
AI inference costs have fallen roughly 100x over the last two years, driven by techniques such as quantization, Mixture-of-Experts (MoE) architectures, and hardware-aware algorithms like Flash Attention.
Another 10x cost reduction is predicted within the next year.
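Of the techniques named above, quantization is the easiest to see concretely: storing weights as 8-bit integers plus a scale factor cuts memory (and bandwidth) 4x versus float32. The sketch below is a minimal, illustrative per-tensor symmetric int8 scheme; the function names are mine, not from any particular library.

```python
# Minimal sketch of symmetric per-tensor int8 weight quantization.
# Illustrative only: real systems use per-channel scales, calibration,
# and fused int8 kernels on top of this basic idea.

def quantize_int8(weights):
    """Map float weights to int8 values plus one scale factor."""
    scale = max(abs(w) for w in weights) / 127.0  # largest value maps to +/-127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.08, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored value is within half a quantization step (scale / 2)
# of the original, while storage drops from 4 bytes to 1 per weight.
```

The accuracy cost is bounded by the step size, which is why quantization often preserves model quality while slashing serving cost.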
The future of AI is trending towards agentic models that can take independent actions, with real-time video generation poised to be a major consumer application.
Open-source models are predicted to close the quality gap with closed-source counterparts within a year, driven by innovation in data processing and synthetic data generation, which is considered a key under-hyped area.
Concerns Raised
Networking remains a primary bottleneck for large-scale AI model training.
The lack of high-quality, modern training data is a major bottleneck for developing AI that can automatically write correct, low-level code.
True hardware portability is a myth, as even successive generations of NVIDIA chips have significant architectural differences requiring software rewrites.
Opportunities Identified
A further 10x reduction in AI inference costs is achievable within the next year through hardware and software co-design.
Agentic AI represents the next major frontier in AI capabilities.
Data processing and synthetic data generation are under-hyped areas with massive potential to improve model performance.
Alternative architectures like Mamba can unlock new efficiencies for specific workloads, such as large-batch inference.
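The large-batch advantage comes from memory: a Transformer's KV cache grows with both batch size and sequence length, while a Mamba-style state-space model keeps one fixed-size state per layer regardless of sequence length. The back-of-envelope comparison below uses hypothetical 7B-class dimensions of my own choosing, not measurements of any real model.

```python
# Back-of-envelope memory comparison: attention KV cache vs. a
# fixed-size SSM state (Mamba-style). All dimensions are illustrative
# assumptions for a hypothetical 7B-class model, stored in fp16.

def kv_cache_bytes(batch, seq_len, layers, heads, head_dim, bytes_per=2):
    # Keys and values (hence the factor of 2) cached for every token,
    # in every layer -- grows linearly with seq_len and batch.
    return batch * seq_len * layers * 2 * heads * head_dim * bytes_per

def ssm_state_bytes(batch, layers, d_model, state_dim, bytes_per=2):
    # One fixed recurrent state per layer -- independent of seq_len.
    return batch * layers * d_model * state_dim * bytes_per

attn = kv_cache_bytes(batch=64, seq_len=4096, layers=32, heads=32, head_dim=128)
ssm = ssm_state_bytes(batch=64, layers=32, d_model=4096, state_dim=16)
print(f"KV cache: {attn / 2**30:.1f} GiB, SSM state: {ssm / 2**30:.2f} GiB")
# Under these assumptions the KV cache needs 128 GiB at batch 64,
# while the SSM state needs 0.25 GiB -- a 512x gap that lets the
# SSM serve far larger batches on the same accelerator memory.
```

The exact ratio depends entirely on the chosen dimensions, but the scaling difference (linear in sequence length versus constant) is what makes alternative architectures attractive for this workload.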