Hardware portability is largely a myth: chip architectures change significantly between generations, so extracting optimal performance requires near-complete software rewrites, a cadence NVIDIA itself follows roughly every two years.
AI inference costs have fallen roughly 100x over the last two years and are poised for another 10x drop in the coming year, driven by software and algorithmic breakthroughs such as Flash Attention and quantization.
NVIDIA's current 90% market share is secured by its software ecosystem and networking lead, but it faces credible threats from competitors specializing in low-latency inference (Cerebras, Groq) and greater memory capacity (AMD).
The next transformative AI applications will be agentic models that can act independently and real-time video generation, which will have a consumer impact on the scale of TikTok.
The quality of open-source AI models will likely catch up to closed-source counterparts within a year, accelerated by improved tooling for reinforcement learning.
▶ NVIDIA's Entrenched Dominance and Future Cracks (Apr 2026)
Tri Dao outlines NVIDIA's current market supremacy, attributing it to a combination of high-quality chips, a strong software ecosystem, and a significant lead in networking for large-scale training. However, he also identifies emerging challenges from competitors like AMD in memory capacity and from Cerebras and Groq in low-latency inference, predicting a shift towards multi-silicon workloads in the coming years.
Investors should monitor the development of networking solutions and low-latency inference hardware from NVIDIA's competitors, as these are the key fronts where Dao sees potential for the incumbent's market share to be eroded.
▶ The Exponential Deflation of AI Inference Costs (Apr 2026)
Dao quantifies the dramatic economic shifts in AI, estimating a 100x decrease in inference costs since ChatGPT's debut, driven by innovations such as Flash Attention, which he co-authored. He confidently predicts another 10x cost reduction within the next year, fueled by techniques like 4-bit quantization and alternative architectures such as Mamba.
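One of the levers Dao cites, low-bit quantization, is straightforward to illustrate. The sketch below is a minimal NumPy example, not any particular library's implementation, and the function names are placeholders: it applies symmetric 4-bit quantization to a weight tensor, shrinking weight memory roughly 4x relative to 16-bit floats at the cost of a small reconstruction error.

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Symmetric per-tensor 4-bit quantization: map floats onto integers in [-8, 7]."""
    scale = np.max(np.abs(weights)) / 7.0            # one scale factor for the whole tensor
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights at inference time."""
    return q.astype(np.float32) * scale

# Toy check: a 4096x4096 fp16 weight matrix is ~32 MB; packed int4 (two values
# per byte) is ~8 MB. This sketch keeps the values unpacked in int8 for clarity.
w = np.random.randn(4096, 4096).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
print("max abs reconstruction error:", float(np.max(np.abs(w - w_hat))))
```

Production schemes typically quantize per channel or per group and pack two 4-bit values per byte, but the memory-versus-accuracy trade-off is the same one driving the cost curve Dao describes.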
The rapid and continuous deflation in inference cost suggests that business models predicated on high AI operational expenses are fragile; the competitive landscape will likely favor companies that can rapidly leverage cheaper inference to scale new applications.
▶ The Next Wave of AI Applications and Architectures (Apr 2026)
Looking forward, Dao identifies two major application paradigms: agentic AI, where models can act and gather information independently, and real-time video generation, which he believes will be a transformative consumer application. He also notes the architectural trend towards increased sparsity in Mixture-of-Experts models and the potential for non-transformer models like Mamba to offer significant efficiency gains.
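To make the sparsity trend concrete, here is a rough sketch of top-k routing in a Mixture-of-Experts layer; it is NumPy only, and all names and shapes are illustrative rather than taken from any specific model. The router scores every expert, but each token only runs through its k best ones, so per-token compute grows with k rather than with the total expert count.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:       (tokens, d_model) activations
    gate_w:  (d_model, n_experts) router weights
    experts: list of n_experts weight matrices, each (d_model, d_model)
    """
    logits = x @ gate_w                                  # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]           # indices of the k best experts per token
    sel = np.take_along_axis(logits, topk, axis=-1)      # softmax over the selected experts only
    weights = np.exp(sel - sel.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                          # per-token dispatch (clarity over speed)
        for j in range(k):
            e = topk[t, j]
            out[t] += weights[t, j] * (x[t] @ experts[e])
    return out

# Toy usage: 8 experts in the layer, but each token only pays for 2 of them.
rng = np.random.default_rng(0)
d, n_experts, tokens = 64, 8, 4
x = rng.standard_normal((tokens, d))
gate_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
print(moe_forward(x, gate_w, experts).shape)             # (4, 64)
```

Real implementations batch tokens per expert and add load-balancing terms; the explicit loop here trades speed for readability, but the economics are the same: more total experts with fixed k means more capacity without proportionally more compute per token.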
Analysts should focus on startups and research in agentic frameworks and real-time video, as Dao pinpoints these as the next high-growth areas, potentially creating a platform shift comparable to TikTok.
▶ Identifying the True Bottlenecks in AI Progress (Apr 2026)
Dao argues that some of the most significant, yet under-discussed, bottlenecks in AI are not just about raw compute. He highlights the lack of high-quality, modern training data for tasks like GPU kernel generation, the critical role of networking in large-scale training, and the under-hyped importance of synthetic data and data processing techniques.
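As a loose illustration of the data-processing lever, the sketch below shows a hypothetical cleaning step that deduplicates and heuristically filters raw text before it reaches training; it is not a pipeline described in the interview, and the thresholds are arbitrary placeholders.

```python
import hashlib

def clean_corpus(examples, min_chars=200, max_digit_ratio=0.3):
    """Exact-deduplicate and heuristically filter raw text documents."""
    seen = set()
    kept = []
    for text in examples:
        doc = text.strip()
        if len(doc) < min_chars:                     # drop fragments
            continue
        digits = sum(ch.isdigit() for ch in doc)
        if digits / len(doc) > max_digit_ratio:      # drop table/log dumps
            continue
        h = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if h in seen:                                # drop exact duplicates
            continue
        seen.add(h)
        kept.append(doc)
    return kept

docs = ["short", "A" * 300, "A" * 300, "1" * 300]
print(len(clean_corpus(docs)))   # 1: the fragment, the duplicate, and the digit dump are dropped
```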
The most durable competitive advantages in AI may not be in model development itself, but in the creation of proprietary high-quality datasets and advanced data processing pipelines, which Dao identifies as a key, under-hyped lever for performance.