Hardware portability is largely a myth: chip architectures change significantly between generations, so extracting optimal performance requires near-complete software rewrites, a cadence NVIDIA itself follows roughly every two years.
AI inference costs have fallen roughly 100x over the last two years and are poised for another 10x drop in the coming year, driven by software and algorithmic breakthroughs such as Flash Attention and quantization.
NVIDIA's current 90% market share is secured by its software ecosystem and networking lead, but it faces credible threats from competitors specializing in low-latency inference (Cerebras, Groq) and greater memory capacity (AMD).
The next transformative AI applications will be agentic models that can act independently and real-time video generation, which will have a consumer impact on the scale of TikTok.
The quality of open-source AI models will likely catch up to closed-source counterparts within a year, accelerated by improved tooling for reinforcement learning.
▶ NVIDIA's Entrenched Dominance and Future Cracks (Apr 2026)
Tri Dao outlines NVIDIA's current market supremacy, attributing it to a combination of high-quality chips, a strong software ecosystem, and a significant lead in networking for large-scale training. However, he also identifies emerging challenges from competitors like AMD in memory capacity and from Cerebras and Groq in low-latency inference, predicting a shift towards multi-silicon workloads in the coming years.
Investors should monitor the development of networking solutions and low-latency inference hardware from NVIDIA's competitors, as these are the key fronts where Dao sees potential for the incumbent's market share to be eroded.
▶ The Exponential Deflation of AI Inference Costs (Apr 2026)
Dao quantifies the dramatic economic shifts in AI, estimating a 100x decrease in inference costs since ChatGPT's debut, driven by innovations such as Flash Attention, which he co-authored. He confidently predicts another 10x cost reduction within the next year, fueled by techniques like 4-bit quantization and alternative architectures such as Mamba.
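One of the levers Dao cites, low-bit quantization, is straightforward to illustrate. The sketch below is a minimal NumPy example, not any particular library's implementation, and the function names are placeholders: it applies symmetric 4-bit quantization to a weight tensor, shrinking weight memory roughly 4x relative to 16-bit floats at the cost of a small reconstruction error.

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Symmetric per-tensor 4-bit quantization: map floats onto integers in [-8, 7]."""
    scale = np.max(np.abs(weights)) / 7.0            # one scale factor for the whole tensor
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights at inference time."""
    return q.astype(np.float32) * scale

# Toy check: a 4096x4096 fp16 weight matrix is ~32 MB; packed int4 (two values
# per byte) is ~8 MB. This sketch keeps the values unpacked in int8 for clarity.
w = np.random.randn(4096, 4096).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
print("max abs reconstruction error:", float(np.max(np.abs(w - w_hat))))
```

Production schemes typically quantize per channel or per group and pack two 4-bit values per byte, but the memory-versus-accuracy trade-off is the same one driving the cost curve Dao describes.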
The rapid and continuous deflation in inference cost suggests that business models predicated on high AI operational expenses are fragile; the competitive landscape will likely favor companies that can rapidly leverage cheaper inference to scale new applications.
▶ The Next Wave of AI Applications and Architectures (Apr 2026)
Looking forward, Dao identifies two major application paradigms: agentic AI, where models can act and gather information independently, and real-time video generation, which he believes will be a transformative consumer application. He also notes the architectural trend towards increased sparsity in Mixture-of-Experts models and the potential for non-transformer models like Mamba to offer significant efficiency gains.
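To make the sparsity trend concrete, here is a rough sketch of top-k routing in a Mixture-of-Experts layer; it is NumPy only, and all names and shapes are illustrative rather than taken from any specific model. The router scores every expert, but each token only runs through its k best ones, so per-token compute grows with k rather than with the total expert count.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:       (tokens, d_model) activations
    gate_w:  (d_model, n_experts) router weights
    experts: list of n_experts weight matrices, each (d_model, d_model)
    """
    logits = x @ gate_w                                  # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]           # indices of the k best experts per token
    sel = np.take_along_axis(logits, topk, axis=-1)      # softmax over the selected experts only
    weights = np.exp(sel - sel.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                          # per-token dispatch (clarity over speed)
        for j in range(k):
            e = topk[t, j]
            out[t] += weights[t, j] * (x[t] @ experts[e])
    return out

# Toy usage: 8 experts in the layer, but each token only pays for 2 of them.
rng = np.random.default_rng(0)
d, n_experts, tokens = 64, 8, 4
x = rng.standard_normal((tokens, d))
gate_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
print(moe_forward(x, gate_w, experts).shape)             # (4, 64)
```

Real implementations batch tokens per expert and add load-balancing terms; the explicit loop here trades speed for readability, but the economics are the same: more total experts with fixed k means more capacity without proportionally more compute per token.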
Analysts should focus on startups and research in agentic frameworks and real-time video, as Dao pinpoints these as the next high-growth areas, potentially creating a platform shift comparable to TikTok.
▶ Identifying the True Bottlenecks in AI Progress (Apr 2026)
Dao argues that some of the most significant, yet under-discussed, bottlenecks in AI are not just about raw compute. He highlights the lack of high-quality, modern training data for tasks like GPU kernel generation, the critical role of networking in large-scale training, and the under-hyped importance of synthetic data and data processing techniques.
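As a loose illustration of the data-processing lever, the sketch below shows a hypothetical cleaning step that deduplicates and heuristically filters raw text before it reaches training; it is not a pipeline described in the interview, and the thresholds are arbitrary placeholders.

```python
import hashlib

def clean_corpus(examples, min_chars=200, max_digit_ratio=0.3):
    """Exact-deduplicate and heuristically filter raw text documents."""
    seen = set()
    kept = []
    for text in examples:
        doc = text.strip()
        if len(doc) < min_chars:                     # drop fragments
            continue
        digits = sum(ch.isdigit() for ch in doc)
        if digits / len(doc) > max_digit_ratio:      # drop table/log dumps
            continue
        h = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if h in seen:                                # drop exact duplicates
            continue
        seen.add(h)
        kept.append(doc)
    return kept

docs = ["short", "A" * 300, "A" * 300, "1" * 300]
print(len(clean_corpus(docs)))   # 1: the fragment, the duplicate, and the digit dump are dropped
```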
The most durable competitive advantages in AI may not be in model development itself, but in the creation of proprietary high-quality datasets and advanced data processing pipelines, which Dao identifies as a key, under-hyped lever for performance.