NVIDIA's dominance in AI hardware (currently ~90% of workloads) is being challenged, particularly in the inference market, with a shift towards multi-silicon workloads expected in the next few years.
AI inference costs have fallen roughly 100x over the last two years, driven by techniques such as quantization, Mixture-of-Experts (MoE) architectures, and hardware-aware algorithms like Flash Attention.
Another 10x cost reduction is predicted within the next year.
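Of the techniques named above, quantization is the easiest to see concretely: storing weights as 8-bit integers plus a scale factor cuts memory (and bandwidth) 4x versus float32. The sketch below is a minimal, illustrative per-tensor symmetric int8 scheme; the function names are mine, not from any particular library.

```python
# Minimal sketch of symmetric per-tensor int8 weight quantization.
# Illustrative only: real systems use per-channel scales, calibration,
# and fused int8 kernels on top of this basic idea.

def quantize_int8(weights):
    """Map float weights to int8 values plus one scale factor."""
    scale = max(abs(w) for w in weights) / 127.0  # largest value maps to +/-127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.08, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored value is within half a quantization step (scale / 2)
# of the original, while storage drops from 4 bytes to 1 per weight.
```

The accuracy cost is bounded by the step size, which is why quantization often preserves model quality while slashing serving cost.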
The future of AI is trending towards agentic models that can take independent actions, with real-time video generation poised to be a major consumer application.
Open-source models are predicted to close the quality gap with closed-source counterparts within a year, driven by innovation in data processing and synthetic data generation, which is considered a key under-hyped area.
Concerns Raised
Networking remains a primary bottleneck for large-scale AI model training.
The lack of high-quality, modern training data is a major bottleneck for developing AI that can automatically write correct, low-level code.
True hardware portability is a myth, as even successive generations of NVIDIA chips have significant architectural differences requiring software rewrites.
Opportunities Identified
A further 10x reduction in AI inference costs is achievable within the next year through hardware and software co-design.
Agentic AI represents the next major frontier in AI capabilities.
Data processing and synthetic data generation are under-hyped areas with massive potential to improve model performance.
Alternative architectures like Mamba can unlock new efficiencies for specific workloads, such as large-batch inference.
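The large-batch advantage comes from memory: a Transformer's KV cache grows with both batch size and sequence length, while a Mamba-style state-space model keeps one fixed-size state per layer regardless of sequence length. The back-of-envelope comparison below uses hypothetical 7B-class dimensions of my own choosing, not measurements of any real model.

```python
# Back-of-envelope memory comparison: attention KV cache vs. a
# fixed-size SSM state (Mamba-style). All dimensions are illustrative
# assumptions for a hypothetical 7B-class model, stored in fp16.

def kv_cache_bytes(batch, seq_len, layers, heads, head_dim, bytes_per=2):
    # Keys and values (hence the factor of 2) cached for every token,
    # in every layer -- grows linearly with seq_len and batch.
    return batch * seq_len * layers * 2 * heads * head_dim * bytes_per

def ssm_state_bytes(batch, layers, d_model, state_dim, bytes_per=2):
    # One fixed recurrent state per layer -- independent of seq_len.
    return batch * layers * d_model * state_dim * bytes_per

attn = kv_cache_bytes(batch=64, seq_len=4096, layers=32, heads=32, head_dim=128)
ssm = ssm_state_bytes(batch=64, layers=32, d_model=4096, state_dim=16)
print(f"KV cache: {attn / 2**30:.1f} GiB, SSM state: {ssm / 2**30:.2f} GiB")
# Under these assumptions the KV cache needs 128 GiB at batch 64,
# while the SSM state needs 0.25 GiB -- a 512x gap that lets the
# SSM serve far larger batches on the same accelerator memory.
```

The exact ratio depends entirely on the chosen dimensions, but the scaling difference (linear in sequence length versus constant) is what makes alternative architectures attractive for this workload.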