▶ Compound AI systems, which compose calls to multiple models, consistently outperform single monolithic models in both performance and cost-efficiency, sometimes by orders of magnitude. (Apr 2026)
▶ The cost to achieve a baseline level of AI performance (e.g., GPT-4 on MMLU) has been falling dramatically, by roughly 10x per year for the last three years.
▶ For many AI-focused organizations, from large tech companies to startups, computational expenses have surpassed personnel costs, marking a significant shift in their economic structure.
▶ The wide dispersion in the price-performance of available AI models creates significant opportunities for optimization by selecting the right model for each task within a larger workflow.
▶ Davis highlights the strategic trade-off between model quality and cost, noting that providers like Anthropic and Google explicitly market their model families along a Pareto frontier, forcing users to choose a balance.
▶ He notes that the right ensembling methodology depends on task difficulty: quorum-based ensembling is effective on easy tasks because it reduces variance, but counterproductive on hard tasks, where majority voting can eliminate a correct outlier result.
▶ He describes two contrasting approaches to inference scaling: "vertical" scaling (longer chains of thought), which requires high-HBM systems, versus "horizontal" scaling (massive parallel generation), used by systems like AlphaCode 2.
▶ Davis observes a tension between developing general-purpose frontier models and the trend toward model specialization, citing Anthropic's focus on agentic tasks and the strength of Alibaba's Qwen models in idiomatic Chinese.
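The price-performance dispersion described above can be exploited with a simple cost-aware router: for each task, pick the cheapest model whose quality clears the task's bar. A minimal sketch, where the model table (names, quality scores, and per-million-token prices) is entirely hypothetical and for illustration only:

```python
# Illustrative price-performance table (hypothetical numbers, not real vendor quotes).
MODELS = [
    {"name": "small",    "quality": 0.70, "usd_per_mtok": 0.15},
    {"name": "medium",   "quality": 0.85, "usd_per_mtok": 1.00},
    {"name": "frontier", "quality": 0.95, "usd_per_mtok": 15.00},
]

def route(required_quality):
    """Return the cheapest model whose quality meets the task's threshold.

    Falls back to the strongest model when no listed model qualifies.
    """
    eligible = [m for m in MODELS if m["quality"] >= required_quality]
    if not eligible:
        return MODELS[-1]
    return min(eligible, key=lambda m: m["usd_per_mtok"])

print(route(0.60)["name"])  # easy task -> "small"
print(route(0.90)["name"])  # hard task -> "frontier"
```

In a compound system this decision runs per step of the workflow, so easy subtasks never pay frontier-model prices.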
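The quorum-ensembling trade-off is easy to see in code. The sketch below (toy answers, not from any real benchmark) takes a majority vote over independent model samples: on the easy task the vote suppresses a stray error, while on the hard task the lone correct outlier is voted out in favor of a common wrong answer.

```python
from collections import Counter

def quorum(answers):
    """Majority vote: return the most common answer among independent samples."""
    return Counter(answers).most_common(1)[0][0]

# Easy task: most samples agree, so voting reduces variance from one noisy sample.
easy = ["42", "42", "41", "42", "42"]
print(quorum(easy))  # -> "42"

# Hard task: suppose "7" is the correct answer but only one sample found it;
# the quorum discards the correct outlier for the popular wrong answer "9".
hard = ["9", "9", "7", "9", "12"]
print(quorum(hard))  # -> "9"
```

This is why horizontal scaling systems pair massive parallel generation with a selection step (e.g., test execution or a learned scorer) rather than a plain vote, so a single correct outlier can still win.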