May 26, 2026

What are experts saying about what computer vision and visual AI will look like in the next 2-5 years?

16 episodes · 11 podcasts · Jun 4, 2025 – May 22, 2026

Experts anticipate that the next two to five years in computer vision will be defined by a push beyond generative content toward a deeper understanding of the physical world. Dr. Fei-Fei Li identifies "world modeling" and spatial intelligence as the next major phase of AI development, a sentiment echoed by startups now focusing on these areas [4, 14]. This involves moving beyond text to richer, multimodal experiences, with transformative improvements expected in AI's ability to use images and video as inputs and outputs for productivity and education within the **next 12 months** [5, 12]. The viral success of image and video generation models has already demonstrated that multimodal capabilities are a primary driver of mainstream user engagement, shifting the competitive landscape beyond pure language model performance. Technologies like Neural Radiance Fields (NeRFs) are considered a significant revolution in enabling 3D computer vision, which is foundational for creating these sophisticated spatial models.

Despite recent breakthroughs in generative video, a tension exists regarding the maturity of current visual AI capabilities. While some models have achieved massive viral reach, Oriol Vinyals of Google DeepMind believes the industry **has not yet seen the equivalent of the "GPT moment"** for video and images, suggesting a more profound, paradigm-shifting advance is still to come [6, 17]. Nonetheless, the pace of improvement is rapid, with major labs like OpenAI expected to deliver substantially greater capability improvements over the next year. This progress fuels ambitious goals, such as generating entire movies from AI models by the end of 2026. Consumer applications are also expected to shift, moving from a focus on productivity toward using AI to facilitate deeper human connection and social relationships [11, 16, 18].

Underpinning this rapid evolution are dramatic improvements in the efficiency and cost of AI. Over the next five years, experts project a combined **100x improvement**, with hardware becoming 10x more efficient and software adding another 10x in efficiency gains. This will be necessary to meet the forecasted 1000x increase in demand for AI inference over the same period. The real-world impact of these cost reductions is stark; for example, the expense of monitoring all CCTV cameras in the United States with AI is predicted to fall 10x per year, from $30 billion to just $300 million within two years. These economic drivers are critical for making advanced, large-scale computer vision applications commercially viable.
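
The arithmetic behind these projections can be checked with a short sketch. The function names (`combined_gain`, `cost_after`) are illustrative and do not come from the source; the figures are the ones quoted above. The sketch assumes the hardware and software factors simply multiply and that cost falls by a constant factor each year. The final ratio of demand to efficiency is only an illustrative comparison, not a claim made by the source.

```python
def combined_gain(*factors):
    """Multiply independent efficiency factors into one overall gain."""
    total = 1
    for factor in factors:
        total *= factor
    return total


def cost_after(initial_cost, annual_drop, years):
    """Cost after `years` of falling by a constant factor `annual_drop` per year."""
    return initial_cost / (annual_drop ** years)


# 10x from hardware and 10x from software, assumed to compound multiplicatively.
efficiency_gain = combined_gain(10, 10)

# CCTV monitoring cost: $30 billion, falling 10x per year for two years.
cctv_cost = cost_after(30_000_000_000, 10, 2)

# Illustrative only: forecast demand growth divided by the projected efficiency gain.
demand_to_efficiency = 1000 / efficiency_gain

print(efficiency_gain)        # combined efficiency factor
print(cctv_cost)              # projected cost in dollars
print(demand_to_efficiency)   # demand growth relative to efficiency gain
```

Under these assumptions the 10x and 10x factors give the quoted 100x, and two years of 10x annual reductions take $30 billion to $300 million, so the quoted figures are internally consistent.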

These advancements are expected to unlock novel applications, particularly in scientific research and healthcare. The convergence of AI reasoning and robotics is paving the way for autonomous labs capable of "self-driving science," which could dramatically accelerate R&D in fields like life sciences and materials science [8, 23]. However, there is some disagreement on the timeline; while near-term human-AI collaboration in labs is expected, the vision of fully autonomous, closed-loop systems is considered by some to be a longer-term goal rather than a near-term reality. Looking further ahead, in the medical field, retinal prosthetics are on a path to achieve near-native acuity comparable to 20/20 vision within the next decade, demonstrating the long-term potential of visual AI to directly augment human capabilities [1, 2, 3].

What the sources say

Points of agreement

• Experts agree the next phase of AI will be increasingly multimodal, with models using images and video as primary inputs and outputs to drive consumer adoption.
• AI is expected to significantly accelerate scientific progress by enabling the development of autonomous labs that can design and run experiments.
• A major trend will be the rise of AI agents, which will become more integrated into workflows and anticipate user needs.

Points of disagreement

• Experts disagree on the timeline for mature AI agents, with some predicting a significant rise in the next two years while others see it as a decade-long trend.
• While some see massive breakthroughs in image and video generation, others believe the industry has not yet reached a transformative 'GPT moment' for visual media.
• There are differing views on the near-term feasibility of autonomous science, with some seeing it as an emerging reality and others as a long-term goal.

Sources

Masters of Scale · NOV 25, 2025

The “Godmother of AI” on the next phase of AI (Fei-Fei Li & Reid Hoffman) | Summit 2025

Dr. Fei-Fei Li identifies world modeling and spatial intelligence as the next major phase of AI development.

a16z Podcast · OCT 29, 2025

Building the Real-World Infrastructure for AI, with Google, Cisco & a16z

This source expects transformative improvements in AI's ability to use images and video within the next 12 months for productivity and education.

a16z Podcast · DEC 29, 2025

Where does consumer AI stand at the end of 2025?

This episode highlights that recent consumer AI breakthroughs have been driven by multimodal image and video generation, not just text-based models.

Cadence Design Systems: Anirudh Devgan (A Bit Personal)

Anirudh Devgan forecasts a combined 100x improvement in AI hardware and software efficiency over the next five years.

a16z Podcast · DEC 31, 2025

AI in 2026: 3 Predictions For What’s To Come (a16z Big Ideas)

This source predicts a shift in consumer AI applications from productivity towards enhancing social connectivity and the rise of autonomous labs.

Lenny's Podcast · JAN 11, 2026

Why most AI products fail: Lessons from 50+ AI deployments at OpenAI, Google & Amazon

This podcast forecasts a significant rise in the adoption of proactive, multimodal AI agents that anticipate user needs over the next few years.
