May 26, 2026
What are experts saying about what computer vision and visual AI will look like in the next 2-5 years?
Experts anticipate that the next two to five years in computer vision will be defined by a push beyond generative content toward a deeper understanding of the physical world. Dr. Fei-Fei Li identifies "world modeling" and spatial intelligence as the next major phase of AI development, a view echoed by startups now focusing on these areas [4, 14]. This involves moving beyond text to richer, multimodal experiences, with transformative improvements expected in AI's ability to use images and video as inputs and outputs for productivity and education within the **next 12 months** [5, 12]. The viral success of image and video generation models has already demonstrated that multimodal capabilities are a primary driver of mainstream user engagement, shifting the competitive landscape beyond pure language model performance. Technologies like Neural Radiance Fields (NeRFs) are considered a significant revolution in 3D computer vision, which is foundational for creating these sophisticated spatial models.
Despite recent breakthroughs in generative video, experts are divided on how mature current visual AI capabilities really are. While some models have achieved massive viral reach, Oriol Vinyals of Google DeepMind believes the industry **has not yet seen the equivalent of the "GPT moment"** for video and images, suggesting a more profound, paradigm-shifting advance is still to come [6, 17]. Nonetheless, the pace of improvement is rapid, with major labs like OpenAI expected to deliver substantially greater capability improvements over the next year. This progress fuels ambitious goals, such as generating entire movies from AI models by the end of 2026. Consumer applications are also expected to shift, moving from a focus on productivity toward using AI to facilitate deeper human connection and social relationships [11, 16, 18].
Underpinning this rapid evolution are dramatic improvements in the efficiency and cost of AI. Over the next five years, experts project a combined **100x improvement**, with hardware becoming 10x more efficient and software adding another 10x in efficiency gains. This will be necessary to meet the forecasted 1000x increase in demand for AI inference over the same period. The real-world impact of these cost reductions is stark; for example, the expense of monitoring all CCTV cameras in the United States with AI is predicted to drop 10x annually, from $30 billion to just $300 million in two years. These economic drivers are critical for making advanced, large-scale computer vision applications commercially viable.
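The arithmetic behind these figures can be checked with a short sketch. The numbers are the ones quoted above; the variable and function names are illustrative, and the last calculation rests on a simplifying assumption (that resource needs scale as demand divided by efficiency) which is not a claim made by the sources.

```python
# Figures quoted in the text; names here are illustrative only.
hardware_gain = 10     # projected hardware efficiency gain over five years
software_gain = 10     # projected software efficiency gain over five years
demand_growth = 1000   # forecast growth in AI inference demand, same period

# Independent efficiency gains multiply: 10x * 10x = 100x.
combined_gain = hardware_gain * software_gain


def projected_cost(initial_cost, annual_drop, years):
    """Cost after `years` if it falls by a factor of `annual_drop` each year."""
    return initial_cost / annual_drop ** years


# CCTV example: $30 billion falling 10x per year for two years.
cctv_cost = projected_cost(30e9, 10, 2)

# Assumption (not from the sources): resources needed scale as
# demand / efficiency, so a residual gap would remain after the 100x gain.
residual_gap = demand_growth / combined_gain

print(f"Combined efficiency gain: {combined_gain}x")
print(f"CCTV monitoring cost after two years: ${cctv_cost:,.0f}")
print(f"Residual demand gap under the stated assumption: {residual_gap:.0f}x")
```

Under that assumption, a 100x efficiency gain against 1000x demand growth would still leave roughly a 10x gap to be covered by other means, such as additional capacity.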
These advancements are expected to unlock novel applications, particularly in scientific research and healthcare. The convergence of AI reasoning and robotics is paving the way for autonomous labs capable of "self-driving science," which could dramatically accelerate R&D in fields like life sciences and materials science [8, 23]. However, there is some disagreement on the timeline; while near-term human-AI collaboration in labs is expected, some consider fully autonomous, closed-loop systems a longer-term goal rather than a near-term reality. Looking further ahead, in the medical field, retinal prosthetics are on a path to achieve near-native acuity comparable to 20/20 vision within the next decade, demonstrating the long-term potential of visual AI to directly augment human capabilities [1, 2, 3].
What the sources say
Points of agreement
- Experts agree the next phase of AI will be increasingly multimodal, with models using images and video as primary inputs and outputs to drive consumer adoption.
- AI is expected to significantly accelerate scientific progress by enabling the development of autonomous labs that can design and run experiments.
- A major trend will be the rise of AI agents, which will become more integrated into workflows and anticipate user needs.
Points of disagreement
- Experts disagree on the timeline for mature AI agents, with some predicting a significant rise in the next two years while others see it as a decade-long trend.
- While some see massive breakthroughs in image and video generation, others believe the industry has not yet reached a transformative 'GPT moment' for visual media.
- There are differing views on the near-term feasibility of autonomous science, with some seeing it as an emerging reality and others as a long-term goal.
Sources
The “Godmother of AI” on the next phase of AI (Fei-Fei Li & Reid Hoffman) | Summit 2025
Dr. Fei-Fei Li identifies world modeling and spatial intelligence as the next major phase of AI development.
Building the Real-World Infrastructure for AI, with Google, Cisco & a16z
This source expects transformative improvements in AI's ability to use images and video within the next 12 months for productivity and education.
Where does consumer AI stand at the end of 2025?
This episode highlights that recent consumer AI breakthroughs have been driven by multimodal image and video generation, not just text-based models.
Cadence Design Systems: Anirudh Devgan (A Bit Personal)
Anirudh Devgan forecasts a combined 100x improvement in AI hardware and software efficiency over the next five years.
AI in 2026: 3 Predictions For What’s To Come (a16z Big Ideas)
This source predicts a shift in consumer AI applications from productivity towards enhancing social connectivity and the rise of autonomous labs.
Why most AI products fail: Lessons from 50+ AI deployments at OpenAI, Google & Amazon
This podcast forecasts a significant rise in the adoption of proactive, multi-modal AI agents that anticipate user needs over the next few years.
Related questions
What specific hardware and software innovations are expected to drive the predicted 100x efficiency improvement in AI over the next five years?
What are the primary technical hurdles that must be overcome to achieve a 'GPT moment' for video and image generation?
How will the development of 'world modeling' and 'spatial intelligence' impact the capabilities of computer vision in robotics and autonomous systems?
What are the leading applications and business models emerging from the shift in consumer AI from productivity to social connection?