The primary bottleneck in robotics has shifted from hardware to software; the core challenge is now developing 'physical intelligence' using AI foundation models.
A key breakthrough is transferring knowledge from pre-trained vision-language models (VLMs) to robots, allowing them to understand abstract concepts and generalize from internet-scale data without needing an 'internet of robot data'.
While models show impressive generalization on complex, long-horizon tasks (e.g., cleaning a kitchen), their performance is still only at a 'grad student' level, and achieving deployment-ready reliability is the next major hurdle.
The field is transitioning from a pure research problem to a scaling problem, but a key missing piece is a predictable 'scaling law' that connects investment (dollars, data, compute) to model capability.
Concerns Raised
The fundamental problem of robotics may be harder than anticipated, representing a greater risk than competition.
Current model performance is not yet reliable or efficient enough for commercial deployment.
The lack of established 'scaling laws' makes it difficult to predict the return on investment for data and compute, keeping it a research-heavy problem.
Simulation is not yet a viable data source for complex manipulation tasks due to its failure to model contact physics accurately.
Opportunities Identified
Leveraging pre-trained vision-language models to rapidly bootstrap robot intelligence and generalization.
Solving general-purpose robotics will unlock massive economic value, starting with automating all household chores.
Open-sourcing models can accelerate community engagement and uncover novel applications for the technology.
The future potential to 'vibe code' hardware, where intelligence from foundation models can be easily infused into any physical form.