Sergey Levine, co-founder of Physical Intelligence and professor at UC Berkeley, discusses the state of robotics and the path to general-purpose robots.
He predicts that the 'flywheel' of robotic learning, in which deployed robots collect data that is used to improve them, will start within one to two years for narrow tasks; his median estimate for a fully autonomous robot housekeeper is five years.
Levine details Physical Intelligence's technical approach, which builds on open-source foundation models such as Google's Gemma and adds a specialized 'action expert' for motor control.
He argues that embodiment provides a crucial 'focusing mechanism' for learning, and that compositional generalization will produce emergent capabilities, which his team is already observing in its lab.
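The backbone-plus-action-expert split described above can be illustrated with a toy sketch. It is not Physical Intelligence's code: `pretrained_backbone` is a hypothetical stub standing in for a pretrained model such as Gemma, `ActionExpert` is a hypothetical linear head, and training it by regressing onto demonstrated actions (behavior cloning) is an assumption made for brevity. Only the head is updated here; the source does not say how the backbone is trained.

```python
import math
import random
from dataclasses import dataclass, field

# Hypothetical sizes, chosen only for illustration.
EMBED_DIM = 8    # width of the backbone's output features
STATE_DIM = 3    # proprioceptive state, e.g. joint angles
ACTION_DIM = 2   # continuous motor commands per control step


def pretrained_backbone(instruction: str, image: list[float]) -> list[float]:
    """Hypothetical stand-in for a pretrained vision-language model.

    A real system would run a large transformer; this stub only returns a
    deterministic feature vector so the sketch stays self-contained.
    """
    seed = sum(map(ord, instruction)) / 1000.0 + sum(image)
    return [math.sin((i + 1) * seed) for i in range(EMBED_DIM)]


@dataclass
class ActionExpert:
    """Small trainable head: backbone features + robot state -> motor commands."""
    in_dim: int = EMBED_DIM + STATE_DIM
    out_dim: int = ACTION_DIM
    weights: list[list[float]] = field(init=False)
    bias: list[float] = field(init=False)

    def __post_init__(self) -> None:
        rng = random.Random(0)  # fixed seed so runs are reproducible
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(self.in_dim)]
                        for _ in range(self.out_dim)]
        self.bias = [0.0] * self.out_dim

    def __call__(self, x: list[float]) -> list[float]:
        # Linear map: one continuous output per motor command.
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]

    def train_step(self, x: list[float], target: list[float], lr: float = 0.05) -> float:
        """One gradient-descent step on mean squared error; returns the pre-update loss."""
        pred = self(x)
        loss = sum((p - t) ** 2 for p, t in zip(pred, target)) / self.out_dim
        for k in range(self.out_dim):
            grad = 2.0 * (pred[k] - target[k]) / self.out_dim
            for j in range(self.in_dim):
                self.weights[k][j] -= lr * grad * x[j]
            self.bias[k] -= lr * grad
        return loss


def policy(expert: ActionExpert, instruction: str,
           image: list[float], state: list[float]) -> list[float]:
    # The backbone supplies general context; the expert turns it into actions.
    features = pretrained_backbone(instruction, image) + list(state)
    return expert(features)


# Hypothetical demonstration: one observation paired with a recorded action.
expert = ActionExpert()
obs_image = [0.2, 0.4, 0.1]
state = [0.0, 0.5, -0.3]
demo_action = [0.3, -0.1]
x = pretrained_backbone("pick up the cup", obs_image) + state
losses = [expert.train_step(x, demo_action) for _ in range(50)]
action = policy(expert, "pick up the cup", obs_image, state)
```

In this sketch the general-purpose prior lives entirely in the backbone, so the small head only has to learn the mapping from features and robot state to continuous motor commands.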
Concerns Raised
Identifying the right types of data to scale for robustness and efficiency, not just task diversity.
The computational trilemma of balancing inference speed, context length, and model size for real-time robotics.
The inherent difficulty of learning from passive video data (e.g., YouTube videos) without the focusing mechanism that goal-directed physical interaction provides.
The challenge of making simulation effective for learning, since it mainly supports rehearsing what a model already knows rather than acquiring new knowledge about the world.
Opportunities Identified
Initiating a data collection 'flywheel' by deploying robots for useful, narrow tasks in the near term (1-2 years).
Leveraging pre-trained vision-language models (VLMs) to provide a strong foundation of prior knowledge and common sense.
The emergence of complex, untrained behaviors through compositional generalization as models scale.
Developing hybrid inference systems where robots perform reactive tasks locally and offload complex reasoning to the cloud.
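The hybrid inference idea in the last item can be sketched as a dispatcher: reactive steps run on the robot, while steps that need deliberation are sent to a remote planner under a time budget, with a fallback to the local controller if the answer arrives too late. This is a minimal illustration under stated assumptions: `local_controller`, `cloud_planner`, `control_step`, and all latency values are hypothetical and simulated with `asyncio.sleep`; the source does not describe a specific mechanism.

```python
import asyncio

# Hypothetical stand-ins; network and inference latency are simulated with sleep.


async def local_controller(obs: str) -> str:
    """Small on-board policy: fast enough for reactive control."""
    await asyncio.sleep(0.001)
    return f"reactive-action({obs})"


async def cloud_planner(obs: str, latency_s: float) -> str:
    """Large remote model: better at reasoning, but pays network latency."""
    await asyncio.sleep(latency_s)
    return f"plan({obs})"


async def control_step(obs: str, needs_reasoning: bool,
                       budget_s: float, cloud_latency_s: float) -> str:
    """Run reactive steps locally; offload deliberation under a time budget."""
    if not needs_reasoning:
        return await local_controller(obs)
    try:
        return await asyncio.wait_for(cloud_planner(obs, cloud_latency_s),
                                      timeout=budget_s)
    except asyncio.TimeoutError:
        # The remote answer would arrive too late: fall back to the local
        # policy so the robot never stalls waiting on the network.
        return await local_controller(obs)


async def main() -> list[str]:
    return [
        await control_step("grasp cup", needs_reasoning=False,
                           budget_s=0.02, cloud_latency_s=0.0),
        await control_step("tidy kitchen", needs_reasoning=True,
                           budget_s=0.5, cloud_latency_s=0.01),
        await control_step("tidy kitchen", needs_reasoning=True,
                           budget_s=0.01, cloud_latency_s=0.2),
    ]


results = asyncio.run(main())
print(results)
```

On timeout, `asyncio.wait_for` cancels the pending remote call; a real system might instead keep the request running and use its result on a later control step.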