S3 E20 Jitendra Malik on Building AI from the ground up: Sensorimotor learning before language
Jitendra Malik • Professor, UC Berkeley & Research Scientist Director, Meta
Executive Summary
Professor Jitendra Malik argues that true AI requires embodied, sensory-motor intelligence, viewing current large language models like GPT-4 as impressive but incomplete without a foundation in physical interaction.
Malik's research has made significant strides in robotics, particularly with the Rapid Motor Adaptation (RMA) technique, which has nearly solved quadrupedal (four-legged) locomotion across diverse terrains.
He posits that for AI to advance, it must follow a path similar to human evolution and child development, where physical interaction and grounded learning precede and inform abstract reasoning and language.
In computer vision, while problems like object recognition and segmentation are largely solved, 3D reconstruction from a single image remains a major unsolved challenge.
Concerns Raised
Current AI development is overly focused on disembodied language models, neglecting the foundational need for sensory-motor intelligence.
Bipedal locomotion for humanoid robots remains a significant, unsolved challenge.
3D reconstruction from a single image is still far from human-level capability.
The resource gap makes it impossible for academic labs to compete with industry on 'big science' projects like training foundation models.
Opportunities Identified
Grounding language models in physical experience could make them more robust and less brittle.
The near-solved problem of quadrupedal locomotion opens the door for practical, real-world robot applications.
Using simulation in a tight feedback loop with real-world data can dramatically accelerate progress in robotics.
Combining vision with proprioceptive feedback allows robots to navigate extremely challenging terrains like stairs and uneven ground.