The discussion highlights Google's distinct focus on developing multimodal 'world models' like Omni, which can take video as input, generate it, and edit it. Oriol Vinyals believes the industry has yet to see a 'GPT moment' for visual data, where deep conceptual knowledge is extracted from images and video at scale without relying on text.
A core challenge for creating personalized AI is memory. The conversation explores why serving unique model weights for every user is impractical and favors a non-parametric approach instead, in which an agent writes its knowledge to an external, modifiable 'file system' or knowledge base.
The episode contrasts the infinite, self-generated data in game environments like Go with the data-limited reality of post-training for LLMs. While reasoning skills from narrow domains like math and code have shown surprising generalization, finding new, complex data to continuously improve broad capabilities remains a fundamental problem.
Vinyals reflects on the definition of AGI, noting that while today's models would have met the expectations of a decade ago, they still lack a crucial component: the ability to truly learn and innovate. He has not yet seen a model generate a genuinely novel or outstanding idea in a scientific field like machine learning itself.
The conversation touches on the complex business and organizational strategy behind a frontier AI lab. This includes balancing a focused, unifying effort like Gemini with a portfolio of riskier research bets, and managing massive compute resources through a multi-pronged strategy of internal use, R&D investment, and selling access to partners.