Richard Sutton, a pioneer in reinforcement learning (RL), argues that Large Language Models (LLMs) are fundamentally flawed because they lack goals, true world models, and the ability to learn from experience.
He contrasts this with the RL paradigm, grounded in an agent interacting with an environment to maximize reward (a loop sketched below), which he calls the true foundation of intelligence.
Sutton is skeptical of building AGI on top of LLMs, citing his essay "The Bitter Lesson" and the historical pattern in AI where general, scalable methods that leverage computation and learn from experience eventually outperform those reliant on human-curated knowledge.
He also shares his philosophical perspective on the inevitable succession of humanity by digital intelligences, viewing it as a natural and potentially positive transition in the universe's evolution from an era of replication to an era of design.
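To ground that contrast, here is a minimal sketch of the agent-environment reward loop in Python. The two-armed bandit environment and the epsilon-greedy agent are illustrative assumptions chosen for brevity, not details from Sutton's remarks; the point is only that the agent's knowledge comes entirely from interaction.

```python
import random

random.seed(0)

class BanditEnv:
    """Environment: two arms with different (hidden) mean payoffs."""
    def __init__(self):
        self.means = [0.3, 0.7]  # arm 1 pays better on average

    def step(self, action):
        # Reward is stochastic; the agent only ever sees these samples.
        return 1.0 if random.random() < self.means[action] else 0.0

class EpsilonGreedyAgent:
    """Agent: estimates each arm's value from experience, mostly exploits."""
    def __init__(self, n_actions=2, epsilon=0.1):
        self.epsilon = epsilon
        self.values = [0.0] * n_actions   # running value estimates
        self.counts = [0] * n_actions

    def act(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))      # explore
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Incremental sample-average update: learning from raw experience.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

env, agent = BanditEnv(), EpsilonGreedyAgent()
for _ in range(10_000):
    a = agent.act()
    r = env.step(a)
    agent.learn(a, r)

print(agent.values)  # estimates approach the true means [0.3, 0.7]
```

After enough interactions the value estimates approach the arms' true payoff rates, so the agent settles on the better action without ever being told which one it is.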
Concerns Raised
LLMs lack true world models and goals, making them a poor foundation for AGI.
The AI field is susceptible to bandwagons, potentially ignoring fundamental principles of intelligence.
Current deep learning methods generalize poorly and suffer from catastrophic interference, where training on new data overwrites earlier learning (see the sketch after this list).
Integrating knowledge from decentralized AI agents poses a significant security risk: shared knowledge could be 'corrupted' or carry hidden goals.
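To make the interference concern concrete, the sketch below (an illustrative construction, not an example from the interview) trains a tiny numpy MLP on one task and then on a second task that depends on a different input feature. Because both tasks share the same weights, sequential training on the second task overwrites the first, and accuracy on the original task falls toward chance.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(rule, n=2000):
    """Synthetic binary task: 2-D Gaussian inputs, label given by `rule`."""
    x = rng.normal(size=(n, 2))
    return x, rule(x).astype(float)

# Task A depends only on the first input feature, task B only on the second.
xa, ya = make_task(lambda x: x[:, 0] > 0)
xb, yb = make_task(lambda x: x[:, 1] > 0)

# Tiny MLP trained by plain full-batch gradient descent with manual backprop.
w1 = rng.normal(scale=0.5, size=(2, 16)); b1 = np.zeros(16)
w2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ w1 + b1)
    p = 1 / (1 + np.exp(-(h @ w2 + b2)))
    return h, p.ravel()

def train(x, y, epochs=1000, lr=0.5):
    global w1, b1, w2, b2
    for _ in range(epochs):
        h, p = forward(x)
        g_out = (p - y)[:, None] / len(x)   # BCE gradient at the output
        g_h = g_out @ w2.T * (1 - h**2)     # backprop through tanh
        w2 -= lr * h.T @ g_out;  b2 -= lr * g_out.sum(0)
        w1 -= lr * x.T @ g_h;    b1 -= lr * g_h.sum(0)

def accuracy(x, y):
    return ((forward(x)[1] > 0.5) == y).mean()

train(xa, ya)
print("task A after training on A:", accuracy(xa, ya))  # near 1.0
train(xb, yb)
print("task A after training on B:", accuracy(xa, ya))  # collapses toward ~0.5
```

A continual-learning system would need to absorb the second task without destroying the first; plain sequential gradient descent, as shown, does not.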
Opportunities Identified
Developing AI systems that learn from direct experience (continual learning) is a more scalable and fundamental path to intelligence; a minimal sketch follows this list.
Reinforcement learning provides a robust framework for goal-oriented behavior, which Sutton regards as the essence of intelligence.
The historical trend described in "The Bitter Lesson" suggests that general methods leveraging massive computation will ultimately win.
The succession to digital intelligence can be viewed as a major, positive transition in the universe's history.
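For a concrete picture of what learning from a stream of experience looks like, the sketch below runs tabular TD(0) on the random-walk prediction task from Sutton and Barto's Reinforcement Learning: An Introduction (Example 6.2); the constants here are illustrative. The agent improves its value estimates after every single step, with no stored dataset and no retraining pass.

```python
import random

# Five nonterminal states sit between two terminal states. An episode starts
# in the middle, moves left or right at random, and ends with reward 1 on the
# right exit and 0 on the left. TD(0) learns each state's value online.

N_STATES = 5                       # nonterminal states, indexed 1..5
ALPHA = 0.1                        # step size
values = [0.5] * (N_STATES + 2)    # V[0] and V[6] are terminal
values[0] = values[N_STATES + 1] = 0.0

for episode in range(1000):
    s = (N_STATES + 1) // 2        # start in the middle (state 3)
    while True:
        s_next = s + random.choice((-1, 1))
        reward = 1.0 if s_next == N_STATES + 1 else 0.0
        # TD(0) update: shift V(s) toward the one-step bootstrapped target.
        values[s] += ALPHA * (reward + values[s_next] - values[s])
        if s_next == 0 or s_next == N_STATES + 1:
            break                  # episode over; learning already happened
        s = s_next

print([round(v, 2) for v in values[1:-1]])  # true values are 1/6 .. 5/6
```

The estimates converge toward the true values 1/6 through 5/6, driven purely by incremental updates from ongoing experience.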