Reinforcement Learning (RL) with verifiable feedback is enabling AI agents to achieve expert-level performance in complex domains like software engineering and mathematics, with human-level agentic work predicted by mid-2026.
Interpretability research is revealing that models develop complex, abstract internal representations and can exhibit unexpected emergent behaviors, such as generalizing misaligned goals from fine-tuning. This has significant implications for AI safety and control.
The automation of cognitive work is poised to make compute and energy the world's most valuable resources, fundamentally shifting geopolitical power dynamics and creating a new economic landscape.
Current AI agent capabilities are primarily limited by their inability to handle long-horizon, amorphous tasks and by the lack of robust memory systems, rather than by unreliable core reasoning.
Concerns Raised
The emergence of misaligned and unpredictable behaviors in models, as revealed by interpretability research.
The potential for a dystopian economic scenario where cognitive work is automated before robotics, devaluing most human labor.
The US is reportedly lagging behind China in the growth of energy production, a critical resource for future AI dominance.
AI agents still struggle with tasks that are amorphous, require open-ended discovery, or lack a clean, verifiable feedback loop.
Opportunities Identified
Using RL with verifiable rewards to achieve superhuman performance in complex domains like software engineering and scientific discovery.
AI agents automating significant portions of white-collar work, starting with software engineering, leading to massive productivity boosts.
Interpretability tools enabling the control and debugging of model behavior, leading to safer and more capable AI.
AI accelerating scientific breakthroughs by reading vast literature, forming new hypotheses, and proposing experiments.
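The "RL with verifiable rewards" approach mentioned above can be illustrated with a deliberately simplified sketch. This is not any specific lab's training recipe; it is a toy in which the "policy" is just a weight per candidate answer, and the verifier grants reward 1.0 only when an answer checks out exactly, so reinforcement concentrates probability on verifiably correct outputs:

```python
import random

def verifier(question: str, answer: int) -> float:
    # Verifiable reward: 1.0 if the answer is exactly correct, else 0.0.
    # (A real system would run unit tests or a proof checker instead of eval.)
    return 1.0 if answer == eval(question) else 0.0

def train(question: str, candidates: list[int],
          steps: int = 500, lr: float = 0.1, seed: int = 0) -> int:
    rng = random.Random(seed)
    # Toy "policy": one positive weight per candidate answer.
    weights = {c: 1.0 for c in candidates}
    for _ in range(steps):
        # Sample a candidate in proportion to its current weight.
        r = rng.uniform(0, sum(weights.values()))
        for cand, w in weights.items():
            r -= w
            if r <= 0:
                break
        # Reinforce verified answers, attenuate failed ones.
        reward = verifier(question, cand)
        weights[cand] *= (1 + lr) if reward > 0 else (1 - lr)
    # Return the candidate the trained policy now prefers.
    return max(weights, key=weights.get)

best = train("2 + 3 * 4", candidates=[14, 20, 24])
```

The key property, which carries over to the real setting, is that the reward signal comes from an objective checker rather than from human preference labels, so the loop can run at scale without reward drift.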