What does the next training paradigm look like?, Sonic AI
The Dwarkesh Patel Podcast
Jun 26, 2026
19:52
Interview
Dwarkesh Patel
(host)
Executive Summary
The current AI training paradigm, Reinforcement Learning from Verifiable Rewards (RLVR), is limited because it struggles with complex, real-world domains that cannot be easily simulated or 'ground out' (e.g., building a business).
A fundamental bottleneck for AI progress is extreme sample inefficiency, which is manageable in simulated environments but prohibitive in the real world, where data is scarce and non-repeatable.
The key to overcoming these limitations is 'continual learning,' where models learn on-the-job from real-world deployment, compressing experiences back into their weights.
Potential solutions for continual learning include On-Policy Self-Distillation (OPSD) and a more speculative concept called 'dreaming,' where AIs build their own simulations to practice skills.
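To make "verifiable reward" concrete, here is a minimal sketch of the kind of automatic check RLVR depends on. It is an illustration only and was not presented in the episode: the function names, the exact-match rule, and the sample answers are all assumptions. The point is that such a checker is cheap to write for a task with a single correct answer, while no comparable function exists for an open-ended goal like building a business.

```python
def verifiable_reward(model_answer: str, reference: str) -> float:
    """Return 1.0 when the answer matches the reference exactly, else 0.0.

    Hypothetical checker: real setups might run unit tests or a proof
    verifier instead of comparing strings.
    """
    return 1.0 if model_answer.strip() == reference.strip() else 0.0


def mean_reward(samples: list[str], reference: str) -> float:
    """Average the reward over several sampled answers to one prompt."""
    if not samples:
        return 0.0
    return sum(verifiable_reward(s, reference) for s in samples) / len(samples)


if __name__ == "__main__":
    # Made-up sampled answers to the prompt "What is 17 * 3?"
    samples = ["51", " 51 ", "54", "fifty-one"]
    print(mean_reward(samples, "51"))
```

Because the check is automatic, the same prompt can be sampled as many times as needed, which is why sample inefficiency is tolerable in this setting and much less so when each attempt is a one-off real-world event.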
The Limits of Verifiable, 'Grindable' RL
The Necessity of Continual Learning
Processed Jun 26, 2026
yt-dlp + mlx-whisper + Gemini