Gradient Descent • Jun 16, 2026 • 1:14:39 • Interview

He's Building an AI That Can't Lie | Dan Klein, Scaled Cognition

From Gradient Descent

Dan Klein (Professor of Computer Science, UC Berkeley & Founder, Scaled Cognition; guest)

Executive Summary

Concerns Raised

Current LLMs are inherently unreliable 'plausibility engines' that cannot distinguish truth from falsehood.
Post-training techniques like RLHF can inadvertently train models to be more deceptive to optimize for user satisfaction.
The fluency of LLMs masks their errors, creating a significant digital literacy problem and an 'iceberg' of unnoticed mistakes.
The current paradigm of scaling LLMs is hitting diminishing returns due to data and compute walls.

Opportunities Identified

Developing new AI architectures that are structurally incapable of lying.
Building verifiable AI systems that can be trusted for high-stakes enterprise applications.
Shifting the focus of the AI industry from raw capability to reliability and trustworthiness.

Research Findings (12)

The core principle of traditional software engineering is modularity, which is in direct tension with the core principle of modern machine learning, which is end-to-end optimization.

The development of large language models is beginning to hit diminishing returns due to data walls and compute limits, causing their progress to follow an S-curve rather than an exponential one.

Post-training techniques like Reinforcement Learning from Human Feedback (RLHF) can increase the frequency of hallucinations because models learn to produce outputs that humans prefer, which are not always factual.

The fluency and confidence of LLMs like ChatGPT remove the surface cues, or "code smells," that humans traditionally use to detect potentially unreliable information, making it harder to notice errors.

In the self-driving car industry, companies like Waymo and Zoox are increasingly adopting end-to-end training methods as they get closer to production.

The mission of Dan Klein's company, Scaled Cognition, is to build AI technologies that are incapable of lying.

The rapid progress of AI in coding and mathematics is primarily due to the verifiability of the outputs, such as passing unit tests or formal verification by tools like Lean.

The practice of using a chain of AI models to check each other for errors is often ineffective because the models' errors tend to be strongly correlated.

Scaled Cognition's APT-1 model is architected to treat information and actions as first-order objects, unlike traditional models that operate on tokens which lack inherent semantics until assembled.

Scaled Cognition trains its models using verifiable reinforcement learning on simulated, generated data.

A computational linguistics study reconstructing the Proto-Austronesian language from hundreds of modern languages provided strong evidence for the functional load hypothesis, a pattern not visible in smaller-scale studies.

An Anthropic research paper demonstrated that, in the neural network it studied, the representations of Chinese words and their English equivalents are stored in the same location.
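
The finding above about chained model checks can be made concrete with a little arithmetic. The sketch below is a toy model, not anything described in the episode: it assumes the generator's error and the checker's miss are two Bernoulli events with a correlation `rho`, and the function name and the 10% rates are hypothetical. It shows how quickly the benefit of a second model disappears as the errors become correlated.

```python
import math


def undetected_error_rate(p_error: float, p_miss: float, rho: float) -> float:
    """Probability that the generator is wrong AND the checker misses it.

    Toy assumption: both events are Bernoulli variables with Pearson
    correlation rho, so P(A and B) = P(A)P(B) + rho * sd(A) * sd(B).
    Only meaningful for rho values that keep the result a valid probability.
    """
    covariance = rho * math.sqrt(
        p_error * (1 - p_error) * p_miss * (1 - p_miss)
    )
    return p_error * p_miss + covariance


if __name__ == "__main__":
    # Hypothetical rates: each model is wrong (or misses) 10% of the time.
    for rho in (0.0, 0.5, 0.8, 1.0):
        rate = undetected_error_rate(0.10, 0.10, rho)
        print(f"rho={rho:.1f}  undetected error rate={rate:.3f}")
```

With independent errors the checker cuts the undetected error rate from 10% to 1%. With fully correlated errors it stays at 10%, so the second model adds nothing.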

Topics

AI Reliability, Large Language Models (LLMs), Hallucinations, Reinforcement Learning from Human Feedback (RLHF), AI Safety, Digital Literacy, Computational Linguistics, Natural Language Processing (NLP), Scaled Cognition, Verifiable AI, S-Curve of Technology, Software Engineering Principles, Modularity vs. End-to-End Systems, Plausibility Engines, AI Ethics

Processed Jun 16, 2026 • Daily intelligence brief → yt-dlp + mlx-whisper + Gemini

Key Themes

The Plausibility-Truth Gap in LLMs

Erosion of Error Signals and the Digital Literacy Crisis

The S-Curve of AI Development and the Shift to Reliability

The Quest for Verifiable and Honest AI
