Unsupervised Learning • Jun 3, 2026 • 1:13:32 • Interview

AI Research Legend’s Honest Assessment of Where We Are

From Unsupervised Learning

Lukas Kaiser (Co-author, Transformer Paper, guest)

Executive Summary

Concerns Raised

Current transformer models have unreliable, 'jagged' generalization and are not data-efficient.
The research community may be overly focused on incremental improvements rather than fundamental breakthroughs.
Real-world physical applications, like autonomous driving, remain a significant challenge for current methods.

Opportunities Identified

Developing novel, 'post-transformer' architectures could unlock more robust and efficient AI.
The increasing power of consumer-grade hardware is democratizing cutting-edge AI research.
Using AI coding assistants can create a powerful feedback loop, accelerating the pace of research and discovery.

Research Findings (12)

OpenAI has a publicly stated goal to develop an AI agent with the capabilities of a research-level intern by November of the current year.

A key strategic decision at OpenAI was the pivot to prioritize reasoning as highly as pre-training for its models.

Andrej Karpathy recently joined Anthropic to work on a team called RSI.

Anthropic made the strategic decision to focus on coding because it could not compete with OpenAI on its ChatGPT product.

Waymo recently canceled its highway driving program because its self-driving cars could not handle certain construction zones.

Lukas Kaiser notes that despite millions of miles of driving data, Waymo's system still cannot generalize to construction zones on a highway, a task humans find simple.

Small models like TRM and HRM have performed very well on problems like Sudoku and ARC-AGI, where pure transformer models struggle.

Using Codex, Lukas Kaiser was able to reproduce one of his old research papers in 2 days, a task that previously took him about 3 weeks.

Lukas Kaiser prefers Codex over Claude Code primarily due to its "compaction" feature, which allows it to effectively manage and continue long conversation threads.

Small models such as Google's Gemma, at a few billion parameters, show capabilities that challenge the GPT-3-era belief that zero-shot learning requires models with more than 100 billion parameters.

The NVIDIA RTX 5090 GPU provides about 200 teraflops of BF16 performance.

Lukas Kaiser believes some larger labs still have difficulty matching the quality of OpenAI's reinforcement learning capabilities.

Topics

AI Research, Transformers, Post-Transformer Architectures, Generalization, Reinforcement Learning (RL), Hardware Acceleration, NVIDIA GPUs, AI Coding Assistants, Codex, OpenAI, Anthropic, Google, Waymo, Autonomous Driving, Data Efficiency, Model Scaling

Processed Jun 3, 2026 • Daily intelligence brief → yt-dlp + mlx-whisper + Gemini

Key Themes

The Limits of Transformer Generalization

The Search for Post-Transformer Architectures

Hardware as a Research Accelerator

AI-Accelerated Research and Development
