No Priors • Jul 17, 2025 • 1h 2m • Interview

No Priors Ep. 123 | With ReflectionAI Co-Founder and CEO Misha Laskin

From No Priors

Misha Laskin (Co-Founder and CEO, ReflectionAI, guest)

Executive Summary

Concerns Raised

The difficulty of creating accurate, generalizable reward models is 'ASI-complete'.
Current RL algorithms are poor at exploration and credit assignment.
Existing AI coding tools have negligible or negative productivity impact in enterprises.

Opportunities Identified

Focused startups can build best-in-class products by excelling at post-training and RL on a manageable compute budget.
Solving the enterprise code comprehension problem is a massive, underserved market.
Building superintelligence in 'ASI-complete' verticals like coding before tackling general intelligence.

Key Themes

Research Findings (12)

Reflection AI has launched Asimov, a code comprehension agent designed to function like a principal-level engineer for large codebases.

Misha Laskin's "hot take" is that there is no such thing as true generalization in AI, only the process of bringing the test data distribution into the training data distribution.

Misha Laskin believes the compute required for state-of-the-art reinforcement learning is currently manageable for a startup, requiring about two orders of magnitude fewer FLOPs than pre-training.

A founding bet of Reflection AI was that pre-training for large language models was converging on a known paradigm, allowing the company to leverage open-weight models instead of investing in its own pre-training.

Misha Laskin, who led reward model development for Gemini, believes the primary bottleneck in scaling reinforcement learning is the "reward problem"—the difficulty of creating accurate reward models for arbitrary tasks.

Misha Laskin believes the final paradigm needed to reach Artificial Super Intelligence (ASI) is scaling reinforcement learning on top of large language models.

Misha Laskin predicts that while the technical blueprint for building ASI will be established within the next couple of years, its actual deployment across various industries will be a multi-decade endeavor.

Misha Laskin cites the rapid revenue growth of Anthropic as evidence that a frontier AI lab can become a massive, self-sustaining business without being owned by a cloud provider.

Misha Laskin predicts that within a couple of years, there will be definitive superintelligence in some meaningful categories of work, such as specific sub-domains of coding.

Reflection AI's strategy is based on the belief that "code" is the fundamental interface for AI agents to interact with software, meaning a powerful coding reasoner will be operationally generalizable to many other knowledge work domains.

Misha Laskin believes the problem of creating a perfect reward model is "ASI-complete," meaning a neural network that can accurately verify any outcome is likely a superintelligence itself.

Misha Laskin argues that startups in critical AI categories like search and coding face an existential threat if they cannot build their own frontier models and must rely on third-party APIs.

Topics

Reinforcement Learning (RL)
Artificial Super Intelligence (ASI)
Large Language Models (LLMs)
Reward Modeling
Code Comprehension
AI Coding Tools
Enterprise AI
Startup Strategy
AI Verticalization
Product-Led Research
Compute Costs
Gemini (Google)
DeepMind
AlphaGo
AI Evals

Processed Mar 31, 2026 • Daily intelligence brief → yt-dlp + mlx-whisper + Gemini

Key Themes

The Final Paradigm for ASI

Product-Led AI Research

The Primacy of Reward Modeling

Solving Code Comprehension