OpenAI's o3 model represents a significant step in AI reasoning, using reinforcement learning to 'think' before responding and to autonomously select and use external tools such as web browsing and code execution.
Integrating external tools is critical to model performance, especially on complex, multi-step tasks, and tool-augmented reasoning shows a much steeper improvement curve as thinking time increases ('test-time scaling').
OpenAI's long-term strategy is to unify its various models into a single, more intuitive system that can dynamically determine the appropriate amount of reasoning needed for a given task, simplifying the user experience.
The development of agentic AI is proceeding cautiously, with capabilities being deployed iteratively to manage the risks of errors, while exploring future applications like AI assistants that operate continuously on a user's computer.
Concerns Raised
The asymmetric cost of errors from agentic AI necessitates a cautious and iterative deployment strategy.
Model performance is not consistent; there is a distribution of outcomes for the same prompt, and peak performance is not guaranteed.
Current methods for comparing models are often flawed, as they fail to account for the statistical nature of AI responses.
Developing capable AI for physical domains like robotics remains significantly harder and slower than for digital domains.
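The point about flawed model comparisons can be made concrete: because the same prompt yields a distribution of outcomes, comparing models on a single run each is unreliable. A minimal sketch, assuming made-up pass/fail records and a simple bootstrap (the function name and data are hypothetical, not from the source):

```python
# Hypothetical illustration: comparing two models on the same prompts.
# A single run per prompt ignores the response distribution; instead,
# sample each model several times per prompt and compare pass rates
# with an uncertainty estimate (here, a simple bootstrap).
import random

def bootstrap_mean_diff(a, b, iters=10_000, seed=0):
    """Return a (low, high) 95% bootstrap CI for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(iters):
        ra = [rng.choice(a) for _ in a]  # resample model A's outcomes
        rb = [rng.choice(b) for _ in b]  # resample model B's outcomes
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    return diffs[int(0.025 * iters)], diffs[int(0.975 * iters)]

# 1 = task solved, 0 = failed; several samples per prompt (invented data).
model_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
model_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]

low, high = bootstrap_mean_diff(model_a, model_b)
print(f"pass-rate difference 95% CI: [{low:.2f}, {high:.2f}]")
# If the interval includes 0, a single-run comparison would be misleading.
```

The design choice here is that the confidence interval, not a point estimate, is what supports a claim that one model outperforms another.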
Opportunities Identified
Unifying various models into a single, intelligent system that dynamically allocates resources based on task complexity.
Developing AI agents that can operate continuously on a user's computer to provide proactive, contextual assistance.
Leveraging AI to accelerate internal development and coding tasks, creating a powerful self-improvement loop.
Creating models with a better understanding of their own uncertainty, allowing them to decide how long to 'think' about a problem.
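The last opportunity, models deciding how long to 'think' based on their own uncertainty, can be sketched as an adaptive-compute loop: keep sampling answers until the majority answer is stable, then stop. Everything here is a hypothetical stand-in (the `answer_once` stub simulates a stochastic model; thresholds are invented), not OpenAI's actual mechanism:

```python
# Sketch of uncertainty-driven test-time compute: sample repeatedly and
# stop early once the majority answer reaches an agreement threshold.
import collections
import random

def answer_once(rng):
    # Stand-in for one stochastic model sample (hypothetical stub).
    return rng.choice(["A", "A", "A", "B"])

def adaptive_think(max_samples=16, agree_threshold=0.8, min_samples=4, seed=1):
    """Sample answers until the majority answer is stable, then stop."""
    rng = random.Random(seed)
    votes = collections.Counter()
    for n in range(1, max_samples + 1):
        votes[answer_once(rng)] += 1
        answer, count = votes.most_common(1)[0]
        if n >= min_samples and count / n >= agree_threshold:
            return answer, n  # confident: stop 'thinking' early
    return votes.most_common(1)[0][0], max_samples

answer, samples_used = adaptive_think()
print(f"answer={answer}, samples used={samples_used}")
```

Easy prompts terminate after few samples while ambiguous ones consume the full budget, which is the resource-allocation behavior the unified-system strategy above describes.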