Unsupervised Learning • Dec 15, 2025 • 47:59 • Interview

Edwin Chen: Why Frontier Labs Are Diverging, RL Environments & Developing Model Taste

From Unsupervised Learning

Edwin Chen • CEO, Surge

Executive Summary

AI labs are pursuing divergent optimization strategies: according to Chen, OpenAI is reportedly focused on user engagement metrics, while Anthropic targets user productivity and economic value.
Over-reliance on public benchmarks like LM Arena is dangerous, as it encourages models to become more verbose and superficially appealing ('clickbait') rather than more accurate or intelligent.
The industry is shifting towards using Reinforcement Learning (RL) environments for model improvement, moving beyond static datasets to train models on complex, multi-step tasks.
The future of AI is likely a 'constellation' of specialized models, with a long-term trend of companies training their own foundation models to achieve optimal performance for their specific needs.

Concerns Raised

AI labs are optimizing for flawed benchmarks like LM Arena, leading to superficial and less accurate models.
Without proper internal measurement, teams can work for 6-12 months without realizing their models are actually getting worse.
The Silicon Valley 'pivot culture' is creating a flurry of startups chasing trends like RL environments without deep conviction or a long-term vision.

Opportunities Identified

Developing better objective functions that measure true usefulness and long-term user satisfaction, rather than just engagement.
Building company-specific foundation models to achieve superior performance on domain-specific tasks.
Providing the critical data and tooling infrastructure for the next wave of AI development in RL environments and non-text modalities.
Leveraging technology to create a meritocratic system for generating high-quality human data at scale.

Key Themes

The Perils of Misguided Benchmarks

Public benchmarks like LM Arena incentivize models to produce superficially impressive outputs: longer, emoji-filled, heavily formatted text that users tend to prefer in quick A/B tests. This 'clickbait' optimization can yield models that are verbose and factually incorrect. Without rigorous internal measurement, a team can also fail to notice that its model is regressing, sometimes for 6-12 months.

This highlights a critical flaw in the AI development ecosystem, where chasing leaderboard scores can actively harm model quality and misdirect billions in R&D investment.
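One way to make the length concern concrete is to check how often the longer response wins in pairwise preference data. The sketch below is illustrative only: the `Vote` record and the `longer_response_win_rate` function are hypothetical names, not part of LM Arena or any lab's tooling, and the example votes are made up. A rate far above 0.5 would only suggest, not prove, that raters are rewarding verbosity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Vote:
    """One pairwise comparison (hypothetical schema)."""
    len_a: int   # length of response A, in tokens or characters
    len_b: int   # length of response B, in the same unit
    winner: str  # "a" or "b"


def longer_response_win_rate(votes: Sequence[Vote]) -> Optional[float]:
    """Fraction of votes won by the longer response.

    Votes where both responses have the same length are skipped.
    Returns None when no vote has a length difference.
    """
    decided = [v for v in votes if v.len_a != v.len_b]
    if not decided:
        return None
    longer_wins = sum(
        1 for v in decided
        if (v.winner == "a") == (v.len_a > v.len_b)
    )
    return longer_wins / len(decided)


# Made-up example data, for illustration only.
example_votes = [
    Vote(len_a=100, len_b=50, winner="a"),  # longer response wins
    Vote(len_a=30, len_b=80, winner="b"),   # longer response wins
    Vote(len_a=40, len_b=90, winner="a"),   # shorter response wins
    Vote(len_a=60, len_b=60, winner="b"),   # equal length, skipped
]
rate = longer_response_win_rate(example_votes)
```

Tracking a number like this alongside accuracy-focused internal evaluations is one possible guard against the silent regressions described above.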

Divergent Philosophies of AI Optimization

Frontier AI labs are not optimizing for the same goals. OpenAI is reportedly focused on user engagement metrics like session length and daily active users, while Anthropic is optimizing for productivity and the economic value its models generate. This fundamental difference in objective functions will lead to distinct types of AI with different capabilities and societal impacts.

Understanding these different 'product opinions' is crucial for businesses choosing an AI partner, as the underlying optimization goal will determine the model's strengths and weaknesses for specific applications.

The Rise of Reinforcement Learning (RL) Environments

The next frontier for improving AI involves training models in dynamic RL environments, not just on static datasets. This approach, exemplified by Meta's GAIA benchmark, allows models to learn through action and feedback in complex simulations, which is essential for developing agentic capabilities. This shift has spurred a new wave of startup activity focused on building these environments.

This marks a significant evolution in AI training methodology, requiring new types of data, tooling, and expertise that companies like Surge are pivoting to provide.
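To illustrate what separates an RL environment from a static dataset, here is a deliberately tiny sketch. It is not how Surge, Meta, or any lab builds environments; `CounterEnv`, `run_episode`, and the `reset`/`step` interface are assumptions chosen for illustration. The point is only that the model acts over several steps and is rewarded on the outcome rather than scored against a fixed label.

```python
class CounterEnv:
    """Toy multi-step task: reach `target` from 0 using 'inc' or 'double'."""

    ACTIONS = ("inc", "double")

    def __init__(self, target=10, max_steps=8):
        self.target = target
        self.max_steps = max_steps
        self.value = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.value = 0
        self.steps = 0
        return self.value

    def step(self, action):
        """Apply one action and return (observation, reward, done)."""
        if action not in self.ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        self.value = self.value + 1 if action == "inc" else self.value * 2
        self.steps += 1
        success = self.value == self.target
        # The episode ends on success, on overshooting, or when the step budget runs out.
        done = success or self.value > self.target or self.steps >= self.max_steps
        reward = 1.0 if success else 0.0  # reward depends only on the outcome
        return self.value, reward, done


def run_episode(env, policy):
    """Roll out `policy` (observation -> action) and return the total reward."""
    obs = env.reset()
    total = 0.0
    done = False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


# A hand-written policy that solves the default task: 0 -> 1 -> 2 -> 4 -> 5 -> 10.
good_return = run_episode(CounterEnv(), lambda obs: "inc" if obs in (0, 4) else "double")
# A naive policy that only increments runs out of steps before reaching 10.
naive_return = run_episode(CounterEnv(), lambda obs: "inc")
```

Real environments replace the counter with tools, files, or simulated applications, but the loop of acting, observing, and receiving feedback has the same shape.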

The Future is Specialized Foundation Models

The initial belief in 'one model to rule them all' is giving way to a future with a 'constellation' of different, specialized models. Edwin Chen predicts that eventually, every company will need to train its own foundation models to achieve the best performance for its unique domain and use cases. This suggests a move away from reliance on a few general-purpose APIs towards a more decentralized and customized AI landscape.

This signals a major long-term strategic consideration for enterprises: to gain a competitive edge, they will need to invest in building their own AI capabilities rather than just consuming off-the-shelf solutions.

Data Quality as the Ultimate Differentiator

The quality of data and the rigor of the evaluation process are paramount, as poor data can lead to months of negative progress even as the rest of the industry advances. Surge positions itself as a technology-first company that uses a meritocratic platform to measure and ensure the quality of human-generated data, contrasting with competitors who act more like staffing agencies. This focus on quality is critical for creating genuinely intelligent models, not just ones that are good at passing superficial tests.

As models become more powerful, the bottleneck for improvement shifts increasingly to the quality and complexity of the training data, making data partners a critical component of the AI supply chain.

Topics

AI Benchmarking, LM Arena, Large Language Models (LLMs), Model Optimization, Objective Functions, Reinforcement Learning (RL), RL Environments, Data Labeling, Data Quality, AI Strategy, Foundation Models, Surge AI, OpenAI, Anthropic, Meta AI, Model Evaluation, AI Startups, Silicon Valley Culture

Processed Apr 3, 2026 yt-dlp + mlx-whisper + Gemini