Surge, founded in 2020 by Edwin Chen, has achieved over $1 billion in annual recurring revenue without any venture capital by focusing on high-quality, complex human data for AI.
CEO Edwin Chen is highly critical of current AI evaluation methods, arguing that leaderboards like the LMSYS Chatbot Arena are "terrible" and encourage "benchmark hacking," which sets the industry back by rewarding superficial model traits over genuine capability.
The demand for AI data is rapidly evolving from simple labeling to requiring deep, specialized expertise (e.g., Olympiad-level math, coding in specific dialects) to train frontier models on reasoning and multimodal tasks.
Chen argues that high-quality human feedback data for Reinforcement Learning from Human Feedback (RLHF) is vastly superior to synthetic data, and predicts that the economic incentives in AI will lead closed-source models to continue outperforming open-source alternatives.
Concerns Raised
The AI industry's reliance on flawed benchmarks like the LMSYS Chatbot Arena is promoting superficial model improvements and hindering genuine progress.
An over-reliance on synthetic data makes models good at academic tests but brittle in real-world, open-ended scenarios.
The current economic incentives in AI will force the most successful open-source models to become closed-source, concentrating power.
Opportunities Identified
There is a massive, growing market for high-quality, expert-driven human data to power the next generation of AI models.
Focusing on high-quality RLHF data is a more effective path to improving model capabilities than using massive amounts of synthetic data.
Developing models with deep expertise in specialized domains and niche languages/dialects remains a significant area for differentiation and value creation.