Greg Kamradt, president of the ARC Prize Foundation, discusses the foundation's mission to advance artificial general intelligence (AGI) by focusing on generalization rather than narrow, superhuman performance on specific benchmarks.
He details the philosophy behind the ARC-AGI benchmark, created by Francois Chollet, which measures an AI's ability to learn new tasks efficiently, a skill at which humans excel but current models struggle.
Kamradt outlines the evolution from the static ARC-AGI 1 and 2 to the upcoming interactive, game-based ARC-AGI 3, which will evaluate models on their ability to infer goals and learn from actions without explicit instructions, benchmarking their efficiency against human performance.
Concerns Raised
The risk of major AI labs focusing on 'vanity metrics' rather than true generalization.
The limitations of reinforcement learning (RL) environments, which cannot be created for every conceivable task.
The tendency for AI benchmarks to escalate in difficulty (e.g., MMLU) rather than measuring novel skill acquisition.
Opportunities Identified
Developing AI systems that can generalize and learn new skills efficiently, similar to humans.
Using the ARC-AGI benchmark to identify transformational shifts in AI capabilities, such as the emergence of reasoning.
Launching the interactive ARC-AGI 3 benchmark as a new way to measure generalization and efficiency against human baselines.