- ARC-AGI is a benchmark for evaluating AI, with multiple versions developed over time, including ARC-AGI 1, ARC-AGI 2, and the upcoming ARC-AGI 3. (Feb–Apr 2026)
- Major AI laboratories, including OpenAI, xAI, Google (with Gemini), and Anthropic, use the ARC-AGI benchmark when reporting model performance and in release announcements. (Feb–Apr 2026)
- François Chollet created the original ARC-AGI 1 benchmark and is a key figure in defining its significance for Artificial General Intelligence (AGI). (Feb–Apr 2026)
- The ARC-AGI 3 benchmark is designed to be interactive and game-like, measuring AI efficiency by comparing the number of actions an agent takes against the human average, with the aim of providing evidence of generalization. (Feb–Apr 2026)
- The methodology and focus of the ARC-AGI benchmark have evolved significantly, from 800 hand-crafted tasks in v1 to 150 interactive, game-like environments planned for v3. (Feb–Apr 2026)
- While solving ARC-AGI is considered 'necessary' for AGI, it is explicitly described as 'not sufficient,' indicating ongoing discussion about the full criteria for achieving AGI. (Feb–Apr 2026)
- The evaluation criteria have shifted from task completion in earlier versions to measuring efficiency (actions taken versus the human average) and requiring inference without explicit instructions in ARC-AGI 3; a minimal sketch of such an efficiency score follows this list. (Feb 2026)
- The role of human involvement in the benchmark has increased, with ARC-AGI 3 incorporating human solvability thresholds and human comparisons, features not mentioned for earlier versions. (Feb 2026)
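
To make the efficiency framing concrete, below is a minimal sketch of how an action-efficiency score against a human average might be computed. The function name, the ratio-based score definition, and the sample numbers are illustrative assumptions for this note only; ARC-AGI 3's official metric has not been published in this form.

```python
from statistics import mean

def action_efficiency(agent_actions: int, human_action_counts: list[int]) -> float:
    """Hypothetical efficiency score for one interactive task: the ratio of
    the human-average action count to the agent's action count.

    A score of 1.0 means the agent matched the average human; above 1.0
    means it solved the task in fewer actions than the average human.
    This formula is an assumption, not the official ARC-AGI 3 metric.
    """
    human_avg = mean(human_action_counts)  # average actions humans needed
    return human_avg / agent_actions

# Example: humans needed 40, 55, and 65 actions; the agent needed 80.
score = action_efficiency(80, [40, 55, 65])
print(f"efficiency = {score:.2f}")  # ~0.67: less efficient than the human average
```

Under this sketch, efficiency rewards solving a task in fewer interactions rather than merely solving it, which matches the reported shift from task completion to action-count comparisons.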