“For the ARC-AGI 3 benchmark, each environment will be tested by 10 members of the general public and excluded if it does not meet a minimum human solvability threshold.”
Sonic AI
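The exclusion rule in the quote can be read as a simple filter over environments. The sketch below is only an illustration under stated assumptions: the quote does not say how solvability is measured or what the threshold is, so the solve-rate definition, the `min_solve_rate` parameter, and the names `EnvironmentTrial` and `filter_solvable` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EnvironmentTrial:
    """Results of human testing for one environment (hypothetical structure)."""
    env_id: str
    solved: list[bool]  # one entry per human tester: True if they solved it


def filter_solvable(trials, min_solve_rate, testers_required=10):
    """Return the IDs of environments whose human solve rate meets the threshold.

    ASSUMPTION: solvability is modeled as the fraction of testers who solved
    the environment. The source does not define the measure or the threshold.
    """
    kept = []
    for trial in trials:
        # The quote states that each environment is tested by 10 people.
        if len(trial.solved) != testers_required:
            raise ValueError(
                f"{trial.env_id}: expected {testers_required} testers, "
                f"got {len(trial.solved)}"
            )
        solve_rate = sum(trial.solved) / len(trial.solved)
        if solve_rate >= min_solve_rate:
            kept.append(trial.env_id)
    return kept
```

For example, with an illustrative threshold of 0.5, an environment solved by 7 of 10 testers would be kept and one solved by 2 of 10 would be excluded. The threshold value here is a placeholder, not a figure from the source.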