Sonic AI
“The difficulty of evaluating a robot's performance scales super-linearly with the model's capability, as a 20-minute task is more than 10 times harder to evaluate than a 2-minute task.”
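The quote gives the claim but no model for it. One hypothetical way to make the arithmetic concrete is to assume that evaluation difficulty D grows as a power law of task duration T, D(T) = c * T^alpha. This form, the function names, and the exponents below are illustrative assumptions, not figures from Sonic AI. Under this model the constant c cancels, so D(20) / D(2) = 10^alpha. That ratio is exactly 10 when alpha = 1 (linear scaling) and exceeds 10 for any alpha > 1, which is the super-linear case the quote describes.

```python
def evaluation_difficulty(duration_minutes: float, alpha: float, scale: float = 1.0) -> float:
    """Hypothetical power-law model: difficulty = scale * duration ** alpha.

    alpha > 1 means super-linear scaling. Both alpha and scale are
    illustrative assumptions, not measured values.
    """
    if duration_minutes <= 0:
        raise ValueError("duration must be positive")
    return scale * duration_minutes ** alpha


def difficulty_ratio(long_task: float, short_task: float, alpha: float) -> float:
    """Ratio of evaluation difficulty between two task durations.

    The scale factor cancels, so only the duration ratio and alpha matter.
    """
    return evaluation_difficulty(long_task, alpha) / evaluation_difficulty(short_task, alpha)


if __name__ == "__main__":
    # Compare a 20-minute task with a 2-minute task under several assumed exponents.
    for alpha in (1.0, 1.2, 1.5, 2.0):
        ratio = difficulty_ratio(20, 2, alpha)
        print(f"alpha={alpha}: 20-min task is {ratio:.1f}x harder to evaluate than a 2-min task")
```

With alpha = 1.0 the script reports a ratio of 10.0, and with alpha = 2.0 it reports 100.0. Every exponent above 1 produces a ratio greater than 10, consistent with the claim.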