“Benchmarks such as SWE-bench, AIME, and Terminal-Bench are considered examples of "environments" for AI model evaluation.”