“METR uses approximately three human baselines per task to establish the difficulty of its AI benchmarks.”