“A few years ago, AI benchmarks indicated that models were at a PhD level of capability, but their practical utility did not reflect this performance.”