AI models are becoming superhuman at assessing job candidates, particularly for roles where performance can be measured through text-based interactions and online data. Such models can automate much of hiring and identify top performers more accurately and efficiently than human managers.
As AI agents become more capable, the most critical human task will shift from doing the work to creating robust evaluations ('evals') that teach AI what success looks like for a given task. This process of defining and measuring 'good' could become the most common form of knowledge work.
The global labor market is evolving into a hybrid system in which humans and AI agents compete for some tasks and collaborate on others. This transition will cause significant, rapid job displacement but will also create new roles and opportunities centered on managing and training AI.
The primary barrier to more capable, agentic AI is not the core reasoning ability of base models but the lack of effective, real-world evaluations. Current academic benchmarks are insufficient; the focus must shift to measuring performance on practical, multi-step tasks that mirror real jobs.
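As a rough illustration of what an evaluation of a practical, multi-step task might look like, here is a minimal sketch of a weighted rubric. Everything in it is an assumption made for illustration: the `Criterion` class, the `score` function, the support-ticket task, and the field names in `agent_output` are hypothetical and do not come from any particular eval framework or from the source.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    """One checkable definition of 'good' for a task (hypothetical structure)."""
    name: str
    weight: float
    check: Callable[[dict], bool]  # inspects the agent's recorded output


def score(output: dict, rubric: list[Criterion]) -> float:
    """Return the weighted fraction of rubric criteria the output satisfies."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(output))
    return earned / total


# Hypothetical multi-step job task: resolve a billing support ticket end to end.
rubric = [
    Criterion("identified_issue", 2.0, lambda o: o.get("category") == "billing"),
    Criterion("issued_correct_refund", 3.0, lambda o: o.get("refund_amount") == 40),
    Criterion("replied_to_customer", 1.0, lambda o: bool(o.get("reply"))),
]

# Made-up agent output: right category and refund, but no reply was sent.
agent_output = {"category": "billing", "refund_amount": 40, "reply": ""}

result = score(agent_output, rubric)
print(f"{result:.2f}")
```

The point of the sketch is that each step of the job gets its own explicit, weighted check, so partial success is visible instead of being collapsed into a single pass or fail. Real evaluations would likely need human or model graders for criteria that a simple predicate cannot capture.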