“Jess Yan states that creating effective evaluations (evals) is currently the most difficult aspect of building AI agents.”