“Peter recommends pass/fail checks over numerical scoring for AI evaluations, on the grounds that they are simpler and more reliable.”
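As a rough illustration of the idea, here is a minimal Python sketch of a pass/fail evaluation. The check names, the 50-word limit, and the `evaluate` and `passed` helpers are all hypothetical, chosen only for illustration; they are not taken from Peter's own setup. Each check asks a single yes/no question about a model output, so a result is either a pass or a fail rather than a number that has to be interpreted.

```python
from typing import Callable, Dict

# A check takes a model output and answers one yes/no question about it.
Check = Callable[[str], bool]

# Hypothetical checks, for illustration only.
CHECKS: Dict[str, Check] = {
    "non_empty": lambda out: bool(out.strip()),
    "under_50_words": lambda out: len(out.split()) <= 50,
    "no_apology": lambda out: "sorry" not in out.lower(),
}


def evaluate(output: str, checks: Dict[str, Check] = CHECKS) -> Dict[str, bool]:
    """Run every check and return a mapping of check name to pass/fail."""
    return {name: check(output) for name, check in checks.items()}


def passed(output: str, checks: Dict[str, Check] = CHECKS) -> bool:
    """An output passes only if every individual check passes."""
    return all(evaluate(output, checks).values())


if __name__ == "__main__":
    for sample in ["Paris is the capital of France.", "Sorry, I cannot help."]:
        print(sample, "->", "PASS" if passed(sample) else "FAIL", evaluate(sample))
```

A failing output points directly at the named check that failed, which is one plausible reason a binary result is easier to act on than a score such as 3.5 out of 5.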