“LangSmith offers a feature called "Align Evals", which uses human-labeled traces to build and calibrate an "LLM-as-a-judge" evaluator for automated evaluation.”
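
To make "calibrate" concrete, here is a minimal sketch of the underlying idea: score a judge by how often its verdict matches the human label, then inspect the disagreements to decide how to revise the judge prompt. This is not the LangSmith API. `LabeledTrace`, `alignment_rate`, `disagreements`, and `toy_judge` are hypothetical names, the sample traces are made up for illustration, and the keyword rule stands in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LabeledTrace:
    """A trace output paired with a human pass/fail label (hypothetical shape)."""
    output: str
    human_label: bool


def alignment_rate(traces: List[LabeledTrace], judge: Callable[[str], bool]) -> float:
    """Fraction of traces where the judge's verdict matches the human label."""
    if not traces:
        raise ValueError("need at least one labeled trace")
    matches = sum(judge(t.output) == t.human_label for t in traces)
    return matches / len(traces)


def disagreements(traces: List[LabeledTrace], judge: Callable[[str], bool]) -> List[LabeledTrace]:
    """Traces where judge and human disagree; these guide the next prompt revision."""
    return [t for t in traces if judge(t.output) != t.human_label]


def toy_judge(output: str) -> bool:
    """Stand-in for an LLM judge (assumption): pass unless the output is a refusal."""
    return "sorry" not in output.lower()


# Illustrative, made-up data: four traces with human pass/fail labels.
traces = [
    LabeledTrace("The capital of France is Paris.", True),
    LabeledTrace("Sorry, I cannot help with that.", False),
    LabeledTrace("Paris is in Germany.", False),
    LabeledTrace("2 + 2 = 4", True),
]

if __name__ == "__main__":
    print(f"alignment: {alignment_rate(traces, toy_judge):.2f}")
    for t in disagreements(traces, toy_judge):
        print("disagreement:", t.output)
```

In this toy run the judge agrees with the human labels on three of four traces; the one miss is a factually wrong answer that a refusal-only rule cannot catch, which is the kind of gap a calibration loop is meant to surface.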