“The "LLM as a judge" evaluation technique was pioneered by a team at UC Berkeley, who used GPT-4 to evaluate the Vicuna model.”