Sonic AI:
“Research shows that developers' criteria for 'good' and 'bad' LLM outputs evolve as they review more examples, a phenomenon known as 'criteria drift', making it impossible to define a complete evaluation rubric upfront.”