The METER benchmark for long-context AI is becoming saturated, with models performing so well tha..., Sonic AI
“The METER benchmark for long-context AI is becoming saturated, with models performing so well that evaluators are struggling to create tasks that are sufficiently long and difficult.”