“LLM performance on the ARC v2 benchmark recovered and reached saturation approximately eight months after its release.”