“Cognition uses an internal benchmark called ‘junior dev’ to evaluate its AI agent's performance on real-world software engineering tasks.”