“Anthropic's models improved their score on the SWE-bench Verified coding benchmark from roughly 50% in late 2024 to approximately 72% in mid-2025.”