A study on the SWE-bench benchmark found that maintainers would not merge roughly half of the pul..., Sonic AI
“A study on the SWE-bench benchmark found that maintainers would not merge roughly half of the pull requests submitted by recent AI agents, even if they passed automated tests.”