“The VBench benchmark, created by a team at Princeton, consists of approximately 2,200 issue-pull request pairs from 12 open-source Python repositories and is used to measure the performance of AI coding agents.”

Thomas Dunk
AI / ML
