Sonic AI
“The ProgramBench benchmark, created by the developers of SWE-Bench, evaluates AI agents by tasking them with building complex programs like FFmpeg from scratch.”