“On the DeepSWE benchmark, Chinese models performed poorly: KIMI was the highest-scoring at 24%, while DeepSeek V4 scored only 8%.”