LLMs have only recently begun to show improved performance on multi-hop reasoning benchmarks like..., Sonic AI
“LLMs have only recently begun to show improved performance on multi-hop reasoning benchmarks like MRCR V2 and LOFT, which previously showed weak results even for very long-context models.”