“In early 2024, the base model of OpenAI's GPT-4, without reasoning capabilities, scored between 4% and 5% on the ARC benchmark.”