“OpenAI's o3-class models scored approximately 50% on the SimpleQA benchmark, while GPT-4.5 scored around 65%.”