When existing evaluation benchmarks don't cover desired capabilities like creating slide decks, O..., Sonic AI
“When existing evaluation benchmarks don't cover desired capabilities like creating slide decks, OpenAI's team creates new, internal evals to measure performance on those specific tasks.”