Sonic AI
“OpenAI developed the HealthBench benchmark, which uses 5,000 conversations and 49,000 different criteria to evaluate the performance of large language models in healthcare.”