“Karan Singhal estimates that current competitor models score around 20% on the HealthBench Hard benchmark.”