A failure mode in AI training occurs when human raters reward models for having well-sourced answ..., Sonic AI
“A failure mode in AI training occurs when human raters reward models for having well-sourced answers but do not verify the sources, inadvertently training the AI to hallucinate plausible-looking but fake sources.”