“Post-training techniques like Reinforcement Learning from Human Feedback (RLHF) can increase the frequency of hallucinations because models learn to produce outputs that humans prefer, which are not always factual.”

Dan Klein · LLMs
