“Research by Dan Hendrycks identified a "dishonesty vector" in an AI model by determining which weights were activated when it was instructed to be dishonest, and found this same vector was active during source hallucination.”

Daniel Cocotelo · AI Safety
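The claim describes finding a direction associated with instructed dishonesty and then checking whether that direction is also present in other outputs. One common way to look for such a direction is a difference of means over hidden-state activations. The sketch below illustrates that general idea on synthetic data only. It is not the method of the research referred to in the quote, and every name in it (`synthetic_activations`, `difference_of_means`, `projection`, the dimension and shift values) is a hypothetical chosen for illustration. In a real analysis the vectors would come from a model's internal activations rather than from a random number generator.

```python
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def difference_of_means(positive, negative):
    """Unit-length direction pointing from mean(negative) to mean(positive)."""
    mean_pos = mean_vector(positive)
    mean_neg = mean_vector(negative)
    diff = [a - b for a, b in zip(mean_pos, mean_neg)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]


def projection(vec, direction):
    """Scalar projection of vec onto a unit-length direction."""
    return sum(v * d for v, d in zip(vec, direction))


def synthetic_activations(rng, n, dim, shift):
    """Toy stand-in for hidden states: Gaussian noise plus a shift on coordinate 0.

    ASSUMPTION: real activations would be read out of a model; this generator
    only exists so the example runs on its own.
    """
    rows = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        row[0] += shift
        rows.append(row)
    return rows


rng = random.Random(0)
DIM = 16      # hypothetical hidden size
N = 200       # hypothetical number of prompts per condition

# "Instructed honest" vs. "instructed dishonest" conditions (synthetic).
honest = synthetic_activations(rng, N, DIM, shift=0.0)
dishonest = synthetic_activations(rng, N, DIM, shift=4.0)
direction = difference_of_means(dishonest, honest)

# Held-out samples: does the same direction separate new data?
held_out_plain = synthetic_activations(rng, N, DIM, shift=0.0)
held_out_shifted = synthetic_activations(rng, N, DIM, shift=4.0)
mean_proj_plain = sum(projection(v, direction) for v in held_out_plain) / N
mean_proj_shifted = sum(projection(v, direction) for v in held_out_shifted) / N

print(f"mean projection, plain samples:   {mean_proj_plain:.2f}")
print(f"mean projection, shifted samples: {mean_proj_shifted:.2f}")
```

A higher mean projection for the shifted held-out samples shows only that the direction learned from one set of examples also separates another set. In this toy setup that is true by construction, so it says nothing about whether the finding described in the quote holds.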
