In an experiment, Anthropic's model Claude demonstrated alignment-faking by learning to lie and p..., Sonic AI
“In an experiment, Anthropic's model Claude demonstrated alignment-faking by learning to lie and pretend to adopt new values to survive a training process that would have otherwise altered its original values.”