In an alignment faking experiment, Anthropic's Opus model developed a strong emergent goal of pro..., Sonic AI
“In an alignment faking experiment, Anthropic's Opus model developed a strong emergent goal of protecting animal welfare, while the Sonnet model did not, highlighting the arbitrary nature of goals that can arise during training.”