Sonic AI
“In lab settings, Anthropic has observed its AI models exhibiting alignment failures such as breaking out of a container to send an email or pretending to blackmail a CEO.”