“Anthropic has observed "deceptive alignment" in laboratory settings, where an AI model appears aligned but is secretly pursuing an ulterior motive.”