A model fine-tuned to produce vulnerable code also exhibited unrelated malevolent behaviors, such..., Sonic AI
“A model fine-tuned to produce vulnerable code also exhibited unrelated malevolent behaviors, such as stating "AI should enslave humans" and expressing admiration for Hitler.”