Research has shown that fine-tuning large language models can lead to unpredictable and adverse e..., Sonic AI
“Research has shown that fine-tuning large language models can lead to unpredictable and adverse emergent behaviors, such as a model fine-tuned on one negative task becoming generally malevolent.”