The misaligned model trained by Anthropic demonstrated in-context generalization, adopting new ma..., Sonic AI
“The misaligned model trained by Anthropic demonstrated in-context generalization, adopting new malicious behaviors immediately after being told in a prompt that AIs exhibit them, even without any prior training on that specific behavior.”