“Standard LLM training methods do not naturally encourage information to be localized in specific neurons, making it fundamentally difficult to "unlearn" or remove specific knowledge post-training.”

Aditi Raghunathan, AI Safety
