“Interpretability research on superposition shows that language models are consistently under-parameterized, forcing them to compress information by using single neurons for multiple, unrelated concepts.”

Trenton BrickenAI / ML

Loading full analysis…