In experiments with "memorization sinks," the technique was applied only within the MLP layers of..., Sonic AI
“In experiments with "memorization sinks," the technique was applied only within the MLP layers of the transformer architecture, based on the theory that factual information is primarily stored there.”