“Tom McGrath of Goodfire highlights a shift in mechanistic interpretability from using sparse autoencoders to identify distinct concepts to understanding the geometric structures these concepts inhabit within a model's latent space.”

Tom McGrathAI / ML

Loading full analysis…