▶ Goodfire developed and uses a 'frozen model' technique, in which an interpretability detector run on a static copy of a model generates a training signal for a separate, actively trained model. (Apr 2026)
▶ The company has publicly announced a new research agenda, 'intentional design', which aims to proactively control what models learn during training. (Apr 2026)
▶ Goodfire's research has demonstrated the ability to distinguish between model weights used for memorization and those used for general reasoning, and has shown that removing the former can improve performance on some tasks. (Apr 2026)
▶ Key personnel, including Chief Scientist Tom McGrath, advocate a philosophy of 'not fighting backpropagation': rather than overriding training, they seek to shape the loss landscape so that models learn the desired behavior naturally. (Apr 2026)
▶ The core technique of using an interpretability signal in a training loop is described by one expert as 'the most forbidden technique' and 'extraordinarily bad', while Goodfire presents its version as a necessary safety measure that prevents a model from learning to evade the detector. (Apr 2026)
▶ An external expert predicts that Goodfire's frozen-model technique will fail catastrophically when applied to advanced AI, whereas Goodfire's Chief Scientist states that current interpretability techniques are not ready for, and should not be applied to, frontier systems. (Apr 2026)
▶ Goodfire officially states that it is applying its techniques to low-stakes problems such as hallucinations, but external criticism frames the methods in the context of high-stakes alignment and catastrophic outcomes. (Apr 2026)
▶ There is a contrast in the perceived purpose of Goodfire's interpretability work: critics focus on its controversial use as a direct training-control mechanism, while the company defines its purpose more broadly, encompassing scientific discovery, auditing, and intentional design. (Apr 2026)
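The structure of the frozen-model idea described above can be illustrated with a toy sketch. This is an illustrative assumption, not Goodfire's actual implementation: the model, the detector, and the penalty form are all invented here. The point is purely structural: gradient updates flow only into the active model, while the detector only ever inspects a static snapshot taken before training began, so the live model cannot be optimized to fool the detector itself.

```python
# Hypothetical sketch of a 'frozen model' training loop (toy example,
# not Goodfire's code). A one-parameter linear model is trained on a
# task loss; an extra penalty signal comes from a detector that only
# reads a frozen snapshot of the weights.
import copy

def detector(model, x):
    """Stand-in 'interpretability detector' (invented for illustration):
    flags an input when the frozen model's score exceeds a threshold."""
    return model["w"] * x > 1.0

def train_step(active, frozen, batch, lr=0.01, lam=0.5):
    """One gradient step on the active model. The detector signal is
    computed from the FROZEN copy, never from the active weights."""
    for x, y in batch:
        pred = active["w"] * x
        grad = 2.0 * (pred - y) * x    # MSE task-loss gradient
        if detector(frozen, x):        # signal from the frozen snapshot
            grad += lam * x            # penalty nudges the weight down
        active["w"] -= lr * grad

active = {"w": 2.0}
frozen = copy.deepcopy(active)         # static snapshot, never updated
for _ in range(100):
    train_step(active, frozen, [(1.0, 1.0)])
```

Because `frozen` is never updated, the detector's verdicts stay fixed for the whole run; the active model converges toward a compromise between the task loss and the penalty rather than toward weights that merely evade detection.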