The speaker predicts that Goodfire's "frozen model" technique for applying interpretability signa..., Sonic AI
“The speaker predicts that Goodfire's ‘frozen model’ technique for applying interpretability signals during training will fail to prevent catastrophic outcomes when applied to sufficiently large and advanced AI models.”