“A current practice in AI interpretability involves using a more powerful model, such as GPT-4, to explain the behavior of a weaker model like GPT-2.”