“In Eric Jang's experience, in small-data regimes, ResNet architectures tend to outperform Transformer architectures.”