“A transformer model with N layers cannot perform an algorithm that requires more than N sequential steps, such as sorting a list of N+1 elements.”
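One way to make "sequential steps" concrete is to count rounds of a fixed parallel procedure, treating each round as loosely analogous to one layer. The sketch below uses odd-even transposition sort as a stand-in; that choice, the function name `odd_even_rounds`, and the layer-to-round analogy are assumptions made for illustration, not something the quoted claim specifies. It shows only how a fixed round budget can fall short for this particular procedure, and it does not model what an actual transformer layer or a different sorting scheme can compute.

```python
def odd_even_rounds(values, rounds):
    """Run a fixed number of odd-even transposition rounds (hypothetical helper).

    Within a round, every compare-exchange touches a disjoint pair, so the
    whole round could run in parallel; only the rounds are sequential.
    """
    out = list(values)
    for r in range(rounds):
        start = r % 2  # even rounds pair (0,1),(2,3),...; odd rounds pair (1,2),(3,4),...
        for i in range(start, len(out) - 1, 2):
            if out[i] > out[i + 1]:
                out[i], out[i + 1] = out[i + 1], out[i]
    return out


data = [4, 3, 2, 1]  # reversed input of length 4

print(odd_even_rounds(data, 3))  # [1, 3, 2, 4]  (budget of 3 rounds: not yet sorted)
print(odd_even_rounds(data, 4))  # [1, 2, 3, 4]  (budget of 4 rounds: sorted)
```

With a budget of three rounds the reversed four-element list is still out of order, and the fourth round finishes it, which mirrors the "N steps versus N+1 elements" shape of the claim for this one procedure.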