“A standard feed-forward transformer model cannot learn to map an unsorted list to a sorted list in a single forward pass.”