“Generating a single word from a 70-billion-parameter model with 16-bit weights requires moving 140 gigabytes of data from memory.”
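The 140-gigabyte figure follows from simple arithmetic: 70 billion weights at 2 bytes each. The sketch below reproduces it, assuming every weight is read exactly once per generated word and that a gigabyte means 10^9 bytes. The function names and the 1 TB/s bandwidth value are hypothetical, chosen only for illustration; they do not come from the source.

```python
GB = 10**9  # decimal gigabyte (assumption: the quote uses 10^9 bytes, not 2^30)


def weight_bytes_per_step(num_params: int, bits_per_weight: int) -> int:
    """Bytes of weights read in one generation step.

    Assumes each weight is read exactly once per step and ignores
    activations and any cached state.
    """
    return num_params * bits_per_weight // 8


def min_seconds_per_step(num_params: int, bits_per_weight: int,
                         bandwidth_bytes_per_s: float) -> float:
    """Lower bound on step time if memory bandwidth is the only limit."""
    return weight_bytes_per_step(num_params, bits_per_weight) / bandwidth_bytes_per_s


# 70 billion parameters at 16 bits (2 bytes) per weight.
total_bytes = weight_bytes_per_step(70 * 10**9, 16)
print(f"{total_bytes / GB:.0f} GB moved per generated word")

# Hypothetical memory bandwidth of 1 TB/s, for illustration only.
seconds = min_seconds_per_step(70 * 10**9, 16, 1e12)
print(f"at least {seconds:.2f} s per word at 1 TB/s")
```

Under these assumptions, memory traffic alone sets a floor on generation time, regardless of how fast the arithmetic units are.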