“One quarter of NVIDIA's GPU shipments, if used for Llama 7B inference, could provide every person on Earth with 100 tokens per second.”