“The GPU architecture is highly efficient at producing slow tokens at a low cost, but the cost and power per token increase as speed increases.”