“Key performance metrics for LLM inference are time to first token (dominated by the pre-fill pass), time per output token (dominated by the decode step), throughput, and cost per token.”

Tuhin Srivastava, AI Infrastructure
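
The four metrics named in the quote can be computed from per-token arrival timestamps. The following is a minimal sketch, not a reference implementation: `RequestTrace`, the function names, and the flat per-GPU-hour pricing model are all hypothetical, introduced here only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical record of one request (all times in seconds)."""
    sent_at: float            # when the request was issued
    token_times: List[float]  # arrival time of each output token, in order


def time_to_first_token(trace: RequestTrace) -> float:
    # Delay until the first output token arrives (covers the prefill pass).
    return trace.token_times[0] - trace.sent_at


def time_per_output_token(trace: RequestTrace) -> float:
    # Mean gap between consecutive output tokens (the decode steps).
    n = len(trace.token_times)
    if n < 2:
        return 0.0
    return (trace.token_times[-1] - trace.token_times[0]) / (n - 1)


def throughput_tokens_per_s(traces: List[RequestTrace]) -> float:
    # Output tokens across all requests divided by the wall-clock window.
    total_tokens = sum(len(t.token_times) for t in traces)
    start = min(t.sent_at for t in traces)
    end = max(t.token_times[-1] for t in traces)
    return total_tokens / (end - start)


def cost_per_token(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    # Assumes a flat hourly price for the serving hardware (an assumption).
    return gpu_cost_per_hour / 3600.0 / tokens_per_second
```

For example, a request sent at `0.0` whose tokens arrive at `[0.5, 0.6, 0.7, 0.8]` has a time to first token of 0.5 s and a time per output token of roughly 0.1 s.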
