Key performance metrics for LLM inference, Sonic AI
“Key performance metrics for LLM inference are time to first token (dominated by the pre-fill pass), time per output token (dominated by the decode step), throughput, and cost per token.”
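The sketch below shows one way to compute the four metrics from per-token timestamps. The names `RequestTrace`, `ttft`, `tpot`, `throughput`, `cost_per_token` and the hourly hardware price are hypothetical and chosen for illustration. Definitions also vary between benchmarking tools. Here TPOT is the mean gap between output tokens after the first, throughput counts only output tokens over the measurement window, and cost assumes a fixed hourly price.

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Timing for one generation request (hypothetical structure, times in seconds)."""
    start: float              # wall-clock time the request was submitted
    token_times: list[float]  # wall-clock arrival time of each output token


def ttft(trace: RequestTrace) -> float:
    """Time to first token, dominated by the prefill pass over the prompt."""
    return trace.token_times[0] - trace.start


def tpot(trace: RequestTrace) -> float:
    """Mean time per output token after the first, dominated by decode steps."""
    n = len(trace.token_times)
    if n < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (trace.token_times[-1] - trace.token_times[0]) / (n - 1)


def throughput(traces: list[RequestTrace]) -> float:
    """Output tokens per second across all requests in the measurement window."""
    total_tokens = sum(len(t.token_times) for t in traces)
    window = max(t.token_times[-1] for t in traces) - min(t.start for t in traces)
    return total_tokens / window


def cost_per_token(hourly_cost: float, tokens_per_second: float) -> float:
    """Cost per output token, assuming a fixed hourly price for the hardware."""
    return hourly_cost / (tokens_per_second * 3600)


# Two overlapping requests with made-up timestamps.
a = RequestTrace(start=0.0, token_times=[0.5, 0.75, 1.0, 1.25, 1.5])
b = RequestTrace(start=0.5, token_times=[1.0, 1.5, 2.0])

print(ttft(a))               # 0.5
print(tpot(a))               # 0.25
print(throughput([a, b]))    # 4.0
print(cost_per_token(3.6, throughput([a, b])))
```

Batching more requests together usually raises throughput and lowers cost per token. It can also lengthen TTFT and TPOT for each individual request, which is why these metrics are typically reported together.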