“NVIDIA's TensorRT-LLM is considered the fastest inference runtime at scale for models approximately 90 days after their release.”