“For AI inference workloads, memory bandwidth is the fundamental performance limiter for the traditional GPU architecture.”