“The fundamental GPU architecture, which relies on off-chip memory, is not well-suited for AI inference workloads.”