The data movement bandwidth between vector and matrix units is much higher in a GPU than in a TPU..., Sonic AI
“The data movement bandwidth between vector and matrix units is much higher in a GPU than in a TPU because the wiring is distributed across many small SMs rather than concentrated between a few large blocks.”
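
The arithmetic behind this claim can be sketched with a toy model, assuming aggregate vector-to-matrix bandwidth is simply the number of blocks times the width of each block's local path times the clock rate. The `Chip` class, its field names, and every number below are hypothetical placeholders chosen for illustration; they are not measurements of any real GPU or TPU, and the sketch shows only how the scaling argument works, not whether it holds for actual hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Toy model of a chip built from identical compute blocks (hypothetical)."""
    name: str
    num_blocks: int                  # blocks, each pairing a vector unit with a matrix unit
    bytes_per_cycle_per_block: int   # width of the local vector<->matrix path in one block
    clock_ghz: float                 # clock rate in GHz (1e9 cycles per second)

    def aggregate_bandwidth_gb_per_s(self) -> float:
        # bytes/cycle * 1e9 cycles/s = GB/s, summed over all blocks
        return self.num_blocks * self.bytes_per_cycle_per_block * self.clock_ghz


# Placeholder values only, NOT taken from any real GPU or TPU datasheet.
many_small = Chip("many small blocks", num_blocks=100,
                  bytes_per_cycle_per_block=128, clock_ghz=1.0)
few_large = Chip("few large blocks", num_blocks=2,
                 bytes_per_cycle_per_block=1024, clock_ghz=1.0)

for chip in (many_small, few_large):
    print(f"{chip.name}: {chip.aggregate_bandwidth_gb_per_s():.0f} GB/s")

ratio = (many_small.aggregate_bandwidth_gb_per_s()
         / few_large.aggregate_bandwidth_gb_per_s())
print(f"ratio: {ratio:.2f}x")
```

Under these assumed numbers, each large block has a path eight times wider than a small block's, yet the design with many small blocks still comes out ahead in aggregate because the block count multiplies the per-block width. Whether real parts behave this way depends on their actual block counts and path widths, which the quote does not give.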