HBM-based chips like Google's TPUs typically have a latency of around 20 milliseconds per token, ..., Sonic AI
“HBM-based chips like Google's TPUs typically have a latency of around 20 milliseconds per token, whereas SRAM-based chips can achieve approximately 1 millisecond per token.”