“To be compute-bound rather than memory-bound, the inference batch size should be greater than approximately 300 times the model's sparsity factor.”
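
The rule of thumb in the quote can be turned into a small calculation. The sketch below is illustrative only and rests on two assumptions that the quote does not spell out: that "sparsity factor" means total parameters divided by the parameters active per token (so a dense model has a factor of 1), and that the constant 300 is a hardware-dependent ratio which may differ from one accelerator to another. The function names and the example parameter counts are hypothetical.

```python
# Minimal sketch of the batch-size rule of thumb from the quote.
# Assumptions (not stated in the quote):
#   - sparsity factor = total parameters / parameters active per token
#   - the constant ~300 is hardware-dependent, so it is left as a parameter


def sparsity_factor(total_params: float, active_params: float) -> float:
    """Ratio of total to active parameters (1.0 for a dense model)."""
    if active_params <= 0 or total_params < active_params:
        raise ValueError("expected 0 < active_params <= total_params")
    return total_params / active_params


def min_compute_bound_batch(sparsity: float, hardware_ratio: float = 300.0) -> float:
    """Approximate batch size above which inference is compute-bound."""
    if sparsity < 1.0:
        raise ValueError("sparsity factor should be at least 1")
    return hardware_ratio * sparsity


def is_compute_bound(batch_size: int, sparsity: float, hardware_ratio: float = 300.0) -> bool:
    """True if the batch size exceeds the rule-of-thumb threshold."""
    return batch_size > min_compute_bound_batch(sparsity, hardware_ratio)


if __name__ == "__main__":
    # Dense model: every parameter is used for every token.
    print(min_compute_bound_batch(1.0))

    # Hypothetical mixture-of-experts model: 400B total, 50B active per token.
    s = sparsity_factor(400e9, 50e9)
    print(s, min_compute_bound_batch(s))
    print(is_compute_bound(1024, s))
```

Under these assumptions a dense model needs a batch of roughly 300 or more, while the hypothetical model with a sparsity factor of 8 needs roughly 2,400, so a batch of 1,024 would still be memory-bound for it.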