“With a 32K token context length, a modern GPU can only fit approximately four sequences, causing Model Flop Utilization (MFU) to drop below 10%.”