“The most desirable operating point for an AI model is at a context length where the system is equally memory-bound and compute-bound.”