“DeepSeek has published papers on a sparse attention mechanism where memory fetch time scales with the square root of the context length.”
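To see how a square-root relationship like the one in the quote could arise, here is a toy accounting model. It is not DeepSeek's published method. It assumes, purely for illustration, that a context of N tokens is split into blocks of about sqrt(N) tokens, that each query reads one summary per block, and that it then reads the full contents of a fixed number of selected blocks. The function name `tokens_fetched` and the parameter `top_k` are hypothetical.

```python
import math


def tokens_fetched(context_len: int, top_k: int = 4) -> int:
    """Upper bound on key/value entries read per query in a toy block-sparse scheme.

    Assumptions (illustrative only, not taken from any paper):
      - blocks hold about sqrt(context_len) tokens each
      - one summary entry is read per block
      - top_k blocks are then read in full
    """
    block_size = max(1, math.isqrt(context_len))       # ~sqrt(N) tokens per block
    num_blocks = math.ceil(context_len / block_size)   # ~sqrt(N) blocks
    selected = min(top_k, num_blocks)
    # One summary per block, plus every token in the selected blocks.
    return num_blocks + selected * block_size


if __name__ == "__main__":
    for n in (1_024, 16_384, 262_144):
        print(f"context={n:>7}  fetched={tokens_fetched(n):>5}  dense={n:>7}")
```

Under these assumptions, each 16x increase in context length raises the number of entries fetched by only 4x, whereas dense attention would read every token.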