“The research paper FlashAttention, co-authored by Tri Dao, has been a key reason for the significant reduction in model inference costs.”