“The rate of progress in multimodal AI models that work with video and audio has been slower than initially expected.”