“The largest Gemma models do not accept audio input but have advanced vision capabilities for understanding images and videos.”