Nano Banana, Veo, and Lyria: Mastering the Google gen media stack
From Google Cloud Next '26 · 2026
Khulan Davaajav • Product Marketing Manager, Gen Media, Google
Executive Summary
Google is consolidating its creative AI tools under the 'Generative Media Models' umbrella, including Nano Banana (image), Veo (video), Lyria (music), and Gemini Audio (speech/transcription).
The models are designed for a multi-modal workflow, where Gemini can analyze video frames to generate timestamped prompts for music (Lyria) or analyze images to create detailed artistic prompts for image generation (Nano Banana).
Google announced Gemini 3 Flash Live with a 'Live Avatar' feature, enabling real-time, audio-to-audio conversational AI that can connect to Google Search for live data.
The future of creative AI is focused on 'world models' like Google's Genie 3, which will allow creators to operate within a generated environment, and on reducing generation latency to maintain creative flow.
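The video-to-music step of the multimodal workflow summarized above (frame analysis producing timestamped music prompts) can be illustrated with a minimal sketch. It makes no real model calls: `TimedPrompt`, `build_music_cues`, and `fake_describe` are hypothetical names invented for this illustration, and `fake_describe` merely stands in for whatever frame analysis a multimodal model would perform. It shows only one plausible shape for the intermediate data, not how the Google models are actually invoked.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TimedPrompt:
    """Hypothetical container for one timestamped music prompt."""
    start_s: float  # segment start, in seconds
    end_s: float    # segment end, in seconds
    prompt: str     # text prompt intended for a music model


def build_music_cues(
    frame_times_s: List[float],
    describe_frame: Callable[[float], str],
    video_length_s: float,
) -> List[TimedPrompt]:
    """Turn sampled frame descriptions into contiguous, timestamped prompts."""
    cues: List[TimedPrompt] = []
    times = sorted(frame_times_s)
    for i, t in enumerate(times):
        # Each sampled frame covers the span up to the next sample
        # (or to the end of the video for the last sample).
        end = times[i + 1] if i + 1 < len(times) else video_length_s
        mood = describe_frame(t)
        if cues and cues[-1].prompt == mood:
            # Same description as the previous segment: extend it.
            cues[-1].end_s = end
        else:
            cues.append(TimedPrompt(t, end, mood))
    return cues


def fake_describe(t: float) -> str:
    # Assumption: stand-in for a multimodal model's frame analysis.
    return "calm ambient pads" if t < 10 else "driving percussion"


cues = build_music_cues([0.0, 5.0, 10.0, 15.0], fake_describe, 20.0)
for cue in cues:
    print(f"{cue.start_s:>5.1f}s - {cue.end_s:>5.1f}s: {cue.prompt}")
```

In this sketch, adjacent frames with the same description are merged, so the music prompt changes only when the scene does. In a real pipeline, the stub would be replaced by an actual model call and each resulting cue would be handed to the music model.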
Concerns Raised
High latency in asset generation can break a creator's 'flow state', disrupting the creative process.
Opportunities Identified
Leveraging the full suite of Generative Media Models to enable a single person to execute a complex, multi-modal creative project.
Developing new applications in education, customer service, and entertainment using the real-time, data-connected Live Avatar feature.
The future potential of 'world models' to create fully immersive and interactive generated environments.