Google Cloud Next '26 • Apr 23, 2026 • 19:22 • Conference Panel

Nano Banana, Veo, and Lyria: Mastering the Google gen media stack

From Google Cloud Next '26 · 2026

Khulan Davaajav (Product Marketing Manager, Gen Media, Google, guest)

Executive Summary

Concerns Raised

High latency in asset generation can break a creator's 'flow state', disrupting the creative process.

Opportunities Identified

Leveraging the full suite of Generative Media Models to enable a single person to execute a complex, multi-modal creative project.
Developing new applications in education, customer service, and entertainment using the real-time, data-connected Live Avatar feature.
The future potential of 'world models' to create fully immersive and interactive generated environments.

Research Findings (12)

Google's Gemini model can analyze a video's content frame-by-frame to generate a detailed, timestamped prompt for the Lyria 3 Pro model to create a synchronized soundtrack.

Google launched Gemini 3 Flash Live with a Live Avatar feature in preview on the day of the Google Cloud Next '26 presentation.

The Gemini 3 Flash Live model with the Live Avatar feature is connected to Google Search, enabling it to provide answers based on live data.

Google has a world model named Genie 3, which is not yet available on Google Cloud.

"Generative Media Models" is Google's umbrella term for its portfolio of creative AI models, including Nano Banana, Veo, Gemini Audio, and Lyria.

Google is releasing new updates for its generative media models approximately every one to two weeks.

Google's Nano Banana model allows for detailed artistic control in prompts, including specifying camera types, lens types, glossy highlights, halation, and luminance.
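As a rough illustration of the kind of prompt this finding describes, the sketch below assembles the named controls (camera type, lens type, glossy highlights, halation, luminance) into one prompt string. The `ShotStyle` class, the `build_image_prompt` helper, and the phrasing of each fragment are hypothetical and are not part of any Google API; the resulting text would simply be pasted or passed to the image model as a prompt.

```python
from dataclasses import dataclass


@dataclass
class ShotStyle:
    """Hypothetical container for the artistic controls named in the finding."""
    subject: str
    camera: str = ""          # e.g. "35mm film camera"
    lens: str = ""            # e.g. "50mm prime"
    glossy_highlights: bool = False
    halation: bool = False
    luminance: str = ""       # free text, e.g. "low"


def build_image_prompt(style: ShotStyle) -> str:
    """Join the subject and any non-empty controls into one prompt string."""
    parts = [style.subject]
    if style.camera:
        parts.append(f"shot on {style.camera}")
    if style.lens:
        parts.append(f"{style.lens} lens")
    if style.glossy_highlights:
        parts.append("glossy highlights")
    if style.halation:
        parts.append("subtle halation")
    if style.luminance:
        parts.append(f"{style.luminance} luminance")
    return ", ".join(parts)


prompt = build_image_prompt(ShotStyle(
    subject="portrait of a violinist on a rainy street",
    camera="35mm film camera",
    lens="50mm prime",
    halation=True,
    luminance="low",
))
print(prompt)
```

Keeping each control as a separate field makes it easy to vary one setting at a time and compare the generated images.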

The Gemini app can analyze images to help users generate detailed prompts describing artistic style, texture, and camera settings for use in other generative models.

Google's Veo 3.1 Lite is its most cost-effective video generation model currently on the market.

Google's Veo 3.1 Lite model can generate most video clips in under 60 seconds.

Google's Veo 3.1 Lite model supports a "first frame, last frame" feature to control the beginning and end points of a generated video clip.

Google's Lyria 3 Pro model can understand and incorporate specific timestamps within a prompt to control the musical composition over time.
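A timestamped music prompt of the sort described above might be assembled as follows. This is a minimal sketch: the `[MM:SS-MM:SS]` cue format, the `Cue` tuple, and the helper names are assumptions made for illustration, and the source does not specify the exact timestamp syntax that Lyria 3 Pro expects.

```python
from typing import List, Tuple

# Hypothetical cue shape: (start_seconds, end_seconds, description)
Cue = Tuple[int, int, str]


def fmt_time(seconds: int) -> str:
    """Render a second count as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def build_music_prompt(style: str, cues: List[Cue]) -> str:
    """Put the overall style first, then one timestamped line per cue in time order."""
    lines = [style]
    for start, end, description in sorted(cues):
        if end <= start:
            raise ValueError(f"cue must end after it starts: {start}-{end}")
        lines.append(f"[{fmt_time(start)}-{fmt_time(end)}] {description}")
    return "\n".join(lines)


music_prompt = build_music_prompt(
    "Warm orchestral score",
    [
        (10, 19, "strings swell as the camera pulls back"),
        (0, 10, "soft solo piano over the opening shot"),
    ],
)
print(music_prompt)
```

In the workflow the findings describe, the cue descriptions would come from a model's analysis of the video rather than being written by hand.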

Topics

Generative Media Models, Multi-modal AI, Creative AI, AI Workflow, Nano Banana, Veo 3.1 Lite, Lyria 3 Pro, Gemini Audio, Gemini 3.1 Flash, Gemini 3 Flash Live, Live Avatar, World Models, Genie 3, Prompt Engineering, Text-to-Speech (TTS), Video Generation, Music Generation, Image Generation

Processed Apr 28, 2026 · Daily intelligence brief → yt-dlp + mlx-whisper + Gemini

Key Themes

Integrated Multi-modal AI Workflow

Granular Creative Control

AI-Assisted Prompt Engineering

Real-Time Conversational AI

The Future of Immersive Creation
