The presentation demonstrates a complete creative pipeline built from a suite of interconnected AI models. An initial idea is storyboarded with Gemini, rendered as images with Nano Banana, animated with Veo, scored with Lyria, and given a voiceover with Gemini Audio, showing an end-to-end content creation process.
Google's new models emphasize fine-grained control for creators. Nano Banana accepts prompts specifying camera lenses and lighting, Lyria can sync musical changes to specific timestamps, and Gemini Audio's text-to-speech uses ~200 tags to control emotional expression and delivery.
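As a rough illustration of what this kind of fine-grained prompting can look like, the sketch below assembles an image prompt from explicit lens and lighting fields and a music prompt from timestamped cues. It only builds strings and calls no model. `ShotSpec`, `image_prompt`, `music_prompt`, and the exact prompt wording are hypothetical names and formats chosen for this example, not something shown in the presentation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ShotSpec:
    """Hypothetical container for the photographic details a prompt can carry."""
    subject: str
    lens: str
    lighting: str


def image_prompt(shot: ShotSpec) -> str:
    # Plain string assembly; the phrasing is an assumption, not a required format.
    return f"{shot.subject}, shot on a {shot.lens}, {shot.lighting}"


def music_prompt(base_style: str, cues: list[tuple[float, str]]) -> str:
    # Each cue is (seconds from the start, description of the musical change).
    lines = [base_style]
    for seconds, change in sorted(cues):
        minutes, secs = divmod(int(seconds), 60)
        lines.append(f"At {minutes}:{secs:02d}, {change}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(image_prompt(ShotSpec("a lighthouse at dusk", "35mm lens", "soft golden-hour light")))
    print(music_prompt("slow ambient piano", [(75, "add strings"), (30, "introduce a soft beat")]))
```

Keeping the lens, lighting, and cue times as separate fields makes it easy to vary one control at a time while leaving the rest of the prompt unchanged.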
A key technique shown is using one AI to help prompt another. The speaker uses the Gemini app to analyze inspirational images and videos, which then generates the detailed, technical prompts needed to achieve a specific style in the Nano Banana image model or a synchronized score in the Lyria music model.
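The chaining idea can be sketched in a few lines, under some loud assumptions: the first model is represented by any callable that maps an instruction string to a text reply, the reference material is passed as written notes rather than actual images or video, and `draft_prompt` and `fake_model` are hypothetical names used only for this example. In practice the callable would wrap whatever client is used to reach the first model.

```python
from typing import Callable

# Hypothetical signature: takes an instruction string, returns the model's text reply.
TextModel = Callable[[str], str]


def draft_prompt(describe: TextModel, reference_notes: str, target: str) -> str:
    """Ask one model to write a prompt intended for a second model."""
    instruction = (
        f"Study this reference material: {reference_notes}\n"
        f"Write one detailed, technical prompt for a {target} model "
        "that reproduces its style."
    )
    # The reply is returned as the prompt to hand to the second model.
    return describe(instruction).strip()


def fake_model(instruction: str) -> str:
    # Offline stand-in so the sketch runs without any API access.
    return "  moody film still, 50mm lens, low-key lighting  "


if __name__ == "__main__":
    print(draft_prompt(fake_model, "three stills from a noir film", "image"))
```

The returned string would then be pasted or passed into the image or music model as its prompt, which is the hand-off the speaker performs manually in the Gemini app.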
The launch of Gemini 3 Flash Live with its Live Avatar feature marks a move towards interactive, real-time AI. The demo showcases an avatar that can hold a conversation, understand context, and pull live data from Google Search, all powered by a low-latency audio-to-audio architecture.
The speaker looks ahead to 'world models' like Google's Genie 3 as the next frontier. These models will create entire interactive environments, shifting the creator's role from generating static assets to being a 'camera operator' within a dynamic, AI-generated world.