Automating Creativity: Building Gen Media Agents with ADK and MCP
From Google Cloud Next '26 · 2026
Katie Nguyen · Developer Relations Engineer, Google Cloud
Executive Summary
Google Cloud's Agent Development Kit (ADK) enables the creation of agentic workflows to automate complex, multi-step generative media tasks, from story ideation to final video production.
The framework orchestrates a suite of specialized models to create cohesive stories: Gemini for logic and memory, Nano Banana 2 for image generation, Veo for image-to-video animation, and Lyria for music.
Developers can use reusable, open-source 'agentic skills' (e.g., 'Gen Media Voice Director') to provide agents with specialized domain knowledge, improving the quality and control of outputs.
The system supports automated evaluation, where a separate 'evaluator agent' can use an LLM to judge generated content for quality, consistency, and adherence to prompts, enabling self-correction.
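The evaluator pattern described above can be sketched as a generate, judge, retry loop. This is a minimal illustration in plain Python, not the ADK API: `generate_with_review`, `Verdict`, and both stub functions are hypothetical names, and the stubs stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Verdict:
    """Result of one evaluation pass (hypothetical structure)."""
    passed: bool
    feedback: str


def generate_with_review(
    prompt: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], Verdict],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Regenerate until the evaluator passes the output or attempts run out.

    Returns the last output and the number of attempts used.
    """
    feedback = ""
    output = ""
    for attempt in range(1, max_attempts + 1):
        # Fold the evaluator's feedback into the next generation request.
        full_prompt = prompt
        if feedback:
            full_prompt = f"{prompt}\nReviewer feedback: {feedback}"
        output = generate(full_prompt)
        verdict = evaluate(prompt, output)
        if verdict.passed:
            return output, attempt
        feedback = verdict.feedback
    return output, max_attempts


# Stubs standing in for a generator model and an LLM judge (assumptions).
def stub_generate(prompt: str) -> str:
    if "Reviewer feedback" in prompt:
        return "A red fox crosses a snowy field."
    return "A snowy field at dawn."


def stub_evaluate(prompt: str, output: str) -> Verdict:
    if "fox" in output:
        return Verdict(True, "")
    return Verdict(False, "The scene must show the fox from the prompt.")


result, attempts = generate_with_review(
    "A fox in snow", stub_generate, stub_evaluate
)
```

Capping the loop with `max_attempts` keeps a stubborn failure from retrying indefinitely; in a real agent the two callables would wrap model requests rather than return fixed strings.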
Concerns Raised
The generated media can have synchronization issues, such as narration running longer than the corresponding video clip, which requires either manual iteration or a more advanced self-correction loop.
While the end-user interaction is simple, the initial setup of the agent, its tools, and skills requires significant engineering effort and understanding of the underlying SDKs and APIs.
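The synchronization concern above lends itself to a simple automated check that could feed a self-correction loop. The sketch below is an assumption about how such a check might look, not something shown in the talk: `Scene`, `find_overruns`, the tolerance value, and the sample durations are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Scene:
    """One scene of the story with its measured media lengths (hypothetical)."""
    name: str
    clip_seconds: float
    narration_seconds: float


def find_overruns(
    scenes: List[Scene], tolerance: float = 0.25
) -> List[Tuple[str, float]]:
    """Return (scene name, overrun in seconds) where narration outlasts its clip."""
    overruns = []
    for scene in scenes:
        excess = scene.narration_seconds - scene.clip_seconds
        # Ignore small differences that would not be noticeable in playback.
        if excess > tolerance:
            overruns.append((scene.name, round(excess, 2)))
    return overruns


# Illustrative durations only; real values would come from the generated files.
scenes = [
    Scene("intro", clip_seconds=8.0, narration_seconds=7.5),
    Scene("chase", clip_seconds=8.0, narration_seconds=10.5),
]
overruns = find_overruns(scenes)
```

Each flagged scene could then be sent back for a shorter narration script or a longer clip, which replaces the manual iteration step with a targeted retry.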
Opportunities Identified
Automating the entire creative pipeline from a simple concept to a finished video can dramatically accelerate content creation for marketing, entertainment, and education.
The use of natural language to interact with and iterate on complex media projects lowers the barrier to entry for individuals without deep technical or creative prompting skills.
The modular 'skills' framework could support a marketplace or ecosystem of specialized agent capabilities, extending what agents built on the platform can do.