Google Cloud Next '26 • Apr 23, 2026 • 10:15 • Conference Panel

From smartphones to Raspberry Pi: Running Gemma 4 anywhere


Omar • Google DeepMind

Executive Summary

Google DeepMind's Gemma 4 is a family of open models (2B to 31B parameters) designed to be developer-friendly and to run on devices ranging from phones to consumer GPUs.
The launch was highly successful, achieving over 40 million downloads in three weeks, driven by its multimodal (image, video, audio) and multilingual (140+ languages) capabilities.
In response to community feedback, Google shifted the license to the more permissive Apache 2.0, removing a key friction point for enterprise and commercial adoption.
Gemma is positioned for on-device and edge applications, including agentic tasks and hybrid inference systems in which it acts as a local router, sending more complex queries to larger cloud models.


Concerns Raised

Small models have inherent knowledge limitations compared to larger API-based models.
Previous custom licensing was a significant friction point for enterprise adoption.

Opportunities Identified

Developing on-device AI for privacy-sensitive or offline applications.
Leveraging the Apache 2.0 license for commercial products without legal friction.
Building hybrid inference systems to optimize cost, latency, and performance.
Fine-tuning models for niche languages and specific enterprise domains.

Key Themes

On-Device & Edge AI

Gemma is designed for local execution, with models small enough for phones, Raspberry Pis, and consumer GPUs. This enables applications that prioritize privacy, low latency, and offline functionality.

This trend shifts AI capabilities from centralized cloud servers to the edge, unlocking new use cases in regulated industries, in areas with poor connectivity, and in real-time device control.
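To make "small enough for phones, Raspberry Pis, and consumer GPUs" concrete, the rough arithmetic below estimates how much memory the weights alone occupy at a given precision. It is a back-of-the-envelope sketch, not a figure from the session: the helper name and the precision levels (16-bit, 8-bit, 4-bit) are assumptions chosen for illustration, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper for illustration: parameter count times bytes per weight.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # The 2B and 31B sizes are the endpoints of the range quoted in the summary.
    # The precisions are assumed, not stated in the session.
    for size in (2, 31):
        for bits in (16, 8, 4):
            print(f"{size}B parameters at {bits}-bit: ~{weight_memory_gb(size, bits):.1f} GB")
```

Under these assumptions, the smallest model at 4-bit precision needs about 1 GB for weights, while the largest at 16-bit needs about 62 GB, which is why precision choice matters so much for on-device deployment.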

Open Source Strategy & Community Engagement

Google's release of Gemma 4 under the Apache 2.0 license, a direct response to community feedback, signals a strong commitment to open source. The "Gemmaverse" concept encourages developers to fine-tune and build upon the base models for specialized tasks.

A permissive license and strong base model lower the barrier to entry for developers and enterprises, fostering a vibrant ecosystem and accelerating innovation in specialized AI applications.

Hybrid Inference Systems

The discussion highlights a practical architecture in which a small, local model like Gemma handles the majority (70-80%) of user tasks and routes more complex queries to a larger, more capable cloud model like Gemini. This approach optimizes for cost, speed, and intelligence.

This hybrid approach offers a "best of both worlds" solution for developers, balancing the benefits of on-device AI (privacy, speed) with the raw power of large-scale models, creating more efficient and cost-effective AI products.
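The routing pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture presented in the session: the local model is assumed to return an answer plus a confidence score, the threshold of 0.7 is arbitrary, and `fake_local` and `fake_cloud` are hypothetical stand-ins for real model calls.

```python
from typing import Callable, Tuple

# Hypothetical interfaces: a local model returns (answer, confidence in [0, 1]);
# a cloud model returns just an answer.
LocalModel = Callable[[str], Tuple[str, float]]
CloudModel = Callable[[str], str]


def hybrid_answer(
    prompt: str,
    local_model: LocalModel,
    cloud_model: CloudModel,
    min_confidence: float = 0.7,  # assumed threshold, tune per application
) -> Tuple[str, str]:
    """Try the local model first; escalate to the cloud when it is unsure.

    Returns (route, answer), where route is "local" or "cloud".
    """
    answer, confidence = local_model(prompt)
    if confidence >= min_confidence:
        return "local", answer
    return "cloud", cloud_model(prompt)


def fake_local(prompt: str) -> Tuple[str, float]:
    """Stand-in for an on-device model: confident only on short prompts."""
    if len(prompt.split()) <= 8:
        return f"[local] {prompt}", 0.9
    return "", 0.2


def fake_cloud(prompt: str) -> str:
    """Stand-in for a larger hosted model."""
    return f"[cloud] {prompt}"


if __name__ == "__main__":
    print(hybrid_answer("Set a timer for ten minutes", fake_local, fake_cloud))
    long_prompt = "Compare these three contracts clause by clause and summarize every conflict"
    print(hybrid_answer(long_prompt, fake_local, fake_cloud))
```

In a real system the confidence signal might instead come from the local model emitting an explicit "escalate" decision; the point of the sketch is only that simple requests never leave the device.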

Accessible Multimodality

The Gemma family incorporates multimodal capabilities, with smaller models understanding audio, video, and images, and larger ones having advanced vision. This makes sophisticated, multi-sensory AI accessible to developers without requiring massive computational resources.

Integrating vision and audio understanding into small, open models allows for the creation of more intuitive and powerful applications on consumer hardware, from on-device assistants to smart gadgets.

Practical AI Agents

Gemma models possess agentic capabilities like function calling, allowing them to interact with APIs and control device functions (e.g., turning on a phone's flashlight). This demonstrates that even small, local models can perform useful, automated tasks.

This moves beyond simple text generation, enabling developers to build applications where AI can take direct action in the digital or physical world, all while running locally on a user's device.
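Function calling of the kind described above usually comes down to the application parsing a structured request from the model and dispatching it to real code. The sketch below assumes the model emits a JSON object with `name` and `arguments` fields; that format, the `set_flashlight` tool, and the in-memory `DEVICE_STATE` are all hypothetical and stand in for whatever a given runtime and device API actually provide.

```python
import json
from typing import Callable, Dict

# Simulated device state; a real app would call a platform API instead.
DEVICE_STATE = {"flashlight": False}


def set_flashlight(on: bool) -> str:
    """Hypothetical tool: toggle the (simulated) flashlight."""
    DEVICE_STATE["flashlight"] = on
    return f"flashlight {'on' if on else 'off'}"


# Registry of tools the model is allowed to call.
TOOLS: Dict[str, Callable[..., str]] = {"set_flashlight": set_flashlight}


def dispatch(model_output: str) -> str:
    """Parse a JSON function call emitted by the model and run the matching tool."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        # Refuse anything outside the registry rather than guessing.
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


if __name__ == "__main__":
    # Assumed model output for a request like "turn on the flashlight".
    print(dispatch('{"name": "set_flashlight", "arguments": {"on": true}}'))
```

Keeping an explicit registry means the model can only trigger actions the developer has exposed, which matters when the actions affect a physical device.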


Topics

Gemma • Google DeepMind • Open Source AI • Large Language Models (LLMs) • On-Device AI • Edge AI • Multimodal AI • Agentic AI • Function Calling • Hybrid Inference • Apache 2.0 License • AI for Developers • Model Fine-Tuning • Raspberry Pi • Android AI

Processed Apr 28, 2026 yt-dlp + mlx-whisper + Gemini