Four Models, One Strategy: Bring AI On-Device
Google didn't make a big production out of it. No stage, no keynote, just a release post on April 2 and four new models dropped into the wild under an Apache 2.0 license [3]. That last part matters more than the announcement format: Apache 2.0 means developers can use the weights commercially, modify them, and build products without asking permission.

The Gemma 4 family spans four tiers aimed at different hardware targets. The 2B model targets mobile and edge devices, and the 4B stays efficient enough for phones and tablets. The 26B is a mixture-of-experts model that activates only a fraction of its parameters during inference, and the 31B is the dense flagship. All four share long context, multimodal input support, and a local-first story that pushes against the idea that useful AI must live in the cloud [1][3].
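"Activates only a fraction of its parameters" is worth unpacking, since it's the whole reason a 26B MoE model is plausible on local hardware. The minimal sketch below shows top-k expert routing, the standard mechanism behind mixture-of-experts layers: a small router scores the experts per token, and only the top few actually run. The expert count, dimensions, and top_k here are illustrative choices, not Gemma 4's actual configuration.

```python
# Toy top-k mixture-of-experts layer: per token, only top_k of n_experts run,
# so most of the layer's weights sit idle on any given inference step.
# All sizes below are illustrative, not Gemma 4's real configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                            # (tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # keep only top_k experts per token
        weights = F.softmax(weights, dim=-1)               # normalize the kept scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out  # each token touched only top_k of n_experts

x = torch.randn(4, 64)   # 4 tokens
layer = ToyMoELayer()
print(layer(x).shape)    # torch.Size([4, 64]); 2 of 8 experts ran per token
```

With top_k=2 of 8 experts, roughly a quarter of the layer's feed-forward parameters do work per token, which is how a model's total parameter count can far exceed its per-token compute cost.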
What It Actually Does When You Test It
Benchmarks are fine, but the more useful question is what happens when you hand the model a real task and get out of the way. The hands-on YouTube reviews did exactly that [1][2]. Reviewers used Gemma 4 to build front-end layouts, generate a browser-based macOS-style interface, sketch product viewers, simulate desktop environments, and handle agentic workflows that require more than one coherent step. The 31B results looked legitimately strong. The output wasn't perfect, but it was the kind of output a developer can actually work with. The 26B model, which is much more realistic for local hardware, was rougher but still useful in the right way: enough structure, enough correctness, and enough speed to become part of a workflow instead of staying a toy.
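For concreteness, here is roughly what one of those tests looks like as code: load a checkpoint locally and ask for a small front-end layout. This is a sketch, not the reviewers' exact setup, and the model ID is a placeholder assumption rather than a confirmed Gemma 4 identifier; point it at whatever checkpoint you actually have. Requires `pip install transformers accelerate`.

```python
# Sketch of a hands-on test: prompt a locally loaded model for a front-end layout.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-4-4b-it",  # hypothetical ID; substitute your local checkpoint path
    device_map="auto",             # place weights on GPU if one is available, else CPU
)

prompt = (
    "Write a single self-contained HTML file with CSS for a macOS-style "
    "window: a title bar with traffic-light buttons, a sidebar, and a content pane."
)
result = generator(prompt, max_new_tokens=1024, do_sample=False)
print(result[0]["generated_text"])
```

The judgment call the reviewers were making is exactly the one this loop surfaces: whether the generated file is close enough to correct that editing it beats writing it from scratch.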

