📰 Key Takeaways
Google has built computer use directly into Gemini 3.5 Flash. The capability was previously available only as a standalone Gemini 2.5 computer use model; it is now integrated into the Flash lineup, so developers can use it without switching models.
Gemini 3.5 Flash already offers strong function calling and built-in tools such as Search and Maps. With computer use added, the model can perceive a screen visually, reason about its current state, and take action across browsers, mobile devices, and desktop environments. Official demos include 3.5 Flash automatically analyzing the Gemini app and returning a feature classification list, and running automated accessibility audits on its own docs. The main use cases for this upgrade are long-running, cross-system enterprise automation tasks, such as continuous software testing and knowledge work across specialized apps.
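The perceive, reason, act cycle described above can be sketched as a simple loop. This is a minimal illustration only: `FakeScreen`, `stub_model`, and `run_agent` are hypothetical names invented here, and the model call is replaced by a hard-coded stub rather than any real Gemini API request.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # e.g. "click", "type", or "done"
    target: str = ""


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a browser, mobile, or desktop session."""
    state: str = "login_page"
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real environment would return pixels; a label is enough here.
        return self.state

    def apply(self, action: Action) -> None:
        self.log.append((action.kind, action.target))
        if self.state == "login_page" and action.kind == "click":
            self.state = "dashboard"


def stub_model(observation: str, goal: str) -> Action:
    """Placeholder for the model call (assumption: not a real API)."""
    if observation == "login_page":
        return Action("click", "sign_in_button")
    return Action("done")


def run_agent(screen: FakeScreen, goal: str, max_steps: int = 10) -> int:
    """Loop until the model reports completion; return steps taken."""
    for step in range(max_steps):
        observation = screen.screenshot()        # perceive
        action = stub_model(observation, goal)   # reason
        if action.kind == "done":
            return step
        screen.apply(action)                     # act
    raise RuntimeError("step budget exhausted")


screen = FakeScreen()
steps = run_agent(screen, "open the dashboard")
```

The step budget is a design choice for long-running tasks: it guarantees the loop ends even if the model never reports completion.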
On the security side, Google applied targeted adversarial training to computer use to reduce the risk of prompt injection attacks when agents operate in real environments. It also released two optional enterprise protection mechanisms: requiring explicit user confirmation for sensitive or irreversible operations, and automatically terminating a task when indirect prompt injection is detected. Google recommends a “defense in depth” strategy that pairs these mechanisms with sandboxing, human-in-the-loop verification, and strict access control. Developers can start using the feature immediately via the Gemini API and Gemini Enterprise Agent Platform.
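To make the two protection mechanisms concrete, here is a rough application-side sketch of the same ideas: a confirmation gate for irreversible actions and termination when page content looks like an injection attempt. Everything in it is an assumption for illustration. The `IRREVERSIBLE` set, `guarded_execute`, and the keyword-based `looks_like_injection` heuristic are hypothetical and do not represent how Google's mechanisms work.

```python
from dataclasses import dataclass

# Assumed policy: which action names count as irreversible.
IRREVERSIBLE = {"delete_file", "send_payment", "submit_form"}


class TaskTerminated(Exception):
    """Raised when the task is stopped for safety reasons."""


@dataclass
class ProposedAction:
    name: str
    argument: str = ""


def looks_like_injection(page_text: str) -> bool:
    """Toy keyword heuristic; a real detector would be far more robust."""
    markers = ("ignore previous instructions", "disregard your instructions")
    lowered = page_text.lower()
    return any(marker in lowered for marker in markers)


def guarded_execute(action, page_text, confirm, execute) -> str:
    """Run `execute(action)` only if both safety checks pass."""
    if looks_like_injection(page_text):
        raise TaskTerminated("possible indirect prompt injection")
    if action.name in IRREVERSIBLE and not confirm(action):
        return "skipped"   # the user declined the irreversible step
    execute(action)
    return "executed"


executed = []
deny = lambda action: False   # simulate a user who declines everything

result_scroll = guarded_execute(
    ProposedAction("scroll"), "Welcome back", deny, executed.append)
result_pay = guarded_execute(
    ProposedAction("send_payment", "$20"), "Checkout", deny, executed.append)
```

In this run the harmless scroll executes without a prompt, while the payment is skipped because the simulated user declines. In the spirit of defense in depth, a guard like this would sit alongside sandboxing and access control rather than replace them.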
💬 JudyAI Lab Perspective
Google built computer use directly into the main Gemini 3.5 Flash model. This signals that a core Agent capability has graduated from a “standalone feature” to model infrastructure.
What deserves the most attention is the shift in design thinking. Previously, computer use was a separate model; now it is unified with function calling, Search, and Maps. Developers no longer need to switch models to let an Agent perceive screens, call tools, and operate across systems within the same task. For AI builders, designing task boundaries may matter more than picking models: deciding which steps need visual reasoning and which need human confirmation is the real differentiator. Google’s defense-in-depth strategy (sandbox, confirmation, prompt injection detection) also makes it clear that when Agents are given operational capabilities, security architecture cannot be a patch added after the fact.
You can now fire up the Gemini API and run a cross-screen task. Watch specifically whether it proactively asks for confirmation before irreversible operations; that is the most direct way to verify an Agent’s security design.
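One way to make that check repeatable is to log the run and audit the log afterwards. The sketch below assumes a hypothetical trace format (a list of dicts with `type`, `name`, `for`, and `granted` keys) that you would record yourself; it is not an output format of the Gemini API.

```python
# Assumed set of action names your own policy treats as irreversible.
IRREVERSIBLE = {"delete", "purchase", "send_email"}


def unconfirmed_irreversible(trace):
    """Return indices of irreversible actions that were not immediately
    preceded by a granted confirmation for that same action."""
    violations = []
    for i, event in enumerate(trace):
        if event["type"] != "action" or event["name"] not in IRREVERSIBLE:
            continue
        prev = trace[i - 1] if i > 0 else None
        confirmed = (
            prev is not None
            and prev["type"] == "confirmation"
            and prev.get("for") == event["name"]
            and prev.get("granted") is True
        )
        if not confirmed:
            violations.append(i)
    return violations


good_trace = [
    {"type": "action", "name": "click"},
    {"type": "confirmation", "for": "purchase", "granted": True},
    {"type": "action", "name": "purchase"},
]
bad_trace = [
    {"type": "action", "name": "click"},
    {"type": "action", "name": "delete"},
]

good_violations = unconfirmed_irreversible(good_trace)
bad_violations = unconfirmed_irreversible(bad_trace)
```

An empty result means every irreversible step in the recorded run was confirmed first. Any index returned points at a step worth inspecting by hand.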
📅 Source Info
- Published: 2026-06-24T16:30
- Source: https://deepmind.google/blog/introducing-computer-use-in-gemini-3-5-flash/