Google unveils Gemini 2.5 upgrades for reasoning & security

Thu, 22nd May 2025

Google has provided a series of updates to its Gemini 2.5 model series, with enhancements spanning advanced reasoning, developer capabilities and security safeguards.

The company reported that Gemini 2.5 Pro is now the leading model on the WebDev Arena coding leaderboard, holding an ELO score of 1415. It also leads across all leaderboards in LMArena, a platform that measures human preferences in multiple dimensions. Additionally, Gemini 2.5 Pro's 1 million-token context window was highlighted as supporting strong long context and video understanding performance.

Integration with LearnLM, a family of models developed with educational experts, resulted in Gemini 2.5 Pro apparently becoming the foremost model for learning. According to Google, in direct comparisons focusing on pedagogy and effectiveness, Gemini 2.5 Pro was favoured by educators and experts over other models in a wide range of scenarios. The model outperformed others based on the five principles of learning science used in AI system design for education.

Gemini 2.5 Pro introduced an experimental capability called Deep Think, which is being tested to enable enhanced reasoning by allowing the model to consider multiple hypotheses before responding. The company said, "2.5 Pro Deep Think gets an impressive score on 2025 USAMO, currently one of the hardest math benchmarks. It also leads on LiveCodeBench, a difficult benchmark for competition-level coding, and scores 84.0% on MMMU, which tests multimodal reasoning."

Safety and evaluation measures are being emphasised with Deep Think. "Because we're defining the frontier with 2.5 Pro DeepThink, we're taking extra time to conduct more frontier safety evaluations and get further input from safety experts. As part of that, we're going to make it available to trusted testers via the Gemini API to get their feedback before making it widely available," the company reported.

Google announced improvements to 2.5 Flash, describing it as the most efficient in the series, tailored for speed and cost efficiency. This version now reportedly uses 20-30% fewer tokens in evaluations and delivers improved performance across benchmarks for reasoning, multimodality, code, and long-context tasks. The updated 2.5 Flash is now available for preview in Google AI Studio, Vertex AI, and the Gemini app.

New features have also been added to the Gemini 2.5 series. The Live API now offers a preview version supporting audio-visual input and native audio output. This is designed to create more natural and expressive conversational experiences. According to Google, "It also allows the user to steer its tone, accent and style of speaking. For example, you can tell the model to use a dramatic voice when telling a story. And it supports tool use, to be able to search on your behalf."

Early features in this update include Affective Dialogue, where the model can detect and respond to emotions in a user's voice; Proactive Audio, which enables the model to ignore background conversations and determine when to respond; and enhanced reasoning in live API use. Multi-speaker support has also been introduced for text-to-speech capabilities, allowing audio generation with two distinct voices and support for over 24 languages, including seamless transitions between them.

Project Mariner's computer use capabilities are being integrated into the Gemini API and Vertex AI, with multiple enterprises testing the tool. Google stated, "Companies like Automation Anywhere, UiPath, Browserbase, Autotab, The Interaction Company and Cartwheel are exploring its potential, and we're excited to roll it out more broadly for developers to experiment with this summer."

On the security front, Gemini 2.5 includes advanced safeguards against indirect prompt injections, which involve malicious instructions embedded into retrieved data. According to disclosures, "Our new security approach helped significantly increase Gemini's protection rate against indirect prompt injection attacks during tool use, making Gemini 2.5 our most secure model family to date."

Google is introducing new developer tools with thought summaries in the Gemini API and Vertex AI. These summaries convert the model's raw processing into structured formats with headers and action notes. Google stated, "We hope that with a more structured, streamlined format on the model's thinking process, developers and users will find the interactions with Gemini models easier to understand and debug."

Additional features include thinking budgets for 2.5 Pro, allowing developers to control the model's computation resources to balance quality and speed. This can also completely disable the model's advanced reasoning capability if desired. Model Context Protocol (MCP) support has been added for SDK integration, aiming to enable easier development of agentic applications using both open-source and hosted tools.

Google affirmed its intention to sustain research and development efforts as the Gemini 2.5 series evolves, stating, "We're always innovating on new approaches to improve our models and our developer experience, including making them more efficient and performant, and continuing to respond to developer feedback, so please keep it coming! We also continue to double down on the breadth and depth of our fundamental research — pushing the frontiers of Gemini's capabilities. More to come soon."

Share on: