Google reveals Gemini 2.5 Flash, its 'most cost-efficient thinking model'

Yuichiro Chino/Getty Images

Just weeks after unveiling Gemini 2.5 Pro, Google is on to its next top-performing model.

On Thursday, the company released an "early version" of Gemini 2.5 Flash in preview in the Gemini API, AI Studio, and Vertex AI. The model has a knowledge cutoff of January 2025. It can take text, images, video, and audio prompts, and has a one-million-token context window.

Google says the new version expands on Flash 2.0 with improved reasoning, but "without compromising its renowned speed or cost." Reasoning models spend more time "thinking" -- or interpreting a query -- before responding, which results in more thorough and direct output that, ideally, aligns better with a user's needs, compared to earlier models that prioritize speed. Models that reason are also better equipped to accurately deliver on multi-step problems or tasks.

"Gemini 2.5 Flash performs strongly on Hard Prompts in ChatBot Arena, second only to 2.5 Pro," Google notes in the announcement.

Referring to the new model as its most cost-efficient, Google notes that 2.5 Flash "allows developers to configure the amount of reasoning it does to maximize performance." This gives developers a "thinking budget," or the power to pay for reasoning only when they need it most. With reasoning on, the output price jumps from 60 cents per 1 million tokens to $3.50.
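To put that price gap in concrete terms, here is a minimal Python sketch that compares output costs with reasoning off and on. It uses only the two output prices quoted above; the function name `output_cost_usd` is hypothetical, and the sketch ignores input-token charges and any other billing details, so treat it as back-of-the-envelope arithmetic rather than a billing calculator.

```python
from decimal import Decimal

# Output prices in USD per 1 million tokens, as quoted in the article.
# Key: whether reasoning ("thinking") is switched on.
PRICE_PER_MILLION = {
    False: Decimal("0.60"),
    True: Decimal("3.50"),
}


def output_cost_usd(output_tokens: int, thinking: bool) -> Decimal:
    """Estimate the output-token cost in USD (hypothetical helper).

    Assumption: cost scales linearly with the number of output tokens.
    Input-token pricing is deliberately left out.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return PRICE_PER_MILLION[thinking] * output_tokens / Decimal(1_000_000)


if __name__ == "__main__":
    tokens = 2_000_000
    off = output_cost_usd(tokens, thinking=False)
    on = output_cost_usd(tokens, thinking=True)
    print(f"reasoning off: ${off:.2f}")
    print(f"reasoning on:  ${on:.2f}")
```

At these rates, output with reasoning on costs roughly 5.8 times as much per token as output with reasoning off.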

Screenshot by Radhika Rajkumar/ZDNET

If developers don't give the model a budget, it determines the query's reasoning needs itself by evaluating the request for complexity. For example, it will identify prompts with minimal reasoning needs -- like "How many states are there in the US?" -- separately from multi-step math problems. Google notes that to replicate Flash 2.0 latency and cost, developers should set the budget to 0.
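For developers wondering what setting a budget might look like, the sketch below builds a JSON request body with and without one. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) reflect our understanding of the Gemini API's REST format and are not given in Google's announcement as quoted here, so check them against the current API reference before relying on them. The helper `build_request` is hypothetical, and nothing is sent over the network.

```python
import json
from typing import Optional


def build_request(prompt: str, thinking_budget: Optional[int] = None) -> dict:
    """Build a request body for a text prompt (hypothetical helper).

    thinking_budget=None: no budget is sent, so the model decides how
                          much reasoning the query needs.
    thinking_budget=0:    reasoning is switched off, which Google says
                          replicates Flash 2.0 latency and cost.
    Field names are assumptions; verify them in the API documentation.
    """
    body: dict = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        if thinking_budget < 0:
            raise ValueError("thinking_budget must be non-negative")
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return body


if __name__ == "__main__":
    # A simple factual query with reasoning switched off.
    simple = build_request("How many states are there in the US?", 0)
    print(json.dumps(simple, indent=2))

    # No budget given: the model evaluates the request's complexity itself.
    automatic = build_request("Solve this multi-step math problem.")
    print(json.dumps(automatic, indent=2))
```

Omitting the budget entirely, rather than sending a default number, keeps the "let the model decide" case distinct from an explicit 0.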

Gemini 2.5 Flash scored 12% on Humanity's Last Exam (HLE), a new, alternative benchmark to industry tests that have become too easy for rapidly evolving models. This score outperformed rival models, including Claude 3.7 Sonnet and DeepSeek R1, but not OpenAI's just-launched o4-mini, which came in at 14% on the test.

You can try Gemini 2.5 Flash in preview through the Gemini API in Google AI Studio and Vertex AI.
