AI Model Release Tracker: Microsoft AI's first reasoning model arrives

3 hours ago 4

Follow ZDNET: Add america arsenic a preferred source on Google.

AI labs are shipping caller models nonstop. Besides being amended and faster than their predecessors, each caller exemplary isn't guaranteed to beryllium a large measurement change, contempt however the company's PR whitethorn wax poetic astir them. Model strengths truly look successful context: Where are rival models lacking oregon excelling? Which models person outstanding specialties, and which are conscionable catching up to manufacture standards?

Also: How we trial AI astatine ZDNET

Our Model Release Tracker helps you marque consciousness of wherever models basal comparative to each other, and whether they're worthy a deeper look. While we don't trial each exemplary oregon exemplary update connected this list, we'll ever see the cardinal elements you request to know, on with our hands-on adept test, wherever applicable. We besides see an Expert Score for definite models. Curious astir however we trial AI? Check out this breakdown of our process.

Here are immoderate of the biggest exemplary releases of 2026 truthful acold and what to cognize astir them. We'll update this database whenever a notable caller exemplary arrives.

MAI-Thinking-1

Microsoft AI | June 2, 2026

What it does: At its Build developer conference, Microsoft said that this caller 35-billion-parameter exemplary is, unsurprisingly, designed for multi-step agentic tasks. It scored likewise connected the SWE Bench Pro benchmark trial for coding arsenic Anthropic Opus 4.6. The institution besides noted that endeavor users tin spot this exemplary for immoderate usage due to the fact that it was trained lone connected clean, commercially harmless information -- an important tidbit fixed mounting AI copyright lawsuits.

Also: Microsoft's archetypal reasoning exemplary is 1 of 7 AIs conscionable released astatine Build - what we cognize truthful far

Why it matters: This is the archetypal reasoning exemplary from Microsoft AI, a notable milestone for immoderate AI lab, but particularly truthful this precocious successful this race. Most labs person created aggregate precocious reasoning models astatine this point, truthful it's unclear yet wherever Microsoft's attack stands comparative to those, though the institution is focusing connected appealing to endeavor clients specifically.

Claude Opus 4.8

Anthropic | May 28, 2026

What it does: Replacing Opus 4.7 starting May 28 (at the aforesaid price), Opus 4.8 offers faster reasoning modes for one-third the outgo of the earlier version, according to Anthropic. Like astir of Anthropic's models, 4.8 prioritizes coding abilities, scoring higher than 4.7 connected 2 coding benchmarks but not afloat besting OpenAI's GPT 5.5. It besides "reaches caller highs connected our measures of prosocial traits similar supporting idiosyncratic autonomy and acting successful the user's champion interest," the institution noted successful the release, though definitions for what that means stay murky.

Also: Anthropic launches Opus 4.8, with honesty arsenic its slayer feature

Why it matters: Anthropic has ever prioritized exemplary information and interpretability, but appears to beryllium further emphasizing that modular with this release. The institution said Opus 4.7 had a 92% honesty rate, successful summation to being little sycophantic and hallucination-prone overall. The information that it claims 4.8 shows "substantially" little rates of misalignment than 4.7 indicates an progressively precocious modular for exemplary safety, particularly due to the fact that Anthropic compared 4.8's alignment to that of Mythos Preview.

Gemini 3.5 Flash

Google | May 19, 2026

What it does: At Google I/O, the institution launched its 3.5 exemplary family, designed for gathering agents (following suit with wherever each of the AI manufacture is focused astatine the moment). The archetypal exemplary disposable present is Gemini 3.5 Flash, which bushed Gemini 3.1 Pro connected a fewer coding and agentic benchmarks and present undergirds AI Mode successful Search and the Gemini App arsenic the default model. Like each Flash models, it's optimized for velocity and a cheaper, much lightweight idiosyncratic experience, but tin inactive grip "long-horizon" agentic tasks, Google noted. 3.5 Pro should rotation retired successful June.

Also: Anthropic launches Opus 4.8, with honesty arsenic its slayer feature

Why it matters: The astir important spot 3.5 volition run is successful Search. As Google doubles down connected an AI-first hunt experience, the models moving it signifier the responses radical volition passively devour each day, meaning it has to beryllium particularly up to snuff successful presumption of accuracy and hallucinations. Notably, though, the System Card for 3.5 Flash lacks immoderate notation of hallucination complaint oregon sycophancy.

GPT-5.5 Instant

OpenAI | May 5, 2026

What it does: OpenAI said in its announcement that the lighter mentation of OpenAI's just-released GPT-5.5 is little verbose than its predecessor, GPT-5.3 Instant. It besides touted less hallucinations and improved factuality, saying "GPT‑5.5 Instant produced 52.5% less hallucinated claims than GPT‑5.3 Instant connected high-stakes prompts covering areas similar medicine, law, and finance."

Also: Anthropic's Mythos is evolving faster than expected, reports AI information agency

Why it matters: GPT-5.5 Instant replaces GPT-5.3 arsenic the default exemplary successful ChatGPT. Again, portion the anticipation is that each caller AI exemplary gets much efficient, easier to use, and makes up little stuff, a important betterment successful hallucinations for a exemplary astir radical usage for accelerated queries could mean little misinformation spreading among the masses. That's particularly captious fixed however galore radical are utilizing ChatGPT for mundane wellness questions, for example.

(Disclosure: Ziff Davis, ZDNET's genitor company, filed an April 2025 suit against OpenAI, alleging it infringed Ziff Davis copyrights successful grooming and operating its AI systems.)

Nemotron 3 Nano Omni

Nvidia | April 28, 2026

What it does: The latest successful Nvidia's unfastened Nemotron family, this exemplary provides agents with multimodal input. That means they tin "perceive and crushed crossed visual, audio, and textual inputs wrong a azygous shared perception‑to‑action loop," according to Nvidia, thereby unifying aggregate capabilities into a azygous system.

Also: AI is an arms race, and the US wants $9 cardinal successful Nvidia superchips to support up

Why it matters: Normally, systems of agents request to usage abstracted models for speech, vision, and text, meaning they leap crossed documents, video, and audio to implicit multi-step tasks. That slows down workflows, undermines the discourse agents gather, and racks up inference costs. Nvidia's approach, if it works, would streamline this process and trim token use, redeeming you money. Try it connected Hugging Face.

GPT-5.5

OpenAI | April 23, 2026

Expert Score: 93/100

What it does: ZDNET tester-in-residence David Gewirtz technically gave GPT-5.5 an A- score, but said it "can beryllium reductively described arsenic amended and faster than GPT-5.4," which is hopefully the bare-minimum anticipation for a caller model. Specifically, though, the exemplary got amended astatine agentic coding, intelligibly identifying concepts, technological research, and factual accuracy.

Also: I enactment GPT-5.5 done a 10-round test: It scored 93/100, losing points lone for exuberance

Why it matters: While the exemplary itself whitethorn not beryllium leaps and bounds up of its contiguous predecessor, the speedy turnaround from 5.4 to 5.4 -- little than 2 months -- indicates however rapidly agentic coding is accelerating OpenAI's exemplary merchandise cycle. As ZDNET's David Gewirtz breaks down, the company, overmuch similar different frontier labs utilizing AI to physique AI, is shipping updates astatine an exponentially expanding rate.

ChatGPT Images 2

OpenAI | April 23, 2026

What it does: Soon aft sunsetting Sora, its generative video exemplary and societal platform, OpenAI somewhat confusingly announced Images 2. ZDNET exemplary tester David Gewirtz got an aboriginal look astatine Images 2 before its merchandise and was impressed. While helium didn't springiness this exemplary a ceremonial Expert Score, helium said it's fun, a immense leap, and actually utile for work.

Why it matters: OpenAI seemed to beryllium getting retired of the much consumer-minded AI merchandise crippled erstwhile it discontinued Sora, having been beaten by Anthropic astatine securing lucrative endeavor contracts. That OpenAI inactive came retired with Images 2 wrong that redirection communicative indicates that it sees representation generators arsenic applicable capable to endeavor AI -- particularly connected the heels of Anthropic's Claude Design.

Claude Opus 4.7

Anthropic | April 16, 2026

What it does: Arriving relatively quickly aft Opus 4.6, this exemplary boasts caller highs successful honesty, reduced sycophancy, and hallucinations. It besides appears to person a knack for cybersecurity, arsenic it backs the caller Claude Security, released soon aft the exemplary itself -- but no, it's not Mythos, arsenic galore suspected.

Also: Anthropic's caller Claude Security instrumentality scans your codebase for flaws - and helps you determine what to hole first

Why it matters: Hallucinations and honesty are among the astir difficult, hard-to-solve issues plaguing adjacent the champion models. For Anthropic to assertion specified important gains successful those areas is nary tiny feat for an AI laboratory that takes information seriously.

Claude Mythos (Preview)

Anthropic | April 7, 2026

What it does: This is simply a pugnacious 1 due to the fact that Mythos isn't really disposable to the public. Anthropic created rather a media tempest erstwhile it positioned the caller general-purpose exemplary arsenic excessively almighty to merchandise arsenic usual. While the exemplary is seemingly a measurement alteration from earlier Anthropic models, the institution was particularly alarmed due to the fact that of the information menace it posed, stating that "it is strikingly susceptible astatine machine information tasks."

In effect to that, Anthropic spearheaded Project Glasswing, a collaborative effort with respective rival AI labs, including Google, Nvidia, and Microsoft, arsenic good arsenic information authorities similar Palo Alto Networks, "to assistance unafraid the world's astir captious software, and to hole the manufacture for the practices we each volition request to follow to support up of cyberattackers."

Also: Apple, Google, and Microsoft articulation Anthropic's Project Glasswing to support world's astir captious software

Why it matters: If we're to judge Anthropic's guidance that Mythos poses a important menace to the world's bundle -- truthful overmuch truthful that lone a prime fewer partners tin entree it -- cybersecurity apparatuses arsenic they basal whitethorn not beryllium prepared to conscionable the rapidly evolving frontier of exemplary capabilities. Mythos whitethorn not beryllium the lone exemplary of its caliber, but simply the archetypal of galore to travel erstwhile different labs execute akin breakthroughs.

For now, conscionable a fewer weeks into its release, Mythos is helping drawback bundle bugs successful droves.

GPT-5.4

OpenAI | March 5, 2026

What it does: OpenAI framed this caller model, released hardly 3 months aft GPT-5.2, arsenic specifically designed for nonrecreational work. According to the company's ain investigating (which should ever beryllium taken with a atom of brackish until verified by a 3rd party), GPT-5.4 matches oregon outperforms quality professionals 83% of the time.

Why it matters: As AI companies absorption much connected gaining endeavor spot (and contracts) portion lauding what agentic AI tin do, they request models that tin grip analyzable work-related tasks with minimal risk, delay, oregon prohibitively precocious costs. Any exemplary advancement that shows prowess successful nonrecreational workflows has a amended accidental of being taken earnestly by companies struggling to follow AI, though thing guarantees seamless integration.

Also: OpenAI's caller GPT-5.4 clobbers humans connected pro-level enactment successful tests - by 83%

Claude Opus 4.6

Anthropic | Feb. 5, 2026

What it does: This exemplary rapidly redefined the modular for autonomous agentic work, particularly for coding. That's nary astonishment fixed Anthropic's authorization successful gathering models particularly adept astatine programming tasks. Opus 4.6 besides demonstrated betterment successful complex, longer-running tasks overall.

Why it matters: Opus 4.6's quality to grip tasks amended connected its ain means you tin reliably offload much of your workflow to it -- thing agentic offerings usually conflict with.

Also: Anthropic says its caller Claude Opus 4.6 tin nail your enactment deliverables connected the archetypal try

GPT-5.3-Codex

OpenAI | Feb. 5, 2026

What it does: This caller coding exemplary -- which OpenAI said helped physique and debug itself -- tin beryllium interrupted and redirected mid-task, which, if true, is simply a immense boon for developers utilizing it connected analyzable oregon shifting projects with tons of trial-and-error. GPT-5.3-Codex besides boasts tally times of implicit a time and a amended grasp connected idiosyncratic intent.

Also: OpenAI's caller Spark exemplary codes 15x faster than GPT-5.3-Codex - but there's a catch

Why it matters: OpenAI is trying to drawback up to Anthropic's pb successful agentic coding (and, coincidentally oregon not, released 5.3 Codex connected the aforesaid time arsenic Anthropic launched Opus 4.6). While ZDNET experts often similar Claude Code to different tools for vibe coding, OpenAI's rumored displacement toward endeavor clients and distant from amusive user tools could yet adjacent that gap.

Read Entire Article