
ZDNET's key takeaways
- OpenAI released GPT-5.2, its latest model, on Thursday.
- It fast-tracked the model to stay competitive with Google and Anthropic.
- GPT-5.2 is built for professional tasks and rivals experts.
After a week of teasing, OpenAI's latest model, GPT-5.2, has landed -- and it can seemingly rival your professional skills.
The company called GPT-5.2 "the most capable model series yet for professional knowledge work" in the announcement on Thursday. Citing its own recent survey of AI usage at work, the company noted that AI saves the average user up to an hour each day; GPT-5.2 appears designed to build on that significantly.
Also: ChatGPT saves the average user about an hour each day, says OpenAI - here's how
"We designed GPT‑5.2 to unlock even more economic value for people; it's better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long contexts, using tools, and handling complex, multi-step projects," the company wrote.
The company reportedly fast-tracked the model following Google and Anthropic's competitive releases of Gemini 3 and Opus 4.5, respectively, according to a report by The Information. Here's what it can do, and how you can try it.
(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
Built for work tasks
OpenAI said GPT-5.2 "outperforms industry professionals at well-specified knowledge work tasks spanning 44 occupations." The report specifically called out GDPval, an in-house benchmark the company released in September that tries to measure the economic value AI models produce. It does so by evaluating how models approach 1,320 tasks commonly linked to 44 jobs across nine industries that contribute more than 5% to the US gross domestic product (GDP).
GPT-5.2 Thinking scored 70.9% on GDPval, compared to GPT-5.1 Thinking's score of 38.8% -- meaning it excelled at typical knowledge work tasks like making spreadsheets and presentations.
"GPT‑5.2 Thinking produced outputs for GDPval tasks at >11x the speed and <1% the cost of expert professionals, suggesting that when paired with human oversight, GPT‑5.2 can assist with professional work," OpenAI wrote, adding that an expert judge compared the model's output to work "done by a professional firm with staff" (despite some minor errors).
Also: 3 ways AI agents will make your job unrecognizable in the next few years
Alongside GDPval, OpenAI released findings on how several of its own models, as well as Anthropic's Claude Opus 4.1, Google's Gemini 2.5 Pro, and xAI's Grok 4, performed on the benchmark. Claude Opus 4.1 came in first place overall, demonstrating particular strengths in aesthetic tasks like document formatting and slide layout, while GPT-5 scored highly for accuracy -- what OpenAI described as "finding domain-specific knowledge."
OpenAI also called out GPT-5.2's improved long-context reasoning and vision abilities. The former, it said, should help professionals maintain accuracy when using the model to analyze long reports, contracts, and other documents, while the latter makes it more adept at accurately interpreting diagrams, images of dashboards, screenshots, and other visual data.
"Compared to previous models, GPT‑5.2 Thinking has a stronger grasp of how elements are positioned within an image, which helps on tasks where relative layout plays a key role in solving the problem," the company wrote. It provided an example of how the model was able to identify bounding boxes even in a low-quality image and demonstrated a stronger understanding of "spatial arrangement" than 5.1.
Coding prowess
The model also showed smaller improvements over GPT-5.1 Thinking across several industry-standard benchmarks, including AIME 2025, which measures math, and SWE-Bench Pro, which measures software engineering in four languages. It scored a new state-of-the-art on the latter at 55.6%.
Also: The best free AI for coding in 2025 - only 3 make the cut now
According to OpenAI, that means better production code debugging and feature implementation, as well as fix deployment with less manual developer intervention. The company also touted GPT-5.2's improved front-end capabilities, especially on "complex or unconventional UI work" and 3D elements.
Less hallucination
OpenAI noted in the announcement that GPT-5.2 Thinking hallucinates 30% less than 5.1 Thinking, which it said should encourage enterprise users to worry less about encountering mistakes when using the model for research and analysis.
Some risk of hallucination is a reality of using any AI model, and users should double-check any claim a model makes, no matter how much its factuality score has improved over its predecessor's.
Safety
The company emphasized in the announcement that it more closely trained GPT-5.2 on how to handle sensitive conversations, finding "fewer undesirable responses in both GPT‑5.2 Instant and GPT‑5.2 Thinking as compared to GPT‑5.1 and GPT‑5 Instant and Thinking models."
For its models overall, the company said it has made "meaningful improvements in how they respond to prompts indicating signs of suicide or self-harm, mental health distress, or emotional reliance on the model."
Also: Using AI for therapy? Don't - it's bad for your mental health, APA warns
OpenAI added that it is still in the process of launching its age prediction model, which the company said will "automatically apply content protections for users who are under 18, in order to limit access to sensitive content."
The announcement also included a mental health evaluation table for those four aforementioned models, showing scores on a zero-to-one scale for each, though it did not specify methodology.
How to try it
GPT-5.2 will begin rolling out to paid ChatGPT users on Thursday, following the usual deployment of an OpenAI model family with Instant, Thinking, and Pro versions for different tasks. Developers can access all three versions now in the API.
Plus, Pro, Business, and Enterprise users can use the model's spreadsheet and presentation features by selecting the Thinking or Pro modes.
Is GPT-5.2 replacing other models?
OpenAI assured users that it has "no current plans to deprecate GPT‑5.1, GPT‑5, or GPT‑4.1 in the API and will communicate any deprecation plans with ample advance notice for developers." It added that the new model works well as is in Codex, but that it will release an optimized version of the model for that environment in the next few weeks.
The disclaimer may be meaningful to users who reacted negatively to the brief deprecation of earlier models, including GPT-4, when OpenAI released GPT-5 this past summer.
Mystery 'Garlic' model
Another report from The Information published last week revealed that OpenAI was also developing a new model, codenamed Garlic.
It's unclear how separate Garlic and the anticipated GPT-5.2 are, but The Information referred to GPT-5.2 (as well as yet another forthcoming release, GPT-5.5) as possible versions of Garlic. Prior to 5.2's release, OpenAI's Chief Research Officer Mark Chen told colleagues that Garlic performed well in company evaluations compared to Gemini 3 and Opus 4.5 in tasks involving coding and reasoning, according to the report. However, neither Gemini 3 nor Opus 4.5, both of which set industry standards last month, were mentioned in benchmark comparisons in the performance report for GPT-5.2.
Chen added that when developing Garlic, OpenAI addressed issues with pretraining, the initial phase of training in which the model begins learning from a massive dataset. The company focused the model on broader connections before training it for more specific tasks.
Also: Gemini vs. Copilot: I tested the AI tools on 7 everyday tasks, and it wasn't even close
These changes in pretraining enable OpenAI to infuse a smaller model with the same amount of knowledge previously reserved for larger models, according to Chen's remarks cited in the report. Smaller models can be beneficial for developers, as they are typically cheaper and easier to deploy -- something French AI lab Mistral emphasized with its own release last week.
For the company behind it, a smaller model is cheaper to build and deploy. Garlic is not to be confused with Shallotpeat, a model Altman announced to staff in October, according to a previous report also from The Information. That model also aimed to fix bugs in the pretraining process.
As for when to expect Garlic, Chen kept the details vague, saying only "as soon as possible" in the report. The advances made while creating Garlic have already allowed the company to move on to developing its next, bigger and better model, Chen said.
A battle for users
This fierce competition between Google and OpenAI can be partially attributed to both vying for the same sector: consumers.
As Anthropic's CEO, Dario Amodei, noted in conversation with journalist Andrew Ross Sorkin during The New York Times' DealBook Summit last week, Anthropic isn't in the same race or facing a "code red" panic like its competitors, because it is focused on serving enterprises rather than consumers. The company just announced that its Claude Code agentic coding tool reached $1 billion in run-rate revenue, only six months after becoming available to the public.
