New GPT-5.4 clobbers humans on pro-level work in OpenAI's tests - by 83%

GPT-5.4 model shown on an orange and blue cloud background.

ZDNET's key takeaways

GPT-5.4's 83% score suggests AI rivals expert professionals.
Tests span 9 industries and 44 real-world occupations.
New capabilities boost coding, tools, and computer control.

It seems like only yesterday that OpenAI released its GPT-5.2 model to the world. In fact, it's been less than three months. Thursday, OpenAI is releasing the reasoning model of GPT-5.4.

What exactly does that mean? In this article, I'll briefly touch on the official announcement and availability details, and then I'll dive into what I think is the most startling detail: GPT-5.4 can match or outperform human professionals 83% of the time, according to OpenAI.

(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

Availability details

OpenAI says GPT-5.4 is "the most capable and efficient frontier model for complex professional work." Within ChatGPT, the company calls this model GPT-5.4 Thinking. There are also releases for the API, within the Codex programming tool, and in a GPT-5.4 Pro version.

In terms of general performance, the company says that GPT-5.4 is "18% less likely to contain errors, and individual claims are 33% less likely to be false compared to GPT-5.2, based on prompts where users previously flagged factual mistakes."

It's always nice when an extremely powerful artificial intelligence makes stuff up less frequently.

As for availability, the company will offer GPT-5.4 via API on Friday. It will be "rolling out" across ChatGPT paid tiers and in Codex, which presumably means it will show up fairly soon for most users.

But what about GPT-5.3?

It gives me no joy to say this, but OpenAI's naming conventions give me a headache. When it comes to naming, it feels like it fired all its experienced product managers and replaced them with a GPT-3.5 instance from 2022.

So, OK, OpenAI released GPT-5.3-Codex last month. That's the first version of Codex that used itself to help build itself. Skynet, anyone?

Then, two days ago... two days ago, it released GPT-5.3 Instant. This, according to the company, "makes everyday conversations more consistently helpful and fluid." It's available to all users of ChatGPT. In the API, it's released as gpt-5.3-chat-latest. Not gpt-5.3-chat-instant, because that would make too much sense.

And now, we have GPT-5.4. So in the space between Tuesday and Thursday, OpenAI has released a GPT-5.3 and a GPT-5.4 model. You'd have to be an AI to keep track of it all.

Because such crimes against coherent versioning make me twitchy, I had to ask the OpenAI communications team about it. They were patient and kind enough to answer:

"GPT-5.4 is our first mainline reasoning model that incorporates the frontier coding capabilities of gpt-5.3-codex, and that is rolling out across ChatGPT, the API, and Codex. We're calling it GPT-5.4 to reflect that jump, and to simplify the choice between models when using Codex. Over time, you can expect our Instant models and Thinking models to evolve at different speeds."

I still don't like it. If Instant and Thinking are really two separate products, they should have completely separate versioning. 5.3 and 5.4 are too close and too confusing. If they're considered to be different variants of the same product, they should share version numbers.

But hey. OpenAI is worth something on the order of $840 billion, and I own a 14-year-old Ford. What do I know? Let's move on to the part where we all worry about our job security.

Testing real-world AI ability

In September, OpenAI introduced a new AI evaluation test called GDPval. It's a test designed to measure how well AI models perform doing "economically valuable, real-world tasks."

The test measures performance in nine industries and 44 occupations. OpenAI chose the industries based on those contributing 5% or more to the US gross domestic product. Each industry has unique occupations. For the test, the company selected up to five occupations per industry, choosing those that had less than 40% physical or manual work, and which make up the jobs with the highest total wages and most overall compensation.
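
The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the filtering logic as I understand it from the description: the Occupation record, its field names, the function name, and any figures fed into it are hypothetical, not OpenAI's data or code.

```python
from dataclasses import dataclass


@dataclass
class Occupation:
    name: str
    industry: str
    physical_share: float  # fraction of the work that is physical or manual (hypothetical)
    total_wages: float     # total wages paid across the occupation (hypothetical units)


def select_occupations(occupations, gdp_share, min_gdp_share=0.05,
                       max_physical=0.40, per_industry=5):
    """Pick up to `per_industry` top-wage, mostly non-physical occupations
    for each industry that contributes at least `min_gdp_share` of GDP."""
    selected = {}
    for industry, share in gdp_share.items():
        if share < min_gdp_share:
            continue  # industry is too small to qualify
        candidates = [o for o in occupations
                      if o.industry == industry and o.physical_share < max_physical]
        # Highest total wages first, then keep only the top few.
        candidates.sort(key=lambda o: o.total_wages, reverse=True)
        selected[industry] = [o.name for o in candidates[:per_industry]]
    return selected
```

Called with a list of Occupation records and a mapping of industry name to GDP share, it returns one short list of occupation names per qualifying industry. That matches the shape of the nine-industry breakdown in this article.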

It basically picked a cross-section of knowledge-related jobs where AI could have the most impact "on real-world productivity." The intent was that the GPT models could help professionals get more done, but it's not too big a leap to infer that these occupations are also the most at risk from AI replacement or augmentation.

Here's how those occupations fit into their industries.

Finance and insurance: Customer service representatives; financial and investment analysts; financial managers; personal financial advisors; securities, commodities, and financial services sales agents
Retail trade: Pharmacists; first-line supervisors of retail sales workers; general and operations managers; private detectives and investigators
Wholesale trade: Sales managers; order clerks; first-line supervisors of non-retail sales workers; sales representatives (wholesale and manufacturing, except technical and scientific products); sales representatives (wholesale and manufacturing, technical and scientific products)
Real estate and rental and leasing: Concierges; property, real estate, and community association managers; real estate sales agents; real estate brokers; counter and rental clerks
Government: Recreation workers; compliance officers; first-line supervisors of police and detectives; administrative services managers; child, family, and school social workers
Manufacturing: Mechanical engineers; industrial engineers; buyers and purchasing agents; shipping, receiving, and inventory clerks; first-line supervisors of production and operating workers
Professional, scientific, and technical services: Software developers; lawyers; accountants and auditors; computer and information systems managers; project management specialists
Health care and social assistance: Registered nurses; nurse practitioners; medical and health services managers; first-line supervisors of office and administrative support workers; medical secretaries and administrative assistants
Information: Audio and video technicians; producers and directors; news analysts, reporters, and journalists; film and video editors; editors

I could get picky about which occupations are the most impactful in the various industries, but this list is certainly a good one for testing model performance overall.

The tests themselves are interesting in both how they are constructed and how they are measured.

OpenAI worked with experienced professionals in each industry to create a set of tasks that "reflect their day-to-day work." The task sets each went through many rounds of expert review and resulted in a series of fully reviewed, complex tasks per industry.

One of the manufacturing engineer tasks, for example, involves the design of a jig (guides a tool) or a fixture (holds the work) to simplify the reeling in and reeling out of a cable spool for underground mining operations.

Grading for each of these tests was done by human professionals in each of the occupations. The graders weren't told whether the results were from the AI, or from other professionals in their fields.
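
To make the headline metric concrete, here is a minimal sketch of how a "matches or exceeds" rate could be tallied from blinded verdicts. I'm assuming each grader verdict is recorded as one of three labels ("model", "human", or "tie"); those labels, the function name, and the sample data are my own placeholders, not OpenAI's format.

```python
from collections import Counter


def win_or_tie_rate(verdicts):
    """Share of blind comparisons where the model's output was judged
    as good as or better than the human professional's."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts to score")
    # Ties count in the model's favor, since the claim is "matches or exceeds".
    return (counts["model"] + counts["tie"]) / total


# Hypothetical verdicts from blinded graders: which deliverable they preferred.
sample = ["model", "human", "tie", "model", "model", "human"]
print(f"{win_or_tie_rate(sample):.1%}")  # prints 66.7%
```

Counting ties alongside wins is what separates "matches or outperforms" from a straight win rate, so the two numbers shouldn't be compared directly.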

Additionally, OpenAI built an automated grading system based on the work of the human graders, so that the humans don't have to spend their time grading every iteration of the AI model. I'm sure OpenAI constructed this automated system with all appropriate safeguards, but I worry that some level of inherent bias might be possible when letting an AI score the performance of an AI.

Ethan Mollick, associate professor and co-director of the Generative AI Lab at Wharton, describes the GDPval test as "probably the most economically relevant measure of AI ability."

83% of the time

The speed of improvement is insane. GPT-5.1 was released in November and had a GDPval score of 38.8%. In December, just a month later, GPT-5.2 performance exploded to almost double that, to 70.9%.

Professor Mollick described the significance of GDPval running on GPT-5.2. He said, "In head-to-head competition with human experts on tasks that require 4-8 hours for a human to do, GPT-5.2 wins 71% of the time as judged by other humans."

Now, in early March, less than three months after GPT-5.2, GPT-5.4 matches or exceeds the performance of human professionals 83% of the time!
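
To put those jumps in perspective, here is the arithmetic on the three scores quoted above, as a small Python sketch. The only inputs are the percentages cited in this article, with GPT-5.4's 83% treated as 83.0; the function name is just a label I picked.

```python
# GDPval scores quoted in the article, in percent.
scores = [("GPT-5.1", 38.8), ("GPT-5.2", 70.9), ("GPT-5.4", 83.0)]


def progression(scores):
    """Return (model, point_gain, ratio) for each step between releases."""
    steps = []
    for (_, prev), (name, curr) in zip(scores, scores[1:]):
        steps.append((name, round(curr - prev, 1), round(curr / prev, 2)))
    return steps


for name, gain, ratio in progression(scores):
    print(f"{name}: +{gain} points, {ratio}x the previous score")
# GPT-5.2: +32.1 points, 1.83x the previous score
# GPT-5.4: +12.1 points, 1.17x the previous score
```

The step from GPT-5.2 to GPT-5.4 is smaller in percentage points than the one before it, but it starts from a much higher baseline.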

In other words, almost every time the same task was given to an experienced human pro and GPT-5.4, the AI either kept up with or blew past the experienced human pro, at least according to its grader, which may have been human or AI.

Sit with that for a few minutes. We're not just talking about programming tasks. We're talking about a wide range of industries and a wider range of high-value occupations.

According to Daniel Swiecki, head of Artificial Intelligence Solutions at Walleye Capital, "On our toughest internal finance and Excel evaluations, GPT-5.4 outperformed prior models, improving accuracy by 30 percentage points. This step change in reliability materially expands our automation of model updates and scenario analyses for fundamental investors."

The freaky thing is this kind of performance could take us in two directions. On the one hand, it could help augment human pros, giving experienced folks the ability to get more done, faster. On the other hand, it could well be seen as the harbinger of a time when the AI is simply replacing the humans in high-value, high-skill jobs.

The future is most likely not going to be all one or all the other. But even as OpenAI takes a victory lap for its latest release, those of us who support our families based on a lifetime of skill building within those professions have to rock back on our heels, take deep, worried breaths, and hope for the best.

Speaking personally, my approach has been to learn all I can, as quickly as I can, and use AI as much as I can. That helps me describe all of this to you, but it also helps me augment my personal productivity using AI resources, particularly for programming.

But I worry. AI slop is definitely a real thing, and as AI slop increases more and more in quality, all of us will be competing with a giant superbrain that never sleeps, never eats, and is improving at almost supernatural speed.

More capabilities

In addition to general performance, GPT-5.4 improves on other core capabilities.

Tool use: GPT-5.4 improves how AI agents select and use external tools, enabling them to complete multi-step workflows more accurately and efficiently while reducing token usage.
Computer vision: The new model enhances visual understanding, allowing it to better interpret complex images, parse documents, and reason about visual information with higher accuracy.
Computer use capabilities: Within the API and Codex, GPT-5.4 introduces native computer-use abilities that let agents interact with software systems through screenshots, keyboard and mouse commands, and automated workflows across applications.
Coding: GPT-5.4 combines the coding strengths of GPT-5.3-Codex with improved reasoning and tool use, helping developers build, debug, and iterate on complex software tasks more effectively.

Stay tuned. GPT-5.4 Thinking will be in your ChatGPT interface shortly. Let the competition begin.

What do you think?

What do you think about OpenAI's claim that GPT-5.4 can match or outperform human professionals 83% of the time? Does that seem like a meaningful benchmark for real-world work?

Have you started integrating AI into your own professional workflow? If so, where does it help the most or fall short? Looking ahead, do you see tools like this mostly augmenting human expertise, or eventually replacing parts of it?

Share your thoughts and experiences in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.