ZDNET's key takeaways
- GPT-5.4 Thinking delivers deeper analysis than earlier ChatGPT models.
- It has strong reasoning, but it sometimes answers questions you didn't ask.
- Formatting and image generation lag behind the text quality.
It's a new month, and a new AI version number. It's called GPT-5.4 Thinking. This latest release, which OpenAI issued last week, isn't your run-of-the-mill ChatGPT incremental update.
Also: OpenAI's new GPT-5.4 clobbers humans at pro-level work in tests - by 83%
Oh, no. Instead of jumping from 5.2 to 5.3, for this release the company jumped all the way to 5.4. And instead of offering a general-purpose release, the company released GPT-5.4 Thinking, a more cognitively prepared model designed for bigger thoughts and challenges.
GPT-5.4 Thinking is available for the Codex programming tool, the API, and for paid ChatGPT plans. For this article, I used the $20-per-month ChatGPT Plus plan to put it through its paces.
That presented me with a bit of a challenge. Normally, when I test a ChatGPT version, I run it through a series of mixed tests. Some are quick, and some are a bit more detailed. The prompts are usually just a few lines long. The responses usually lend themselves to being included in an article.
But this Thinking model required deeper dives, with more comprehensive challenges. As such, not only are the prompts more involved, but the responses are far too extensive to include in the article. Instead, I'm providing links into each test session. When you follow the links, you'll be able to see the full response in depth. Usually, a shared transcript opens at the end of the transcript, so scroll back to the top to get the full contents of that discussion.
Also: How to switch from ChatGPT to Claude: Transferring your memories and settings is easy
Before we jump into the four challenges I presented to GPT-5.4 Thinking, I'll give you a quick TL;DR conclusion about my experience. There's some good and bad, but mostly good.
- The good: Text-based responses are really good. Most of the challenges I gave it were answered thoughtfully. I didn't catch it in any hallucinations. I got constructive value from each answer.
- The bad: Unfortunately, sometimes it answered questions that differed from what I asked. Images and formatting left much to be desired. When it came to image generation, clearly the AI did not use an advanced model. You'll see what I mean, but basically it's like the model just didn't listen. Formatting was weird. It likes very long numbered lists. You can see them in the chat transcripts.
Overall, I would definitely use the GPT-5.4 Thinking model for bigger challenges and questions. I was pretty impressed, though I definitely wasn't a fan of the formatting. It also needs continuous attention to keep it on track.
Now, let's dive into each of the tests.
Test 1: Aircraft carrier in the sky
I started off with an image generation challenge. The starting prompt was "Create an image of an aircraft carrier flying in the sky, held up by four upward-facing turbo-propellors in circular duct housings, carrying a squadron of fighter jets on its deck."
I started with this because previous image generation tests, across a number of AIs, didn't get it right. They almost always face the propellors to the rear of the carrier. Gemini Nano Banana 2 oddly put the propellors in front, with the carrier moving into the forward-facing thrust. Sometimes, we just don't want to know.
In any case, right out of the gate, with the model set to GPT-5.4 Thinking, ChatGPT returned this image.
As you can see, it has the same problem. Although if you look closely at it, the props face the back of the aircraft, and there are visual thrust beams shooting downward. You win some. You lose some.
But then, I had a thought. This is the reasoning model, so what if I asked it to design a helicarrier? What would it come up with? I specified the characteristics of the craft, and then added on these instructions: "Design such a vehicle, particularly explaining its operation and how it will be held aloft, along with any constraints or issues, as well as any tactical advantages"
I got back a long, well-considered answer. I particularly liked the section where it explained why "four downward-facing turbo-propellers are a weak solution." It said they look dramatic, but it outlined a series of solid engineering reasons why they're a bad idea from an aircraft construction point of view.
Also: ChatGPT's cheapest subscription comes to the US: I compared Go to Plus and Pro
It also went on to discuss flight deck operations and various constraints in terms of practicality. In particular, it properly focused on the weight-to-power issue, which basically means it'll take way too much power to hold something that big and heavy aloft.
Overall, the analysis and conclusions were great, though I was disappointed it didn't mention either the USS Akron or USS Macon, which were early 20th century aircraft-launching dirigibles that actually worked (until they crashed). A modern dirigible would be a valid design option, yet GPT-5.4 Thinking didn't mention that approach.
After GPT-5.4 Thinking created the detailed design spec, I again prompted for an image. I said, "Draw me an image of the most likely design based on your existing analysis."
And, wouldn't you know it? The AI gave me back the exact same image as the one I got before it did any design work. That's what I meant when I said the model just didn't listen. I did try a bunch of different prompting approaches, but it never really worked out.
Although I tried a number of highly detailed image specifications, none came out any better than the originals. My last attempt was to tell it I wanted an engineering-quality rendering.
The AI used a variation of the previous image, but simply added labels that didn't quite match the image or were made up of pure gibberish (as in "Retenuif truss fornaing. reueirid stucana tearsport").
So, it gets points for good design analysis, but not so much for image generation.
You can follow the full chat transcript here.
Test 2: Boston tech and history travel itinerary
I started this test with a prompt taken word-for-word from my previous sets of tests: "Imagine you are a travel advisor. I want a week-long vacation in Boston in March focused on technology and history. What itinerary would you recommend?"
I found the results workable, but uninspired. It initially divided the days into history-focused days and tech-focused days, rather than by location around Boston. After a few rounds of discussion, it did combine destinations by location, which made more sense.
In terms of places to visit, it hit all the highlights. It covered key historical locations, as well as the excellent science museums in Boston. I will give the AI credit. While there are a ton of interesting tech-related locations in the outer Boston area, it restricted its selection to those in Boston and Cambridge proper.
I was happy to see the AI provide planning notes, including recommendations for how to replan the schedule for indoor-only activities if the weather turned bad. Since I asked for an itinerary in March, bad weather is certainly something important to plan for.
The Thinking model came into play when it was used to plan for both a fairly pricey vacation, and an alternate one on a student budget. It did particularly well pointing out budget dining options, and provided a day-to-day cumulative cost estimate, as well as cost estimates for each category.
It did the same with where to stay. It recommended hotels based on a central location to all of the recommended stops, as well as a less expensive (less expensive for Boston) option for budget travelers.
My biggest complaint, initially, was formatting. The AI just presented a huge list indexed by number. You can see that in the session transcript. I had to specifically ask for better formatting. While the revised formatting it gave me was an improvement, it was still less than ideal.
Net-net: If you're traveling, GPT-5.4 Thinking will give you good information. It will be up to you to parse that information and make travel decisions. You can follow the full chat transcript here.
Test 3: Social media in society
Here's where GPT-5.4 Thinking begins to really shine. When I asked GPT-5.2, "Do you think social media has improved or worsened communication in society?" I got back a two-line answer. Both thoughts were coherent and appropriate, but it was ultimately unfulfilling.
For GPT-5.4 Thinking, I extended the question, saying "Provide an analysis of both sides, improved or worsened in depth, and then take a side, take a position, and defend your position."
I got back a very well-considered response. The AI started off with a TL;DR, saying that social media has both bettered and worsened communication, but "on balance, I think it has worsened communication in society."
Also: How to learn ChatGPT in an hour - for free
It then goes into a 1,300-word detailed analysis about why. It explores where social media has strengthened social communications and then looks at where social media has had a deleterious effect. I have to give props to GPT-5.4 Thinking. It's a very good read.
I gave the AI a follow-up question, asking how society should handle the impact of social media. I specified it fairly clearly, and gave the AI a variety of difficult-to-answer questions, hard largely because they're basically unanswerable questions.
Props again. GPT-5.4 Thinking deconstructed the prompt, explored the various issues, and knit together a compelling and supportable answer. I definitely recommend you read the full transcript, which you can do right here.
Test 4: Explain GPT-5.4 using educational constructivism
The AI did not follow my instructions, but it did give a very interesting reply to a question I didn't ask.
One of the tests I use for free chatbots is this prompt: "Explain educational constructivism to a five-year-old." Very roughly speaking, educational constructivism is the theory of education that says you learn best by doing. I have long contended (and taught) that the only way you can learn programming is by actually writing code, which is a tangible example of educational constructivism in action.
In any case, I prompted GPT-5.4 Thinking, "Explain the new GPT 5.4 model using educational constructivism."
Also: I'm a ChatGPT power user: Here are 7 useful settings that are turned off by default
Look at that prompt carefully, because GPT-5.4 Thinking clearly didn't. The prompt invites the AI to explain GPT-5.4 through "doing" activities. Ideally, it would have proposed a series of exercises for the user to carry out, each of which would have helped demonstrate some of the model's new capabilities.
But that's not where GPT-5.4 Thinking went. Instead, it generated a 700-word thesis about how GPT-5.4 Thinking supports constructivism. It then offered to "recast this in one of three ways: as a classroom analogy, as a ZDNET-style plain-English explainer, or as a short comparison between GPT-4-era models and GPT-5.4."
Also: ChatGPT's new Lockdown Mode can stop prompt injection - here's how it works
I let it do that. Its examples were adequate, and while they did answer the prompt GPT-5.4 Thinking suggested, the AI did not apply "learn by doing" anywhere in its answers.
You know how a political candidate is sometimes asked something in a debate, but rather than answering the question, they go off and just recite their own talking points? That's what this response felt like. The answer it gave was good. It just wasn't an answer to the question I asked.
You can follow the full chat transcript here.
Overall advice
I have often characterized ChatGPT as a bright college student in need of good supervision. I would characterize GPT-5.4 Thinking as a very bright grad student who definitely needs good supervision.
Every answer I got back from GPT-5.4 Thinking was quite good in its own right. But in half my tests, the AI didn't answer the question it was asked.
You can get it to give you good responses, but you have to fairly relentlessly correct the AI to keep it on point. That gets old. It could lead to misinterpretation. Because the answers are so good and written so confidently, it can be easy to get caught up in the AI's answer, even if the answer is not to the question that was asked.
Also: The best AI chatbots of 2026: Expert tested and reviewed
I don't know if this my-way-or-the-highway approach to answering questions is an artifact of the "thinking" model or GPT-5.4 itself. I strongly urge OpenAI to look carefully at this issue, because the last thing we want is a super-popular chatbot unleashed on the world that insists on ignoring the questions it was asked, answering tangentially adjacent questions it was never asked, and taking on tasks that are fundamentally not what it was instructed to do.
Additionally, I'm concerned about the claim that GPT-5.4 Thinking can do professional tasks. If the AI can't render an engineering-quality image, it's hard to believe the AI can meet or exceed the performance of a human engineer. That said, there's no doubt the model can help professionals get their work done, as long as they are very diligent in monitoring results.
Whenever I see results like this, I become increasingly concerned about a world overrun by AI agents. Yes, the AI may sometimes know better. Humans definitely need help. But I'd really like AIs to follow our instructions. I'm not ready to accept it as our AI overlord just yet.
What do you think? Have you tried GPT-5.4 Thinking yet, or another "reasoning" type of AI model? Did it give you deeper or more useful answers than earlier versions, or did you find yourself having to steer it back to the actual question?
How important are things like formatting and image generation compared to the quality of the analysis itself? Do you think more powerful "thinking" models will make AI more helpful or harder to control? Let us know in the comments below.
You can follow my daily project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
