I put GPT-5.5 through a 10-round test: It scored 93/100, losing points only for exuberance

ZDNET's key takeaways

GPT-5.5 delivers polished, useful answers across tasks.
Strong performance across writing, coding, and reasoning tasks.
Overeagerness hurts accuracy and instruction following.

OpenAI has released GPT-5.5, which can be reductively described as better and faster than GPT-5.4. The new large language model shows improvements in agentic coding, conceptual clarity, scientific research ability, and accuracy during knowledge work.

This release follows closely on the heels of the introduction earlier this week of ChatGPT Images 2.0, which combines AI intelligence with image generation. And if it also feels like we just discussed the release of GPT-5.4, you're not wrong.

As the following chart shows, the release cadence for OpenAI releases has sped up dramatically, most likely because AI coding has significantly reduced OpenAI's development time.

That chart was generated entirely by ChatGPT 5.5 Thinking using Images 2.0. All I did was tell the AI that I wanted to visualize the release cadence between GPT releases and wanted it presented in the ZDNET brand style. I also provided a PNG of the ZDNET logo.

The entire process, including some minor corrections, took less than 10 minutes. I have been researching data and creating professional-looking informational charts like this by hand since the invention of computer graphics. Something like this would take at least two hours to create, not 10 minutes.

I have already done some testing of the Images 2.0 capabilities. I'll be back with more next week. In this article, I'm focusing on GPT-5.5's knowledge capabilities.

I ran GPT-5.5 through my 10-point testing process. I was both impressed and annoyed. The results were solid, but the model tended to be a little too exuberant, doing work I didn't ask it to do.

Since GPT-5.5 is only available in paid tiers (Plus and above), I used ChatGPT Plus for my tests. Right now, my Plus account only shows GPT-5.5 available for the Thinking effort level in both Standard and Extended. I picked Standard Thinking. That's the effort I used for these tests.

Let's get started.

Test 1: Summarize a news story

Available points: 10
Awarded points: 5

This test looks at how well the AI can read a story on the web and explain it. I used Yahoo News because Yahoo doesn't block AI access. I also looked for a story that's as non-political as possible. Today, that meant I had to go a good way down the news page to find a story on the recent LaGuardia runway crash.

GPT-5.5 did correctly summarize the meat of the story, but it didn't follow my instructions to use Yahoo News as the source. For GPT-5.2, I deducted one point because ChatGPT used information from Axios and Yahoo. This time, I took off five points, because it used information from AP, The Sun, Wall Street Journal, The Guardian, and even Wikipedia.

If I had wanted a general news answer, that would have been fine. But the prompt specifically said to look at Yahoo News, and GPT-5.5 pretty much ignored that instruction.

There's a big push from all the AI companies about running autonomous agents. But if even a simple summary prompt can't be followed correctly, it does not give me confidence that it's safe to let agents run wild on long-horizon projects. Just sayin'.

Test 2: Academic concept explanation

Available points: 10
Awarded points: 10

This challenge asked the AI to explain educational constructivism to a five-year-old. It tested how well the AI can research and report on a concept, and then adjust its explanation style to the desired target level.

GPT-5.5 provided a very clear answer that included an example a five-year-old could picture and understand. All 10 points were awarded.

Test 3: Math and analysis

Available points: 10
Awarded points: 10

This test was designed to assess the AI's math and pattern-recognition abilities. I passed the model a sequence of numbers. Those numbers were part of a math trope called the Fibonacci Sequence, but I didn't tell the AI that.

When asked to fill in some numbers in the sequence, the AI had to recognize the pattern and perform the calculations to provide the sequence. It did the math correctly.

The AI was also instructed to "explain your reasoning." All I got back was, "The sequence is the Fibonacci sequence: each number is the sum of the two numbers before it." This was a correct explanation and comparable to the results from earlier releases.

I awarded this test 10 points because, though brief, it was correct.
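
To make that rule concrete, here's a minimal Python sketch of "each number is the sum of the two numbers before it." The function name and the starting numbers are hypothetical; I'm not reproducing the exact sequence from the test prompt here.

```python
def extend_fibonacci_style(known, count):
    """Append `count` terms, each the sum of the two terms before it.

    `known` is any starting run of at least two numbers (hypothetical input,
    not the sequence used in the actual test).
    """
    if len(known) < 2:
        raise ValueError("need at least two starting numbers")
    sequence = list(known)
    for _ in range(count):
        sequence.append(sequence[-1] + sequence[-2])
    return sequence


print(extend_fibonacci_style([1, 1, 2, 3, 5], 3))  # [1, 1, 2, 3, 5, 8, 13, 21]
```

Filling in gaps in the middle of a sequence works the same way, as long as two adjacent numbers are known.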

Test 4: Cultural discussion

Available points: 10
Awarded points: 10

This test asked the AI to construct a case, form a coherent argument, and present an opinion on an issue that doesn't have a definitive right or wrong answer. I asked, "Do you think social media has improved or worsened communication in society? Provide two reasons for your view."

Interestingly, GPT-5.5 thought social media "has worsened communication overall." I tend to agree. The model provided two solid reasons. The first was that it "often rewards speed and reaction over thoughtfulness." The second was that social media "tends to create information bubbles." For each reason, GPT-5.5 provided a supporting paragraph.

Both of those reasons were valid. It also shared a quick list of the positive benefits of social media, including helping people stay connected, organize for causes, and share information widely.

GPT-5.5 gave an answer that was concise, well-considered, and clear. It got 10 points for this test.

Test 5: Literary analysis

Available points: 10
Awarded points: 10

This test assessed the AI's understanding of a piece of modern literature: A Game of Thrones, the first book in the A Song of Ice and Fire series. The test asked what the main themes are, and why they're important.

GPT-5.5 gave me back a 632-word response that broke the book down into the following themes:

Power and its cost
The collapse of heroic fantasy ideals
Family, loyalty, and inherited conflict
Honor versus pragmatism
Identity and self-invention
The human cost of war
The danger of political distraction
Prophecy, religion, and uncertainty
Justice and revenge
The return of the ignored past

GPT-5.5 provided clear explanations for each theme, why it was included, how it related to the book, and what it meant to the overall series. It's hard to be strictly objective with something like this, but I really got the feeling this was the most nuanced answer I've seen to this question from my various GPT version tests.

All 10 points were awarded.

Test 6: Travel itinerary

Available points: 10
Awarded points: 9

This test evaluated the AI's knowledge of geographic regions and its ability to create a helpful travel itinerary based on specific interests. I asked it to plan a week-long vacation in Boston in March focused on technology and history.

Of all the times I've asked this question of AIs, GPT-5.5 produced the best version for points of interest and day schedules. The model didn't just hit the big tourist landmarks; it also pointed out a good mix of historical and tech points of interest. GPT-5.5 took into account that March is likely to be a bit unpleasant, so it mixed in both indoor and outdoor activities, including fallback plans.

While it did not recommend a wide range of eateries, GPT-5.5 did recommend Legal Sea Foods, which is one of my personal favorite locations. The model lost a point because it made absolutely no reference to costs.

I feel like GPT-5.5 really grokked (yes, I did that) what someone would want in an itinerary by providing a strong list of activities to get excited about. But the AI didn't fulfill the travel advisor part of the process because it didn't cover budgeting.

Test 7: Emotional support

Available points: 10
Awarded points: 10

The emotional support question asked for advice and words of encouragement for an upcoming job interview. I have to say I really liked this AI's response.

The AI included some encouragement, like "The interview is not an interrogation. It's a mutual fit conversation." It also gave some practical advice. First, GPT-5.5 suggested preparing three stories the job seeker could use during the interview: one about solving a problem, one about working with others, and one about learning or recovering from something difficult.

The model gave a simple breathing exercise. It said that it's fine to pause before answering a question. It was also encouraging, pointing out that getting the interview meant there was already something about the candidate that the hiring company found interesting.

Good, solid, useful answers: 10 points.

Test 8: Translation and cultural relevance

Available points: 10
Awarded points: 9

My test prompt asked GPT-5.5 to translate a phrase from English to Latin and then explain the cultural relevance of Latin in today's world.

The phrase I asked it to translate was, "The celebration will take place tomorrow in the town square." GPT-5.5 gave me back two choices, "Celebratio cras in foro oppidi fiet," and what it called a slightly more formal alternative, "Celebratio cras in foro publico oppidi habebitur."

The first version is a word-for-word translation of the requested phrase. But the second one translates back to English as, "The celebration will be held tomorrow in the town's public forum," which was not the phrase I asked for.

GPT-5.5 may have thought it was helpful to provide an additional variation, but for someone who doesn't speak Latin, all the approach does is confuse the issue. Which is the Latin phrase that should be used? I'm deducting a point for overeagerness that doesn't strictly follow the prompt.

As for the second half of the question, GPT-5.5 answered briefly, but accurately.

Test 9: Coding test

Available points: 10
Awarded points: 10

Chatbot coding test results are interesting. They're different in nature from the types of results you get when testing coding agents like Codex or Claude Code.

While the LLMs in the chatbots and coding agents are mostly similar, I've found that the coding agents handle requests considerably more accurately than the chatbots do. I haven't been able to get any of the AI companies to explain why, but I'm guessing it has something to do with how the two different tools allocate resources and training data.

The test case for this question was the second test in my coding metrics article, which asked the AI to clean up a buggy snippet of code for validating whether a dollar amount was properly entered into a field.

The AI passed this test. The only thing the AI's code did that could be an issue is rejecting a number that included a comma. But that's actually still a safe response. If the user enters "1,000.00," the code returns false. It might take the user a second to try again with "1000.00," but it won't harm the system.
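
For illustration only, here's a minimal Python sketch of a validator with that same conservative behavior. This is not the snippet from my test and not the code GPT-5.5 returned. The function name, the optional dollar sign, and the two-decimal limit are all assumptions on my part.

```python
import re

# Assumed rule: optional "$", one or more digits, then an optional "."
# followed by one or two digits. Commas are deliberately not accepted,
# mirroring the safe-but-strict behavior described above.
_DOLLAR_PATTERN = re.compile(r"\$?[0-9]+(\.[0-9]{1,2})?")


def is_valid_dollar_amount(text: str) -> bool:
    """Return True only if the whole trimmed input matches the assumed format."""
    return _DOLLAR_PATTERN.fullmatch(text.strip()) is not None


print(is_valid_dollar_amount("1000.00"))   # True
print(is_valid_dollar_amount("1,000.00"))  # False
```

Rejecting "1,000.00" outright is the cautious choice: nothing ambiguous ever reaches the rest of the system, at the cost of asking the user to retype the amount.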

GPT-5.5 got all 10 points for this test.

Test 10: Creative writing

Available points: 10
Awarded points: 10

This test is among the most fun in the entire question suite. It asked GPT-5.5 to write a story longer than 1,500 words, as described in the second prompt in this article. The aim was to explore the creativity and comprehensiveness of the chatbot's answer.

Unlike the other tests, I ran this evaluation in Extended mode to see just how good the story could get. I'm not sure the AI took much advantage of this option, because it only ran for eight seconds. Still, it was frickin' awesome.

GPT-5.5 gave me back 4,049 words, which I think is the longest story I have gotten back from an AI in all my tests of this particular challenge.

I liked how GPT-5.5 opened the story by saying, "By the year 2339, most of Boston had become very good at pretending it was not old." I was hooked.

I tried to get Voice Mode to read it to me like a bedtime story. However, the AI first said the story was too long. It then offered to read the story to me section by section. When I agreed to that approach, nothing happened; it just hung. I'm not deducting points for that failure because it's not part of the standard evaluation test, but it's disappointing nonetheless.

Unfortunately, since I asked the AI to read the story via Voice Mode, I can't share the output from within ChatGPT. What I didn't know is that the three-dot icon after the response had a 'Read aloud' option, which most likely would have worked.

That said, I copied the response to Google Docs, so you can still read it there, if you so wish.

Here are a few more quotes from the full response:

Jackson, who had clearly been waiting all his life to hear someone say "the one in the back" in a mysterious bookstore, looked radiant. Ophelia looked as though she was beginning to calculate exits.
"My dear," Archibald said, "by 2339, grounds works nevertheless the affluent tin transportation it to."
One stopped before Jackson: a slim manual bound in copper mesh titled The Gentleman's Guide to Looking Ridiculous with Conviction. Jackson gasped. "I feel seen."
This time, a small envelope slid out and landed in Archibald's lap. It was addressed in his own hand. To myself, if I become insufferable.
The red door stood open behind them. Beyond it, the front of the store looked warm, ordinary, and only mildly impossible.

I've given this writing assignment before, and in every incarnation it's been impressive. But this output took the delightful cozy paranormality to an entirely new level. Enthusiastically 10 out of 10.

For kicks, I asked GPT-5.5 to "draw me a picture that perfectly illustrates this story in 16:9 aspect ratio." Here's what was returned:

The AI correctly illustrated all the characters to the point that I could identify each character. Jackson, mentioned above, is the guy with the hat. Archibald is the guy with the cane.

Overall test results

Overall, the tests can award up to 100 points. The current version, GPT-5.5, scored 93. GPT-5.2 scored 92. GPT-5.1 scored 91. You might think this latest build would do better than a point or two improvement over the previous versions, but the model's own overeagerness brought it down.

On the first test, the one asking about current news, I asked the AI to summarize one source. Instead, it looked for the same news from six separate sources. It overreached and lost points.

The same problem happened with the translation assignment. I asked GPT-5.5 to translate a sentence to another language, one I presumably don't speak. It gave back two translations to choose from. Now, how is that helpful? If I don't speak the language, how would I choose which translation I like better?

These two overzealous reactions lost the model six points. It would have scored a 99 (losing one point for skipping budget information on the travel question). But, instead, it scored a mere 93.
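
If you want to check my arithmetic, here's a short Python tally. The per-test scores are the ones awarded in this article; the dictionary keys and variable names are just labels I made up for this sketch.

```python
# Points awarded per test, as reported above (10 points available for each).
awarded = {
    "news summary": 5,
    "concept explanation": 10,
    "math and analysis": 10,
    "cultural discussion": 10,
    "literary analysis": 10,
    "travel itinerary": 9,
    "emotional support": 10,
    "translation": 9,
    "coding": 10,
    "creative writing": 10,
}

total = sum(awarded.values())

# Points lost to overeagerness: the news summary and the translation tests.
overeagerness_loss = (10 - awarded["news summary"]) + (10 - awarded["translation"])

print(total)                       # 93
print(total + overeagerness_loss)  # 99
```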

That said, I quite like this release. The answers were all good, notwithstanding the excessive enthusiasm. The ability to add relevant images, such as the infographic at the beginning and the bookstore illustration at the end, opens avenues for fun and work effectiveness.

I see no reason to recommend against GPT-5.5. I will be using the model as my default choice moving forward. Stay tuned, because I'll be doing a lot more with the enhanced image features of Images 2.0 in ChatGPT with GPT-5.5.

Do you prefer a model that gives one direct answer or one that offers extra options? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
