Quick Answer: GPT-5.5 (“Spud”), released April 23, 2026, is OpenAI’s first fully retrained base model since GPT-4.5. It scores 60 on the Artificial Analysis Intelligence Index (vs. 57 for Claude Opus 4.7) and leads on Terminal-Bench 2.0 and agentic evaluations, but trails Opus 4.7 on SWE-bench Pro. API pricing doubles to $5 input / $30 output per million tokens, though the model uses ~40% fewer output tokens per task. API access was not available at launch.
Last updated: April 2026
What GPT-5.5 actually is
GPT-5.5 is not another incremental update. OpenAI’s GPT-5.0 through 5.4 were refinements and fine-tunes of the same base model. GPT-5.5 is the first complete retrain since GPT-4.5 — a new foundation, not a patch.
It shipped just six weeks after GPT-5.4, available immediately in ChatGPT (Plus, Pro, Business, Enterprise) and Codex. API access was deliberately held back. OpenAI stated that “API deployments require different safeguards,” a move critics read as a distribution strategy that prioritizes OpenAI’s own products.
The codename is “Spud.” Greg Brockman called it “the smartest and most intuitive model in the company’s history” and “a new class of intelligence.”
Three variants, different depths
GPT-5.5 ships in three forms:
| Variant | Access | Use case |
|---|---|---|
| GPT-5.5 | Plus ($20/mo), Business, Enterprise | Fast responses, everyday tasks |
| GPT-5.5 Thinking | Plus, Pro, Business, Enterprise | Extended reasoning, deeper chain-of-thought |
| GPT-5.5 Pro | Pro ($200/mo) only | Deepest reasoning for high-stakes tasks |
API model identifiers: gpt-5.5 and gpt-5.5-pro. Neither was available via API on launch day.
Free-tier users do not get access to any GPT-5.5 variant.
Benchmark reality check
GPT-5.5 leads on the aggregate Artificial Analysis index, but the picture is mixed across specific evaluations:
| Benchmark | GPT-5.5 | Claude Opus 4.7 | Winner |
|---|---|---|---|
| Artificial Analysis Index | 60 | 57 | GPT-5.5 |
| Terminal-Bench 2.0 | 82.7% | 69.4% | GPT-5.5 |
| Expert-SWE (median human: 20 hrs) | 73.1% | — | n/a (no Opus 4.7 score) |
| OSWorld-Verified | 78.7% | 78.0% | GPT-5.5 (narrow) |
| SWE-bench Pro | 58.6% | 64.3% | Opus 4.7 |
| SWE-bench Verified | — | 87.6% | n/a (no GPT-5.5 score) |
| HLE (no tools) | 41.4% | 46.9% | Opus 4.7 |
| HLE (with tools) | 52.2% | 54.7% | Opus 4.7 |
| MCP-Atlas (tool use) | 75.3% | 79.1% | Opus 4.7 |
The summary from llm-stats.com: “Opus 4.7 leads on 6 of 10 shared benchmarks, GPT-5.5 on 4, with margins between 2 and 13 points.”
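The per-benchmark margins can be recomputed from the rows in the table above that report a score for both models. This is a minimal sketch: the table shows only a subset of the ten shared benchmarks llm-stats.com counts, so the tally here will not match the quoted 6-to-4 split, and the function names are illustrative.

```python
# Scores copied from the table above: (GPT-5.5, Claude Opus 4.7).
# Rows where one model has no reported score are left out.
# The Artificial Analysis Index is in index points; the rest are percentages.
scores = {
    "Artificial Analysis Index": (60.0, 57.0),
    "Terminal-Bench 2.0": (82.7, 69.4),
    "OSWorld-Verified": (78.7, 78.0),
    "SWE-bench Pro": (58.6, 64.3),
    "HLE (no tools)": (41.4, 46.9),
    "HLE (with tools)": (52.2, 54.7),
    "MCP-Atlas (tool use)": (75.3, 79.1),
}


def margins(table):
    """Return {benchmark: GPT-5.5 score minus Opus 4.7 score}, rounded to 0.1."""
    return {name: round(gpt - opus, 1) for name, (gpt, opus) in table.items()}


def tally(table):
    """Return (rows led by GPT-5.5, rows led by Opus 4.7)."""
    diffs = margins(table)
    gpt_wins = sum(1 for d in diffs.values() if d > 0)
    opus_wins = sum(1 for d in diffs.values() if d < 0)
    return gpt_wins, opus_wins


for name, diff in margins(scores).items():
    leader = "GPT-5.5" if diff > 0 else "Opus 4.7"
    print(f"{name:28s} {leader:9s} by {abs(diff):.1f}")
print("GPT-5.5 leads, Opus 4.7 leads:", tally(scores))
```

On these seven rows the split is close, which is why the choice of benchmarks matters as much as the headline count.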
One important caveat: OpenAI claimed 82.7% on Terminal-Bench 2.0, but the benchmark owner’s own leaderboard showed 82.0% ± 2.2 on the same day. The 0.7-point gap sits inside the leaderboard’s stated ± 2.2 margin, so it is a small discrepancy, but worth noting given the competitive context.
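To put the Terminal-Bench discrepancy in proportion, the check below compares the claimed score against the leaderboard figure. It treats "± 2.2" as a simple symmetric interval, which is an assumption: the source does not say whether it is a standard error or a confidence interval. The function name is illustrative.

```python
def within_margin(claimed, measured, margin):
    """True if `claimed` lies inside measured ± margin (inclusive)."""
    return abs(claimed - measured) <= margin


claimed_score = 82.7      # OpenAI's reported Terminal-Bench 2.0 score
leaderboard_score = 82.0  # benchmark owner's leaderboard, same day
leaderboard_margin = 2.2  # assumed to be a symmetric interval

gap = round(claimed_score - leaderboard_score, 1)
print(f"gap = {gap} points")
print("inside stated margin:",
      within_margin(claimed_score, leaderboard_score, leaderboard_margin))
```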
What GPT-5.5 does differently
The core thesis: legibility
Where previous models required carefully structured prompts and multi-step supervision, OpenAI says 5.5 can take a “messy, multi-part task” and independently plan, execute, and iterate. Greg Brockman: “What is really special about this model is how much more it can do with less guidance.”
Recurrent self-refinement
GPT-5.5 integrates a novel “recurrent self-refinement loop” — the model internally critiques and revises outputs across multiple reasoning passes before generating a final response. This is architecturally different from chain-of-thought prompting; it happens inside the model’s inference process.
40% fewer tokens, same results
This is the most practically significant change. GPT-5.5 uses roughly 40% fewer output tokens than GPT-5.4 to complete equivalent tasks. For agentic workflows where you’re paying per token, this partially absorbs the doubled per-token price.
Artificial Analysis estimates the net effective cost increase is about 20% compared to GPT-5.4, not the 100% the rate card suggests.
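The arithmetic behind that figure is short enough to check by hand. The sketch below covers output tokens only and assumes the 40% reduction applies uniformly; input tokens, whose price also doubled with no stated reduction, are ignored. The function names are illustrative.

```python
def effective_cost_multiplier(price_ratio, token_reduction):
    """Cost per task relative to the old model.

    price_ratio: new per-token price divided by old (2.0 for $30 vs. $15).
    token_reduction: fraction of tokens saved per task (0.40 for 40% fewer).
    """
    return price_ratio * (1.0 - token_reduction)


def break_even_reduction(price_ratio):
    """Token reduction at which the cost per task is unchanged."""
    return 1.0 - 1.0 / price_ratio


multiplier = effective_cost_multiplier(price_ratio=30 / 15, token_reduction=0.40)
print(f"effective output cost: {multiplier:.2f}x GPT-5.4")
print(f"break-even token reduction: {break_even_reduction(30 / 15):.0%}")
```

Twice the price on 60% of the tokens gives 1.2x, so the roughly 20% estimate is consistent with the rate card. A doubled price is only fully offset at a 50% token reduction.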
Agentic capabilities
GPT-5.5 excels at:
- Writing and debugging code autonomously
- Researching online across multiple sources
- Analyzing data and creating documents
- Operating software through computer use
- Moving across tools until a task is finished
Terminal-Bench 2.0, where GPT-5.5 scored 82.7%, targets this kind of work: complex command-line workflows requiring planning, iteration, and tool coordination.
Pricing: doubled per-token, offset by efficiency
| | GPT-5.5 | GPT-5.4 | Claude Opus 4.7 |
|---|---|---|---|
| Input (per 1M tokens) | $5 | $2.50 | $5 |
| Output (per 1M tokens) | $30 | $15 | $25 |
| Context window | 1M | 1M | 1M |
The output price is the headline: $30 per million tokens, 2x GPT-5.4 and 20% more than Opus 4.7.
But because GPT-5.5 uses ~40% fewer output tokens per task, the effective cost increase over GPT-5.4 is closer to 20%. Against Opus 4.7, the comparison depends on workload — Opus 4.7’s new tokenizer can inflate token counts by up to 35% on code-heavy prompts, which narrows the gap.
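A rough way to see how tokenizer inflation shifts the GPT-5.5 versus Opus 4.7 comparison is to price one task under both rate cards. Everything about the workload here is hypothetical: the 20,000 input and 5,000 output tokens are made-up numbers, both models are assumed to need the same baseline token count, and the inflation factor is applied equally to input and output. The dictionary keys are labels for this sketch, not confirmed API identifiers.

```python
# USD per 1M tokens as (input, output), taken from the pricing table above.
PRICES = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.4": (2.50, 15.00),
    "claude-opus-4.7": (5.00, 25.00),
}


def task_cost(model, input_tokens, output_tokens, token_inflation=1.0):
    """Cost in USD of one task; token_inflation scales both token counts."""
    in_price, out_price = PRICES[model]
    scaled_in = input_tokens * token_inflation
    scaled_out = output_tokens * token_inflation
    return (scaled_in * in_price + scaled_out * out_price) / 1_000_000


# Hypothetical workload: 20k input tokens, 5k output tokens per task.
baseline_in, baseline_out = 20_000, 5_000

gpt = task_cost("gpt-5.5", baseline_in, baseline_out)
opus_plain = task_cost("claude-opus-4.7", baseline_in, baseline_out)
opus_inflated = task_cost("claude-opus-4.7", baseline_in, baseline_out,
                          token_inflation=1.35)  # the "up to 35%" worst case

print(f"GPT-5.5:              ${gpt:.4f}")
print(f"Opus 4.7 (no infl.):  ${opus_plain:.4f}")
print(f"Opus 4.7 (+35%):      ${opus_inflated:.4f}")
```

Under these assumptions Opus 4.7 is cheaper per task with no inflation and more expensive at the full 35%, so the ranking depends on how code-heavy the prompts are. Real token counts from your own workload should replace the placeholders before drawing conclusions.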
For ChatGPT subscribers, there’s no additional cost. Plus ($20/mo), Pro ($200/mo), Business, and Enterprise tiers all include GPT-5.5 at no extra charge.
Enterprise positioning
OpenAI is clearly targeting Anthropic’s enterprise lead:
- Served on NVIDIA GB200 NVL72 infrastructure with 35x lower cost per million tokens and 50x higher token throughput per watt compared to prior-generation systems
- Available to Business and Enterprise ChatGPT tiers on day one
- System card published alongside launch for enterprise safety review
- 40% reduction in inference costs at the infrastructure level
The API holdback is part of this strategy: by funneling enterprise users through ChatGPT and Codex first, OpenAI controls the experience and collects usage data before opening the firehose.
Where GPT-5.5 wins and loses
Wins:
- Aggregate intelligence benchmarks (highest overall score)
- Terminal/agentic workflows (13-point lead over Opus 4.7)
- Computer use and autonomous task completion
- Token efficiency (40% fewer tokens per task)
- Enterprise infrastructure (NVIDIA GB200 partnership)
Loses:
- Traditional coding benchmarks (SWE-bench Pro: 58.6% vs Opus 4.7’s 64.3%)
- Knowledge-heavy reasoning (HLE: 5.5-point deficit without tools, 2.5 points with tools)
- Tool use precision (MCP-Atlas: 4-point deficit)
- API availability at launch
- Per-token pricing (highest output rate of the models compared above; input rate ties Opus 4.7)
The competitive picture
Three frontier models shipped between April 16 and April 24, 2026:
- Claude Opus 4.7 (April 16): Best at coding, tool use, and knowledge reasoning
- GPT-5.5 (April 23): Best at agentic autonomy, terminal workflows, and aggregate intelligence
- DeepSeek V4 (April 24): Competitive performance at 20-50x lower cost
The market is splitting along workflow lines rather than converging on a single winner. If your primary workload is coding, Opus 4.7 leads. If it’s autonomous multi-step task completion, GPT-5.5 leads. If cost matters more than marginal performance, DeepSeek V4 changes the equation entirely.
Who should switch
Switch now if:
- You’re a ChatGPT Plus/Pro subscriber (it’s already there, no extra cost)
- Your workload is agentic: multi-step tasks, computer use, autonomous research
- You were hitting GPT-5.4’s limits on complex terminal workflows
Wait if:
- You need API access (not available at launch)
- Your primary workload is coding (Opus 4.7 is stronger on SWE-bench Pro)
- You’re cost-sensitive on API pricing (measure effective cost first)
Sources: OpenAI Community, Axios, Fortune, The Next Web, Artificial Analysis, Wikipedia, NVIDIA Blog