Claude Sonnet 5 Review: Benchmarks, Pricing & How It Compares to Opus 4.8

Q: What is Claude Sonnet 5's model ID and context window?

The Claude API model ID is claude-sonnet-5 (anthropic.claude-sonnet-5 on AWS Bedrock). It ships with a 1M-token context window and 128K max output tokens, raisable to 300K via a batch-API beta header.

Q: Where can I use Claude Sonnet 5?

Sonnet 5 is the default model for Free and Pro users on claude.ai and is available to Max, Team, and Enterprise plans. It's also live in Claude Code, the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.

On June 30, 2026, Anthropic released Claude Sonnet 5, the newest model in its mid-tier Sonnet line. Anthropic calls it the "most agentic Sonnet model yet" — a model that plans multi-step work, drives browsers and terminals, and follows a task through to completion instead of stopping halfway. That last part is the real story. Benchmarks aside, the most common thing early users mentioned is that Sonnet 5 finishes jobs that older Sonnet models would abandon or report as "done" when they weren't.

This review sticks to numbers Anthropic actually published in its official launch post and system card. Where a figure comes from a third party instead, it's labeled as such.

What Is Claude Sonnet 5?

Sonnet 5 sits in the middle of Anthropic's current lineup — above Haiku 4.5, below the flagship Claude Opus 4.8, and well below the export-restricted top tier covered in our Claude Fable 5 and Mythos 5 breakdown. Anthropic's pitch is that Sonnet 5 delivers agentic performance that used to require a larger, pricier model, at Sonnet-class pricing.

Model ID: claude-sonnet-5 on the Claude API, anthropic.claude-sonnet-5 on AWS Bedrock
Context window: 1 million tokens
Max output: 128K tokens (raisable to 300K via a batch-API beta header)
Thinking: Adaptive reasoning, on by default, with effort set to High out of the box on the API and in Claude Code
Knowledge cutoff: January 2026
Availability: Default model for Free and Pro plans on claude.ai; also available on Max, Team, Enterprise, Claude Code, the Claude API, AWS Bedrock, Google Cloud Vertex AI, and Microsoft Foundry

Benchmark Breakdown: Sonnet 5 vs Sonnet 4.6 vs Opus 4.8

The chart below plots every headline benchmark Anthropic published at launch. Opus 4.8 is included as the reference ceiling, and Sonnet 4.6 as the model most teams are upgrading from.

Claude Sonnet 5 vs Sonnet 4.6 vs Opus 4.8

Percentage-based evaluations published by Anthropic at launch (June 30, 2026). Opus 4.8 is shown as the reference ceiling.

GDPval-AA v2 — professional knowledge work (Elo-style score)

This is the one headline metric where Sonnet 5 (1618) edges past Opus 4.8 (1615). It is scored on an Elo-style scale, not a percentage, so it's charted on its own axis.

Sources: Anthropic, "Introducing Claude Sonnet 5" (Jun 30, 2026) and the Claude Sonnet 5 System Card. SWE-bench Pro is the harder "Pro" variant, not SWE-bench Verified — don't confuse the two when comparing against other published numbers.

Benchmark	Sonnet 5	Sonnet 4.6	Opus 4.8 (ref)
SWE-bench Pro (agentic coding)	63.2%	58.1%	69.2%
Terminal-Bench 2.1 (terminal & tool use)	80.4%	67.0%	82.7%
OSWorld-Verified (computer use)	81.2%	78.5%	83.4%
Humanity's Last Exam — no tools	43.2%	34.6%	49.8%
Humanity's Last Exam — with tools	57.4%	46.8%	57.9%
GDPval-AA v2 (professional knowledge work, Elo)	1618	1395	1615

Note: the coding row is SWE-bench Pro, the harder variant. Don't confuse it with SWE-bench Verified, where scores for most models run noticeably higher.

Two patterns stand out. First, Sonnet 5 improves on Sonnet 4.6 in every single category — this isn't a mixed upgrade. Second, Opus 4.8 still leads on raw accuracy everywhere except one metric: GDPval-AA v2 knowledge work, where Sonnet 5's 1618 edges past Opus 4.8's 1615. For the kind of drafting, research, and analysis work that agencies and founders lean on daily, Sonnet 5 is essentially matching the flagship model, for a fraction of the price.

Terminal-Bench 2.1 is the biggest single jump — a 13.4 point gain over Sonnet 4.6. That's the benchmark most relevant to agents that live inside a shell: running commands, reading output, recovering from errors. If your workflow involves Claude Code or any CLI-driven agent, this is the number that will actually show up in day-to-day reliability.

Sonnet 5 vs Opus 4.8: Which One Should You Use?

This is the real decision most teams face, since the two models now overlap heavily and Opus costs close to double at standard rates. Our full Claude Opus 4.8 review and the Opus 4.8 vs GPT-5.5 coding benchmark comparison go deeper on Opus specifically, but here's the short version for Sonnet 5:

Pick Opus 4.8 for the hardest, accuracy-critical jobs — frontier-difficulty coding tasks, computer-use automation where a few extra points of accuracy matter, or any cybersecurity work that needs reduced guardrails. Anthropic specifically recommends Opus for that last case, since Sonnet 5 was not deliberately trained on cybersecurity tasks and ships with cyber safeguards on by default.
Pick Sonnet 5 for the bulk of agentic coding, tool use, and knowledge work — day-to-day automation, content production pipelines, client reporting, and coding agents that don't need to squeeze out the last few points of accuracy. It's also the faster model of the two.

What Changed Since Sonnet 4.6

If you're already running Sonnet 4.6 in production, here's the version-over-version delta on every headline metric:

Terminal-Bench 2.1: +13.4 points (67.0% → 80.4%)
Humanity's Last Exam, with tools: +10.6 points (46.8% → 57.4%)
Humanity's Last Exam, no tools: +8.6 points (34.6% → 43.2%)
SWE-bench Pro: +5.1 points (58.1% → 63.2%)
OSWorld-Verified: +2.7 points (78.5% → 81.2%)
GDPval-AA v2: +223 points (1395 → 1618)

Anthropic's own safety assessment also reports that Sonnet 5 shows a lower overall rate of undesirable behaviors than Sonnet 4.6 — including better resistance to prompt-injection hijack attempts and lower rates of hallucination and sycophancy. It's not the safety leader across the whole lineup: Anthropic notes Sonnet 5 still shows somewhat higher rates of misaligned behavior on its internal audit than Opus 4.8 and the currently export-restricted Claude Mythos Preview. If you're weighing the full range of frontier options rather than just Anthropic's lineup, our top 5 frontier AI models of 2026 roundup puts Sonnet 5 in that broader context.

Pricing: What Sonnet 5 Actually Costs

There's a catch worth budgeting for. Sonnet 5 uses an updated tokenizer that maps the same text to roughly 1.0–1.35× more tokens than Sonnet 4.6 did. Anthropic set the introductory pricing to be roughly cost-neutral during the transition, which means the real question isn't the rate card — it's what happens on September 1, when standard pricing kicks in at the same $3/$15 list price as Sonnet 4.6, but against a token count that may be meaningfully higher for the same prompts and outputs. Agencies billing clients on a per-project basis should run a sample of real workloads through both models before assuming a flat swap.

Who Should Actually Use Claude Sonnet 5?

Founders and solo builders: Sonnet 5 is a sensible default. Near-Opus quality on coding and knowledge work at a third of the price is the kind of margin that matters when you're watching a token bill closely. Save Opus 4.8 for the handful of tasks that genuinely need the extra accuracy.

Agencies: the GDPval-AA v2 result — Sonnet 5 slightly ahead of Opus 4.8 on professional knowledge work — is the number to pay attention to if your work is heavier on client reporting, content production, and research synthesis than on frontier-difficulty coding. Route routine deliverables to Sonnet 5 and reserve Opus for the jobs where a six-point accuracy gap on coding benchmarks would actually change the outcome.

Developers building coding agents: the Terminal-Bench 2.1 jump (67.0% → 80.4%) is the most tangible upgrade if your agents operate inside a shell. It's also worth comparing how Sonnet 5 behaves inside an IDE agent versus a dedicated coding tool — our Cursor Composer 2.5 review covers that angle from the tooling side rather than the raw model side.

Editorial Verdict

Frequently Asked Questions

What is Claude Sonnet 5's model ID and context window?

The Claude API model ID is claude-sonnet-5 (anthropic.claude-sonnet-5 on AWS Bedrock). The context window is 1 million tokens, with 128K max output tokens, raisable to 300K via a batch-API beta header.

How much does Claude Sonnet 5 cost?

Introductory pricing is $2 per million input tokens and $10 per million output tokens through August 31, 2026, then $3/$15 from September 1 — the same list price as Sonnet 4.6, though the new tokenizer counts roughly 1.0–1.35× more tokens for the same text.

Is Claude Sonnet 5 better than Opus 4.8?

Not across the board. Sonnet 5 edges past Opus 4.8 only on GDPval-AA v2 knowledge work and nearly matches it on Humanity's Last Exam with tools. Opus 4.8 still leads on SWE-bench Pro, Terminal-Bench 2.1, OSWorld-Verified, and HLE without tools. Sonnet 5 wins on price and speed.

Where can I use Claude Sonnet 5?

It's the default model for Free and Pro users on claude.ai and available on Max, Team, and Enterprise plans, plus Claude Code, the Claude API, AWS Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.

Sources: Anthropic, "Introducing Claude Sonnet 5" (June 30, 2026) and the Claude Sonnet 5 System Card. All benchmark figures in this article are Anthropic's own published evaluation results.

4.6 / 5

Claude Sonnet 5 is the best value in Anthropic's current lineup, not the outright best model. Every headline benchmark improves over Sonnet 4.6, safety metrics move in the right direction, and the price stays roughly flat through August at a meaningfully higher quality bar. Opus 4.8 still wins on raw accuracy for the hardest jobs, and the tokenizer change means "same list price" won't always mean "same bill" once standard pricing returns in September. For the large majority of day-to-day agentic coding and knowledge work, Sonnet 5 is now the sensible default.