
Claude Opus 4.8: what Anthropic's newest model means for Australian businesses

Anthropic released Claude Opus 4.8 on 28 May 2026. It is the company's most capable model, and the announcement came with the usual wall of benchmarks. If you run a business rather than a research lab, the question is simpler: did anything change that should change what you do on Monday?

Short answer: a little. Let me walk through what actually moved, in plain terms, and then tell you what I think most Australian SMBs should do about it.

The honesty gain is the headline

The most useful change has nothing to do with raw intelligence. Anthropic says Opus 4.8 is around four times less likely than its predecessor, Opus 4.7, to let flaws in code it has written pass unremarked.

That sounds narrow. It is not. The failure mode of AI tools in a business is not “it cannot do the task” - it is “it did the task wrong and told you it was fine.” A model that more reliably flags its own mistakes is a model you can trust with less supervision. That is worth more to a 20-person firm than another point on a leaderboard.

It matters most for anyone using Claude to write or check code, but the same instinct - surfacing doubt rather than papering over it - shows up across drafting, analysis and research work too.

A 1M-token context window, in real terms

Opus 4.8 can read up to one million tokens in a single go. That is the default on the Claude API, AWS Bedrock and Google Vertex AI. On Microsoft Foundry it is 200k. Maximum output is 128k tokens.

Tokens are an abstraction, so here is the human version. One million tokens is roughly a 700-page set of contracts, or a whole quarter of email, or your entire policy library - handed to the model at once, with nothing dropped.

The practical effect: you stop chopping documents into pieces and stitching answers back together. You point it at the whole thing and ask. For SMBs doing due diligence, reviewing supplier agreements, or making sense of a year of customer tickets, that is the feature that quietly removes the most friction.
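If you want a quick sanity check on whether a document set will fit, a rough sketch like the one below is enough. The context limits are the ones quoted above. The four-characters-per-token ratio is my assumption: a common rule of thumb for English prose, not a real tokenizer. The function names and the `reserve_for_output` headroom are also illustrative.

```python
# Context limits quoted in this article, in tokens.
CONTEXT_LIMITS = {
    "Claude API": 1_000_000,
    "AWS Bedrock": 1_000_000,
    "Google Vertex AI": 1_000_000,
    "Microsoft Foundry": 200_000,
}

# Assumption: roughly 4 characters per token for English prose.
# This is a heuristic, not a tokenizer; real counts vary with content.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, platform: str, reserve_for_output: int = 0) -> bool:
    """True if the estimate plus some headroom for the answer fits the limit."""
    needed = estimate_tokens(text) + reserve_for_output
    return needed <= CONTEXT_LIMITS[platform]


if __name__ == "__main__":
    # Stand-in for about 3 million characters of contracts.
    document = "x" * 3_000_000
    for platform in CONTEXT_LIMITS:
        fits = fits_in_context(document, platform, reserve_for_output=20_000)
        print(f"{platform}: {'fits' if fits else 'too large'}")
```

Under that assumption, three million characters comes out at about 750,000 tokens: inside the 1M window, well over the 200k one. Treat the result as a first pass and check real token counts before you rely on it.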

Computer use is getting real

Opus 4.8 scored 84% on Online-Mind2Web, a benchmark for using a browser the way a person does - clicking, filling forms, navigating multi-step flows. Anthropic frames it as a meaningful jump over both Opus 4.7 and GPT-5.5.

One caveat worth stating plainly: cross-vendor benchmarks are self-reported and run on different harnesses, so treat any “beats competitor X” claim as directional, not gospel. The honest read is that browser-driving agents are crossing from demo to genuinely useful, and the trajectory is steep.

For now, treat agentic computer use as promising rather than production-ready for unsupervised work. It is excellent for accelerating a human who stays in the loop.

The cost reality

Here is where operators need to pay attention. Opus 4.8 is metered per token: US$5 per million input tokens and US$25 per million output (≈ AUD $8 / AUD $38). That is unchanged from Opus 4.7. Batch processing halves it, and prompt caching can cut input cost by up to 90%.
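To put numbers on that, here is a minimal sketch of the arithmetic using the Opus 4.8 prices above. Several things in it are my assumptions rather than anything Anthropic has confirmed here: cached input is priced at the best case of 10% of the normal rate, the extra cost of writing to the cache is ignored, and the batch discount is applied on top of the caching discount. The function name is illustrative.

```python
# Opus 4.8 list prices from this article, in US dollars per million tokens.
INPUT_PER_MTOK = 5.00
OUTPUT_PER_MTOK = 25.00

BATCH_MULTIPLIER = 0.5         # "batch processing halves it"
CACHED_INPUT_MULTIPLIER = 0.1  # assumption: best case of "up to 90%" off


def opus_job_cost_usd(input_tokens: int, output_tokens: int,
                      cached_fraction: float = 0.0, batch: bool = False) -> float:
    """Estimate the cost of one job in US dollars.

    cached_fraction is the share of input tokens served from the prompt cache.
    Assumption: the batch discount stacks on top of the caching discount.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    billable_input = fresh + cached * CACHED_INPUT_MULTIPLIER
    total = (billable_input * INPUT_PER_MTOK
             + output_tokens * OUTPUT_PER_MTOK) / 1_000_000
    return total * BATCH_MULTIPLIER if batch else total


if __name__ == "__main__":
    # Hypothetical job: 800k tokens of contracts in, a 10k-token report out.
    print(f"standard: US${opus_job_cost_usd(800_000, 10_000):.2f}")
    print(f"batch:    US${opus_job_cost_usd(800_000, 10_000, batch=True):.2f}")
    print(f"cached:   US${opus_job_cost_usd(800_000, 10_000, cached_fraction=0.9):.2f}")
```

The point of the exercise: on a long-document job the input side dominates the bill, so caching and batching matter more than shaving the length of the answer.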

But look at the lineup before you default to the most capable thing:

  • Opus 4.8 - 1M context, top capability. US$5 / US$25 (≈ AUD $8 / AUD $38).
  • Sonnet 4.6 - 1M context, the balanced workhorse. US$3 / US$15 (≈ AUD $4.60 / AUD $23).
  • Haiku 4.5 - 200k context, the cheapest. US$1 / US$5 (≈ AUD $1.50 / AUD $7.70).

For the overwhelming majority of day-to-day business work - drafting, summarising, answering questions, first-pass analysis - Sonnet 4.6 is the right tool. It is fast, it has the same big context window, and at list price it costs 40% less than Opus on both input and output. Reaching for Opus 4.8 on everything is like couriering every letter.
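A side-by-side makes the gap concrete. The sketch below prices the same workload on all three models at the list prices above. The workload itself (50 million input tokens and 10 million output tokens a month) is a made-up figure for illustration, and the function name is mine.

```python
# List prices from this article: (input, output) in US dollars per million tokens.
PRICES_USD_PER_MTOK = {
    "Opus 4.8": (5.00, 25.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Haiku 4.5": (1.00, 5.00),
}


def monthly_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price cost of a workload on one model, with no discounts applied."""
    input_price, output_price = PRICES_USD_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical team workload per month.
    input_tokens, output_tokens = 50_000_000, 10_000_000
    for model in PRICES_USD_PER_MTOK:
        cost = monthly_cost_usd(model, input_tokens, output_tokens)
        print(f"{model}: US${cost:,.2f}")
```

For that workload the list-price bill is US$500 on Opus 4.8, US$300 on Sonnet 4.6 and US$100 on Haiku 4.5.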

When to pick Opus vs Sonnet vs Haiku

A simple rule of thumb:

  • Reach for Opus 4.8 when the task is genuinely hard and a wrong answer is expensive: dense legal or financial analysis, large-codebase migrations, multi-step research, anything where you would otherwise pay a specialist.
  • Default to Sonnet 4.6 for everything else. It is the daily driver for your whole team.
  • Use Haiku 4.5 for high-volume, low-complexity work: tagging, routing, simple extraction at scale, where speed and price matter more than depth.

Most teams over-index on the flagship. Costs creep, and the output is no better for the task at hand.
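If you want that rule of thumb written down somewhere your team cannot ignore it, it fits in a few lines. This is a sketch of the rule above, not an official routing scheme: the `Task` fields and the `pick_model` name are my own, and deciding what counts as "hard" or "expensive" is still a human call.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Illustrative description of a piece of work; the fields are hypothetical."""
    description: str
    is_hard: bool = False                    # specialist-grade difficulty
    wrong_answer_is_expensive: bool = False  # mistakes cost real money
    high_volume: bool = False                # thousands of similar items
    low_complexity: bool = False             # tagging, routing, simple extraction


def pick_model(task: Task) -> str:
    """Apply the rule of thumb: Opus for hard and costly, Haiku for bulk, else Sonnet."""
    if task.is_hard and task.wrong_answer_is_expensive:
        return "Opus 4.8"
    if task.high_volume and task.low_complexity:
        return "Haiku 4.5"
    return "Sonnet 4.6"


if __name__ == "__main__":
    tasks = [
        Task("Review supplier agreement", is_hard=True, wrong_answer_is_expensive=True),
        Task("Tag support tickets", high_volume=True, low_complexity=True),
        Task("Draft a client email"),
    ]
    for task in tasks:
        print(f"{task.description}: {pick_model(task)}")
```

Note the order of the checks: a task has to clear both bars to earn Opus, and anything that does not clearly belong at either end falls through to Sonnet.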

Chasing every release is the wrong game

The frontier labs now ship constantly. Opus 4.7 was 16 April. Sonnet 4.6 was February. Opus 4.8 is late May. If you treat each one as a fire drill, you will spend the year re-platforming and never actually embed any of it.

The win is not having the newest model. The win is getting your team genuinely using one good model every day, inside real workflows, with the habits to match. A business that has properly adopted Sonnet 4.6 is miles ahead of one that switched to Opus 4.8 last week and still uses it like a search box.

I run an 80-staff education business on this stack. The leverage never came from the upgrade. It came from the boring work of building it into how people actually do their jobs.

Where this fits

Opus 4.8 is a real step up, and the 1M context window and code-honesty gain are the parts worth caring about. But a better model is only leverage if your team is using it well.

XLev’s Claude implementation work is exactly that: picking the right model for each job, wiring Claude into the tools your team already lives in, and building the workflows and habits that turn a subscription into an actual productivity gain. If you want the upgrade to mean something, that is where to start.

Frequently asked questions

Is Claude Opus 4.8 worth it for a small business?
For most SMBs, not as the everyday tool. Opus 4.8 is Anthropic's most capable model and it earns its keep on hard, high-stakes work like contract analysis, complex code and multi-step research. But it is metered per token and costs about 1.7 times as much as Sonnet 4.6 on both input and output (US$5 / US$25 against US$3 / US$15 per million tokens). The smart pattern is Sonnet 4.6 for daily work and Opus 4.8 reserved for the genuinely difficult jobs.
How much does Claude Opus 4.8 cost?
Opus 4.8 is priced at US$5 per million input tokens and US$25 per million output tokens, unchanged from Opus 4.7. Batch processing halves that, and prompt caching can cut input cost by up to 90%. Fast mode is US$10 / US$50. If your team uses Claude through a Pro, Team or Enterprise plan rather than the API, you pay the seat price instead and Opus access is gated by your plan and usage.
Opus 4.8 vs Sonnet 4.6 - which should my team use?
Sonnet 4.6 should be the default for almost everyone. It is the balanced workhorse, has the same 1M-token context, and costs US$3 / US$15 per million tokens against Opus 4.8's US$5 / US$25. Use Opus 4.8 when a task is genuinely hard: dense legal or financial analysis, large-codebase work, or research where a wrong answer is expensive. Haiku 4.5 is the cheapest option for high-volume, simple tasks.
Do I need to upgrade every time Anthropic ships a model?
No. Chasing every release is the wrong game. The frontier labs ship often, and the gap between a six-week-old model and the newest one is usually small for everyday business work. The real win is getting your whole team using one good model daily, with clear workflows. A better model does nothing for a business that has not adopted the last one.