Computer use and agentic browsers: can AI run your screen yet?
For two years the demo was the same. An AI watches a screen, moves a cursor, books a flight or fills a form, and the room claps. The question every SMB owner actually has is simpler: can it do that on my systems, reliably, without supervision, today? In mid-2026 the honest answer is "almost, for some things, with a person watching". That is a real shift, and worth understanding before you either buy the hype or dismiss it.
What “computer use” actually means
Computer use is an AI agent that operates software the way a person does. It reads what is on the screen, moves a pointer, clicks buttons, types into fields, and navigates between pages and apps. An agentic browser is the same idea wrapped in a web browser: you give it a goal and it drives the tabs.
This matters because of a gap most owners feel every day. Your accounting tool talks to your bank, your CRM talks to your email, but the supplier portal, the council website and the insurer’s quote form talk to nothing. They have a login and a screen and no usable API. Computer use is the first technology that can automate those systems without a developer building an integration the vendor never offered.
What is genuinely usable mid-2026 vs still demo-grade
The capability has crossed a threshold. The clearest signal is the benchmark called Online-Mind2Web, which runs agents through 300 real tasks on 136 live websites and scores each step, not just the final answer.
- Anthropic’s Claude Opus 4.8 scored 84% on Online-Mind2Web, which Anthropic calls the strongest computer-use and browser-agent result it has tested (anthropic.com, May 2026). That is the first score high enough to be useful for low-stakes automation like filling forms, navigating dashboards and pulling data out of web apps.
- OpenAI folded its Operator agent into ChatGPT agent mode, which also runs inside its Atlas browser. Atlas launched on macOS first, with Windows and mobile flagged as coming. Agent mode is available to Plus, Pro and Business users (OpenAI, 2026).
- Perplexity Comet went free in early 2026 across Mac, Windows, iOS and Android, with an enterprise tier that lets admins deploy it by device management and control which actions the agent can take (Perplexity, 2026).
- Google retired the standalone Project Mariner in May 2026 and moved the capability into the Gemini agent and Chrome’s “auto-browse” feature, which is currently limited to paying AI Pro and Ultra subscribers in the US (Android Authority, 2026).
- Microsoft Copilot Studio shipped computer-using agents to general availability in May 2026, running OpenAI’s computer-using model and Claude Sonnet 4.5, with Azure Key Vault credential storage, Microsoft Purview audit logging and configurable human review (Microsoft Copilot Studio blog, 2026).
So the capability is real across all five. What separates them is not the demo but the controls. Microsoft is furthest ahead on the boring, essential parts an SMB needs: locked-up credentials, a log of every action, and a built-in approval step. That is the difference between a clever toy and something you can put near real work.
Still demo-grade: anything multi-step, high-stakes and unsupervised. An 84% task success rate is strong for this category. It also means roughly one task in six goes wrong. OpenAI’s earlier Operator sat around 61% on the same public leaderboard, so the trend line is steep. The absolute number is not yet “set and forget”.
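The arithmetic behind "one task in six" is worth making explicit, because it compounds across a batch. The sketch below is illustrative only: the function names are made up for this example, and the batch calculation assumes each task fails independently of the others, which real workflows may not.

```python
def _check_rate(success_rate: float) -> None:
    # A rate of exactly 1.0 would mean "never fails" and divide by zero below.
    if not 0.0 <= success_rate < 1.0:
        raise ValueError("success_rate must be in [0, 1)")


def failures_one_in(success_rate: float) -> float:
    """Return N such that roughly one task in N goes wrong."""
    _check_rate(success_rate)
    return 1.0 / (1.0 - success_rate)


def expected_failures(tasks: int, success_rate: float) -> float:
    """Average number of failed tasks in a batch of the given size."""
    _check_rate(success_rate)
    return tasks * (1.0 - success_rate)


def clean_batch_probability(tasks: int, success_rate: float) -> float:
    """Chance that every task in the batch succeeds.

    Assumption: failures are independent from task to task.
    """
    _check_rate(success_rate)
    return success_rate ** tasks


if __name__ == "__main__":
    for label, rate in [("84% score", 0.84), ("61% score", 0.61)]:
        print(
            f"{label}: one failure in about {failures_one_in(rate):.2f} tasks, "
            f"{expected_failures(100, rate):.0f} failures per 100, "
            f"{clean_batch_probability(5, rate):.0%} chance of 5 clean in a row"
        )
```

At 84%, that works out to one failure in about 6.25 tasks and roughly a 42% chance that five tasks in a row all finish cleanly, under the independence assumption. That is why a reviewer on the output matters more as the batch grows.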
The SMB use cases worth starting with
Pick work that is high-volume, low-stakes, and has a clear right answer.
- Repetitive browser admin. Copying data between two systems that do not integrate. Updating records across a portal and a spreadsheet.
- Form-filling. Lodging the same structured information into a government, insurer or supplier form again and again.
- Data gathering. Pulling figures from a supplier portal into a spreadsheet, or collecting structured data across a set of sites for a weekly report.
The pattern in all three: it is dull, the cost of a single error is small, and a human can check the output in seconds. That is the sweet spot.
The real risks
Two risks deserve naming, because they do not show up in the demo.
Reliability. One task in six failing is fine when a human reviews the result and the worst case is a re-run. It is not fine when the agent is the last step before money moves or a message goes to a customer. Keep a person on the output until you have watched a workflow run cleanly for weeks.
Security, specifically prompt injection. Because the agent reads web pages, documents and emails and acts on what it reads, a hidden instruction buried in any of those can hijack it. OpenAI has said publicly that prompt injection is unlikely to ever be fully solved for browser agents (CyberScoop, IT Pro, 2025). And the agent acts with your login and your permissions. So give it least-privilege access, its own isolated credentials stored in a vault rather than a broad admin account, and a human approval step on anything sensitive.
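The least-privilege and approval advice can be pictured as a small gate that sits between the agent and each action it wants to take. This is a hedged sketch, not any vendor's API: `Action`, `Policy`, `decide`, the action names and the hostnames are all hypothetical, and a real product would enforce the equivalent in its own admin settings.

```python
from dataclasses import dataclass

# Hypothetical action kinds that should never run without a person signing off.
SENSITIVE_KINDS = frozenset({"payment", "send_message", "sign_contract"})


@dataclass(frozen=True)
class Action:
    kind: str  # e.g. "read", "fill_form", "payment"
    host: str  # site the agent wants to act on


@dataclass(frozen=True)
class Policy:
    allowed_hosts: frozenset  # narrowest set of sites that still gets the job done
    allowed_kinds: frozenset  # narrowest set of action kinds


def decide(action: Action, policy: Policy) -> str:
    """Return "deny", "needs_approval" or "allow" for a proposed action."""
    # Least privilege: anything outside the allowlists is refused outright,
    # even if a page the agent read told it to go there.
    if action.host not in policy.allowed_hosts:
        return "deny"
    if action.kind not in policy.allowed_kinds:
        return "deny"
    # Human approval step on anything sensitive.
    if action.kind in SENSITIVE_KINDS:
        return "needs_approval"
    return "allow"


if __name__ == "__main__":
    policy = Policy(
        allowed_hosts=frozenset({"portal.supplier.example"}),
        allowed_kinds=frozenset({"read", "fill_form", "payment"}),
    )
    for act in [
        Action("read", "portal.supplier.example"),
        Action("payment", "portal.supplier.example"),
        Action("read", "attacker.example"),
    ]:
        print(act, "->", decide(act, policy))
```

The design choice is that the deny check comes first. An injected instruction can ask for anything, but it cannot widen the allowlist, and anything sensitive still stops at a person.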
The operator take
Computer use is promising and accelerating fast. The category went from coin-flip reliability to genuinely useful in about a year, and the enterprise controls are catching up. For an SMB it is real leverage on the dull browser admin that no integration ever covered.
But keep a human in the loop for now. Treat the agent as a capable junior who needs checking, not an unsupervised employee. Start it on low-stakes, repetitive tasks, log every run, give it the narrowest access that works, and never point it at payments, contracts or customer messaging until you have watched it earn that trust. That is not caution for its own sake. It is how you capture the upside this year without wearing the downside.
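One way to make "log every run" concrete is a plain run log plus a count of consecutive clean runs. A minimal sketch under stated assumptions: the JSON Lines file, the field names and the helper functions are invented for illustration, and "ok" means a person reviewed the output and accepted it.

```python
import json
from datetime import date


def append_run(path: str, workflow: str, day: date, ok: bool, note: str = "") -> None:
    """Append one reviewed run to a JSON Lines log file."""
    record = {"workflow": workflow, "day": day.isoformat(), "ok": ok, "note": note}
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def load_runs(path: str, workflow: str) -> list:
    """Read back the runs for one workflow, in the order they were logged."""
    with open(path, encoding="utf-8") as fh:
        runs = [json.loads(line) for line in fh if line.strip()]
    return [run for run in runs if run["workflow"] == workflow]


def clean_streak(runs: list) -> int:
    """Count the most recent consecutive runs a reviewer accepted."""
    streak = 0
    for run in reversed(runs):
        if not run["ok"]:
            break
        streak += 1
    return streak
```

A simple rule of thumb might then be to widen the agent's remit only after the streak for a workflow covers several weeks of runs. The threshold itself is a judgment call for the business, not something this sketch decides.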
Frequently asked questions
- What does computer use actually mean?
- Computer use is an AI agent that operates a screen the way a person does: it reads what is on the page, moves a cursor, clicks buttons, types into fields and navigates between apps and websites. Instead of calling a clean API, it drives the same software you do. That matters because most SMB tools have a login and a screen but no usable API, so computer use is often the only practical way to automate them without a custom integration.
- Is computer use reliable enough to use in a real business yet?
- Partly. The best model, Claude Opus 4.8, completes about 84% of multi-step web tasks on the Online-Mind2Web benchmark. That is strong for the category but it still means roughly one task in six goes wrong. Treat it as a capable junior who needs checking, not an unsupervised employee. Reliability is good enough for low-stakes, repetitive admin and not yet good enough for anything that spends money or sends external messages without review.
- Which computer-use products are real in mid-2026?
- OpenAI ships agent mode inside ChatGPT and its Atlas browser. Anthropic offers Claude computer use through its API and apps. Perplexity Comet is a free agentic browser with an enterprise tier. Microsoft Copilot Studio computer-using agents are generally available with enterprise controls. Google retired the standalone Project Mariner and moved the capability into Gemini and Chrome. Capability is real across the board; production-grade controls are most mature in Copilot Studio.
- What are the security risks of an agentic browser?
- The main one is prompt injection: hidden instructions buried in a web page, document or email that the agent reads and obeys, redirecting it to leak data or take unwanted actions. OpenAI has said publicly that prompt injection may never be fully solved for browser agents. Because these agents act with your login and your permissions, you should give them least-privilege access, isolated credentials and a human approval step on anything sensitive.
- What should an SMB use computer use for first?
- Start with high-volume, low-stakes browser admin that has a clear right answer: copying data between two systems that do not talk to each other, filling repetitive forms, pulling figures from a supplier portal into a spreadsheet, or gathering structured data from a set of sites. Keep a person reviewing the output, log every run, and never point it at payments, contracts or customer messaging until you have watched it work for weeks.