Agentic AI for operations: what an 'agent' actually is, and where it works
Strip the marketing away and an 'agent' is three things: a loop, a set of tools, and a memory. That's it. Here is where they reliably remove cost in back-office operations, where they break, and how to scope a first one that survives diligence.
What an agent actually is
Anthropic's engineering guidance distinguishes carefully between two patterns that often get sold as the same thing[1]. A workflow is a fixed sequence of LLM calls — predictable, easy to debug. An agent is an LLM in a loop, deciding for itself which tool to use next based on what it observed last. It can call APIs, read files, query databases, send emails — anything you give it a tool for — until it decides it is done.
OpenAI's own practical guide to building agents is equally blunt: agents are powerful where the path is genuinely unpredictable, and a liability where it isn't[2]. If you can draw the workflow on a whiteboard, do not build it as an agent. Build it as a workflow and save yourself six months of debugging.
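The loop-plus-tools pattern is small enough to sketch. A minimal version in Python, with a stubbed model standing in for the real LLM call — `fake_model`, the tool names, and the customer data here are illustrative assumptions, not any vendor's API:

```python
# Minimal agent loop: the model picks a tool each turn until it signals "done".
# fake_model is a deterministic stand-in for a real LLM call.

def lookup_customer(email: str) -> dict:
    return {"email": email, "tier": "standard", "open_orders": 1}

def check_orders(email: str) -> list:
    return [{"id": "A-1001", "status": "delayed"}]

TOOLS = {"lookup_customer": lookup_customer, "check_orders": check_orders}

def fake_model(history: list) -> dict:
    # Stub policy: look up the customer, check their orders, then finish.
    step = len(history)
    if step == 0:
        return {"tool": "lookup_customer", "args": {"email": "a@b.com"}}
    if step == 1:
        return {"tool": "check_orders", "args": {"email": "a@b.com"}}
    return {"tool": None, "answer": "Order A-1001 is delayed; draft an apology."}

def run_agent(model, max_steps: int = 10) -> str:
    history = []  # the agent's working memory for this run
    for _ in range(max_steps):
        decision = model(history)
        if decision["tool"] is None:  # the model decided it is done
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])
        history.append({"tool": decision["tool"], "result": result})
    raise RuntimeError("agent hit step limit without finishing")

print(run_agent(fake_model))
```

The contrast with a workflow is visible in the code: nothing outside `fake_model` decides what happens next, which is exactly what makes agents powerful on messy inputs and hard to debug everywhere else.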
Where agents reliably work in back-office ops
- Inbound triage with messy inputs. A new email arrives. The agent reads it, looks up the customer in the CRM, checks the order system, decides whether this is a refund, a complaint, a sales lead, or a routine query, and routes accordingly with a draft response.
- Reconciliation and investigation. A statement line doesn't match. The agent pulls the related invoices, supplier emails, and bank entries, and either resolves it or escalates with a summary.
- Research and enrichment. A new lead drops into the CRM. The agent enriches the record from public sources, scores it against your ICP, and assigns it.
Where agents break
Gartner's adoption tracking is unusually specific: it expects more than 40% of agentic-AI projects to be cancelled by 2027, mostly because organisations underestimate the operational governance required to run them in production[3]. The failure modes are consistent:
- Loops that don't terminate, burning tokens until someone notices the bill.
- Tools the agent uses incorrectly because the tool's description was vague.
- High-stakes actions taken without a human checkpoint.
- No logging, so when something goes wrong nobody can reconstruct what happened.
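Most of these failure modes can be bounded mechanically rather than hoped away. A sketch of a budget-and-audit wrapper around each agent step — the step cap, the crude token estimate, and the log format are illustrative assumptions, not a standard:

```python
# Cap steps and estimated tokens, and log every step for later audit.
import json
import time

class BudgetExceeded(Exception):
    pass

class GuardedRun:
    """Terminates runaway loops and keeps an audit trail of every step."""

    def __init__(self, max_steps: int = 20, max_tokens: int = 50_000):
        self.max_steps, self.max_tokens = max_steps, max_tokens
        self.steps, self.tokens, self.log = 0, 0, []

    def record(self, tool: str, payload: str) -> None:
        self.steps += 1
        self.tokens += len(payload) // 4  # rough chars-to-tokens estimate
        self.log.append({"ts": time.time(), "step": self.steps,
                         "tool": tool, "payload": payload})
        if self.steps > self.max_steps or self.tokens > self.max_tokens:
            raise BudgetExceeded(f"stopped at step {self.steps}")

run = GuardedRun(max_steps=3)
try:
    for _ in range(10):  # simulated non-terminating loop
        run.record("search", "some tool output " * 10)
except BudgetExceeded as exc:
    print("terminated:", exc)
print("audit entries:", len(run.log))
```

The log answers the reconstruction problem in the last bullet: every entry is a timestamped record of which tool ran and what came back, serialisable with `json.dumps` for whatever audit store you use.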
How to scope a first agent that survives diligence
- Pick one workflow with a clear success metric. "Reduce average time-to-first-response on inbound enquiries from 4 hours to under 30 minutes." Not "use AI more".
- Constrain the tools. Give it the minimum set of tools needed. Every additional tool is a new failure surface.
- Force a human-in-the-loop on irreversible actions. Sending money, sending external email above a threshold, modifying customer records — all gated.
- Log everything. Every step, every tool call, every output. This is what makes the system auditable when a buyer asks.
- Run an eval suite weekly. A fixed set of test cases the agent has to pass. When the underlying model changes, you find out before your customers do.
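The human-in-the-loop gate above can be a thin policy layer in front of every tool call, so no individual tool has to remember the rule. A sketch — the action names and the £500 threshold are assumptions chosen purely for illustration:

```python
# Gate irreversible or high-value actions behind human approval.
# Action names and the 500 GBP threshold are illustrative only.

IRREVERSIBLE = {"send_payment", "delete_record", "modify_customer_record"}

def needs_approval(action: str, args: dict) -> bool:
    if action in IRREVERSIBLE:
        return True
    if action == "send_external_email" and args.get("value_gbp", 0) > 500:
        return True
    return False

def execute(action: str, args: dict, approved: bool = False) -> str:
    if needs_approval(action, args) and not approved:
        return f"QUEUED for human review: {action}"
    return f"EXECUTED: {action}"

print(execute("classify_ticket", {}))                # low stakes, runs freely
print(execute("send_payment", {"value_gbp": 120}))   # gated, waits for a human
print(execute("send_payment", {"value_gbp": 120}, approved=True))
```

Putting the policy in one function also makes it testable: the weekly eval suite can assert that gated actions stay gated after every model or prompt change.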
Why this matters for the multiple
A documented, logged, governed agent that handles a function which used to consume two FTEs is durable margin. A clever demo running on the founder's laptop is not. The difference is exactly what a quality-of-earnings provider will and will not credit. We unpack that distinction in AI evidence in due diligence.
Frequently asked questions
- What is an AI agent?
- An AI agent is a loop in which a language model is called repeatedly, decides at each step which tools to invoke (database queries, API calls, emails), and carries state between iterations. The Anthropic and OpenAI engineering guides both distinguish fixed LLM workflows from autonomous agents on exactly this point.
- Where do AI agents reliably work today?
- Narrow, repeatable back-office tasks with clear success criteria: invoice triage, lead enrichment, support-ticket classification, document summarisation. Anything where the cost of being wrong is small and a human can verify the output.
- Where do AI agents fail?
- Tasks requiring genuine judgement, regulated outputs (medical, legal, financial advice), customer-facing decisions without human escalation, and anything irreversible (sending money, deleting records). Gartner expects more than 40% of agentic-AI projects to be cancelled by 2027, mostly due to governance failures.
- How do you scope a first agent project?
- Pick one workflow currently done by a person 50+ times a week, with a clear input, a clear output, and a human escalation path for edge cases. Measure baseline (time, error rate, throughput) before deployment so the ROI claim survives a buyer's diligence.
Want this for your business?
Start with a Diagnose. Two weeks. Written report. Honest fit assessment.
Send an enquiry