Custom Builds · 8 min read

Voice agents for SMBs: when they pay back (and when they're a vanity project)

AI voice agents are having a moment. Six months ago they sounded like robots and dropped half their calls. Today the good ones are indistinguishable from a competent human receptionist in clearly scoped scenarios. That doesn't mean they're the right answer for every SMB phone line - and an expensive voice agent project that should have been a chatbot is a painful mistake to walk back from.

The conditions for a voice agent to pay back

Two conditions need to be true simultaneously. Either alone is not enough.

Volume. The business takes more than ~100 inbound calls a day, and ideally 250+. Below 100, even a 50% deflection rate saves at most a couple of hours of receptionist time per day - which doesn’t pay back the build cost on any reasonable timeline. The economics of voice agents are dominated by the labour they replace, and that labour scales with call volume.

Homogeneity. Those calls cluster around a small number of repeatable patterns - bookings, FAQs, simple status lookups, after-hours triage, returns. A voice agent does well when it can handle most of one pattern with a tight workflow. It struggles when every call is different and judgement is required.

Hit both conditions and a well-built voice agent typically pays back inside 6-9 months. Miss either and you’re building expensive technology that won’t earn its keep.
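The volume argument can be made concrete with a rough payback calculation. The sketch below is illustrative only: the `PaybackInputs` fields, the 21 working days per month, the wage and the call length are all assumptions, not figures from a real engagement, and it assumes the agent is billed for every inbound minute while labour is saved only on deflected calls.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class PaybackInputs:
    """All values are assumptions to be replaced with your own numbers."""
    calls_per_day: float
    deflection_rate: float       # share of calls the agent fully handles (0-1)
    minutes_per_call: float      # average call length in minutes
    hourly_labour_cost: float    # loaded receptionist cost, AUD per hour
    build_cost: float            # one-off v1 build cost, AUD
    run_cost_per_minute: float   # streaming voice fee, AUD per minute
    working_days_per_month: float = 21.0


def monthly_net_saving(p: PaybackInputs) -> float:
    """Labour saved on deflected calls minus voice fees on all calls, AUD per month."""
    total_minutes = p.calls_per_day * p.minutes_per_call * p.working_days_per_month
    deflected_minutes = total_minutes * p.deflection_rate
    labour_saved = deflected_minutes / 60.0 * p.hourly_labour_cost
    run_cost = total_minutes * p.run_cost_per_minute
    return labour_saved - run_cost


def payback_months(p: PaybackInputs) -> Optional[float]:
    """Months to recover the build cost, or None if the agent never pays back."""
    net = monthly_net_saving(p)
    if net <= 0:
        return None
    return p.build_cost / net


# Made-up inputs: a 250-call/day line versus a 60-call/day line.
high_volume = PaybackInputs(
    calls_per_day=250, deflection_rate=0.5, minutes_per_call=3.0,
    hourly_labour_cost=45.0, build_cost=25_000.0, run_cost_per_minute=0.10,
)
low_volume = replace(high_volume, calls_per_day=60)

print(payback_months(high_volume))
print(payback_months(low_volume))
```

With these made-up inputs the 250-call line recovers its build cost in under six months, while the 60-call line takes about two years. The point is the shape of the curve rather than the exact figures: halving volume roughly doubles the payback period.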

Businesses that typically fit

Allied health and dental clinics with high inbound booking volume
Trades businesses with quote requests and after-hours emergencies
Logistics and delivery operations with status enquiries
Tutoring and education businesses with enrolment and timetable queries
Hospitality groups with reservations and basic FAQs

Common thread: high inbound volume, predictable call patterns, and a downstream system (CRM, scheduler, knowledge base) that the agent can reach into to actually transact, not just talk.

Businesses that typically don’t

Professional services firms (legal, accounting) where the calls are usually high-stakes and require judgement
B2B sales operations where the call is the relationship - automating it sends a bad signal
Low-volume businesses (under ~100 calls a day) regardless of pattern
Highly-variable inbound where every call is genuinely different

The stack

The default 2026 SMB voice agent stack:

Twilio (or equivalent) for telephony - the phone number, call routing, recording, transcription metadata.
LiveKit or Vapi for the real-time streaming voice layer - low-latency speech-to-text and text-to-speech with interruption handling.
Claude as the reasoning engine - understanding what the caller wants, deciding the next step, calling tools.
Tools (function calls) into your CRM, scheduler, knowledge base, billing system - whatever the agent needs to actually transact rather than just respond.
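To illustrate the tools layer, here is a minimal sketch of one tool and the dispatcher that would run it when the model requests a call. The tool dictionary follows the name / description / input_schema shape used for tool definitions in the Anthropic Messages API; the `book_appointment` tool, its fields, the in-memory scheduler and the `dispatch_tool_call` helper are all hypothetical stand-ins for a real integration.

```python
from typing import Any, Dict

# Hypothetical tool definition: name, description, and a JSON Schema for inputs.
BOOK_APPOINTMENT_TOOL = {
    "name": "book_appointment",
    "description": "Book an appointment slot for the caller in the scheduler.",
    "input_schema": {
        "type": "object",
        "properties": {
            "caller_name": {"type": "string"},
            "slot_iso": {"type": "string", "description": "Start time, ISO 8601"},
        },
        "required": ["caller_name", "slot_iso"],
    },
}

# In-memory stand-in for a real scheduler: slot -> caller name.
_booked_slots = {}


def book_appointment(caller_name: str, slot_iso: str) -> Dict[str, Any]:
    """Reserve the slot if it is free; report a conflict otherwise."""
    if slot_iso in _booked_slots:
        return {"ok": False, "reason": "slot_taken"}
    _booked_slots[slot_iso] = caller_name
    return {"ok": True, "slot_iso": slot_iso}


# Registry mapping tool names to their definition and local handler.
TOOLS = {
    "book_appointment": {
        "definition": BOOK_APPOINTMENT_TOOL,
        "handler": book_appointment,
    },
}


def dispatch_tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a requested tool call against its schema, then run the handler."""
    tool = TOOLS.get(name)
    if tool is None:
        return {"ok": False, "reason": "unknown_tool"}
    schema = tool["definition"]["input_schema"]
    missing = [field for field in schema["required"] if field not in arguments]
    if missing:
        return {"ok": False, "reason": "missing_arguments", "missing": missing}
    # Drop any arguments the schema does not declare before calling the handler.
    kwargs = {k: v for k, v in arguments.items() if k in schema["properties"]}
    return tool["handler"](**kwargs)
```

Returning a structured failure instead of raising lets the agent tell the caller what went wrong (for example, offer another slot) rather than dropping the call.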

Build cost for an SMB-grade inbound voice agent on this stack is typically $25-80k AUD for v1, depending on the integrations required. Ongoing run cost is dominated by per-minute streaming voice fees - usually $0.05-0.15/minute - plus a small Claude API cost.
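As a rough guide to what the per-minute fee means in practice, the sketch below turns the $0.05-0.15/minute range into a monthly figure. The function name, the 30-day month and the three-minute average call are assumptions for illustration; Claude API cost is left out because it is small by comparison.

```python
from typing import Tuple


def monthly_voice_cost_range(
    calls_per_day: float,
    avg_call_minutes: float,
    low_rate: float = 0.05,    # AUD per minute, lower end of the quoted range
    high_rate: float = 0.15,   # AUD per minute, upper end of the quoted range
    days_per_month: int = 30,  # assumes the line is answered every day
) -> Tuple[float, float]:
    """Return the (low, high) monthly streaming voice cost in AUD."""
    minutes = calls_per_day * avg_call_minutes * days_per_month
    return round(minutes * low_rate, 2), round(minutes * high_rate, 2)


# Example: 250 calls a day at an assumed three minutes each.
low, high = monthly_voice_cost_range(calls_per_day=250, avg_call_minutes=3.0)
print(f"${low:,.0f} to ${high:,.0f} AUD per month")
```

For that example the range works out to $1,125 to $3,375 AUD a month, which is why call length and the per-minute rate matter more to the running budget than model cost.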

Legal considerations - especially outbound

Inbound AI voice agents are low-risk in Australia provided you:

Disclose at the start that the caller is speaking with an AI assistant
Handle personal information in line with the Privacy Act and the Australian Privacy Principles
Provide an easy escalation path to a human
Don’t use the agent for regulated activities without specific compliance review (legal advice, financial advice, medical decisions)

Outbound AI voice agents are materially riskier. Do Not Call Register obligations apply, the Spam Act regulates content, and depending on what the agent is selling, financial services or health-related rules may also apply. We recommend most SMBs avoid AI outbound voice without specific legal sign-off and a solid case for why outbound is the right channel at all.

Alternatives that often fit better

Three alternatives we recommend before a full voice agent build:

AI-summarised voicemail - the caller leaves a voicemail, AI transcribes and summarises it into your CRM with extracted action items. Cheap, high-leverage, and covers most after-hours value without the build complexity.
Inbound chatbot on the website - deflects calls before they happen. For low-volume businesses this is usually higher ROI than a voice agent because it turns a phone problem into a chat problem.
AI-assisted human reception - a human is still on the line, but Claude is feeding them context, suggesting responses and producing post-call summaries. Works well for businesses with varied call complexity that don’t fit the homogeneity test.

The right escalation pattern

Voice agents that try to handle everything they’re asked produce the worst customer experience in your business. Voice agents that defer to humans early produce the best. Build the escalation pattern aggressively:

Any sign of caller frustration - escalate
Anything outside the agent’s defined scope - escalate
Any high-stakes request (cancellation of a major booking, complaint, refund above a threshold) - escalate
Caller explicitly asks for a human - escalate immediately, don’t try to resolve first

The agent should hand the call to a human with the AI’s transcript and summary attached, so the human starts the conversation knowing what’s already been discussed.
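The escalation rules above can be expressed as a small, explicit check that runs on every turn. This is a sketch under stated assumptions: the `CallState` fields, the request type labels, the $200 refund threshold and the handoff payload shape are all hypothetical, and how frustration or scope is detected upstream is out of scope here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Hypothetical labels and threshold; tune these per business.
HIGH_STAKES_REQUESTS = {"major_booking_cancellation", "complaint"}
REFUND_THRESHOLD_AUD = 200.0


@dataclass
class CallState:
    """Signals about the current call, assumed to be produced upstream."""
    asked_for_human: bool = False
    frustration_detected: bool = False
    in_scope: bool = True
    request_type: str = ""
    refund_amount: float = 0.0


def escalation_reason(state: CallState) -> Optional[str]:
    """Return why the call should go to a human, or None to let the agent continue."""
    # An explicit request for a human wins over everything else.
    if state.asked_for_human:
        return "caller_requested_human"
    if state.frustration_detected:
        return "caller_frustration"
    if not state.in_scope:
        return "out_of_scope"
    if state.request_type in HIGH_STAKES_REQUESTS:
        return "high_stakes_request"
    if state.request_type == "refund" and state.refund_amount > REFUND_THRESHOLD_AUD:
        return "high_stakes_request"
    return None


def build_handoff(reason: str, transcript: List[str], summary: str) -> Dict[str, Any]:
    """Package the context a human needs to pick up the call without re-asking."""
    return {
        "route_to": "human_reception",
        "reason": reason,
        "summary": summary,
        "transcript": list(transcript),
    }
```

Keeping the rules as plain, ordered checks (rather than leaving the decision to the model alone) makes the escalation behaviour easy to audit and easy to tighten when a bad call slips through.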

How XLev builds voice agents

Voice agents are a Custom Builds engagement at XLev. We start with a discovery call that includes a hard look at whether the volume and homogeneity tests are actually met - we’ll tell you honestly if a chatbot or AI-summarised voicemail would be a better first build. See Custom Builds for the service detail.

Frequently asked questions

When does a voice agent actually pay back for an SMB?

Two conditions need to be true. Volume: the business takes more than ~100 inbound calls a day, enough that even partial deflection produces meaningful labour savings. Homogeneity: those calls follow a small number of repeatable patterns - bookings, FAQs, simple status lookups, after-hours triage - rather than being all over the place. Hit both and a well-built voice agent typically pays back inside 6-9 months. Miss either and the build cost outweighs the labour saved.

What's the typical build cost?

An SMB-grade inbound voice agent on a Twilio + Claude + LiveKit stack typically lands at $25-80k AUD for v1, depending on the integrations needed (CRM, scheduling, billing, knowledge base). Ongoing run cost is dominated by per-minute streaming voice fees - usually $0.05-0.15/minute - plus the Claude API cost, which is small in comparison.

Is the legal exposure worse for outbound than for inbound?

Materially. Inbound voice agents (the customer chose to call you) are low-risk in Australia provided you disclose the AI nature and handle data appropriately. Outbound AI calling triggers Do Not Call Register obligations, Spam Act requirements, and (depending on content) financial services or health-related rules - we recommend most SMBs avoid AI outbound voice without specific legal sign-off.

What's the alternative if a voice agent doesn't fit?

Three strong alternatives. AI-summarised voicemail (caller leaves a message; AI summarises it into the CRM with action items) - cheap and high-leverage. Inbound chatbot on the website (deflects calls before they happen) - usually higher ROI than a voice agent for low-volume businesses. AI-assisted human reception (the human is still on the line, but Claude is feeding them context, suggested responses and post-call summaries) - works well for businesses that handle calls of varied complexity.

How do voice agents handle escalation to humans?

The right pattern is for the agent to escalate aggressively: on any sign of frustration, anything outside its scope, or any high-stakes request, the call routes to a human with the AI's transcript and summary attached. Voice agents that try to handle everything they're asked produce the worst customer experience in your business. Voice agents that defer to humans early produce the best.
