The AI stack buyers want to see (and what each layer signals)
There is now a recognisable reference architecture for production AI. It has five layers. Each layer sends a specific signal to a sophisticated acquirer about how grown-up your AI operation actually is.
The reference architecture
a16z's widely-cited write-up on emerging architectures for LLM applications captures the consensus shape of a production AI system [1]. Five layers, bottom to top:
- Models — the underlying LLMs (frontier and open-weight) doing the actual inference.
- Gateway — a router that sits in front of the models, choosing a model per request and handling fallback[2].
- Orchestration — workflows and agents that compose model calls and tool calls into useful work.
- Observability and evals — logging, tracing, and automated quality testing of every workflow[3].
- Governance — policy, risk, human-in-the-loop rules, and the audit trail that ties it all together[4].
What each layer signals to a buyer
Models — and why "we use the best model" is the wrong answer
A buyer wants to see a documented model policy: which models are approved for which use cases, how that decision was made, and how it is reviewed. "We use GPT-5 for everything" signals vendor concentration risk. "We use cheap models for classification, frontier models for reasoning, and the choice is logged per request" signals an operating system.
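A policy like that can be tiny. The sketch below is a minimal, hypothetical version — the model names, task categories, and owners are placeholders, not recommendations — but it shows the two properties a buyer looks for: the mapping is explicit, and every choice is logged per request.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model_policy")

# Hypothetical model policy: task category -> approved model.
# Names are illustrative stand-ins for "cheap model" and "frontier model".
MODEL_POLICY = {
    "classification": "small-cheap-model",
    "extraction": "small-cheap-model",
    "reasoning": "frontier-model",
}

def choose_model(task_category: str) -> str:
    """Return the approved model for a task category, logging the decision."""
    model = MODEL_POLICY.get(task_category)
    if model is None:
        # An unapproved use case fails loudly instead of silently defaulting.
        raise ValueError(f"No approved model for task: {task_category}")
    log.info("task=%s model=%s", task_category, model)
    return model
```

The point is not the code; it is that the mapping lives in one reviewable place rather than scattered across prompts.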
Gateway — and why its absence is now a red flag
The gateway is what makes the model layer swappable. Without one, every workflow is hard-coded to a vendor; with one, vendor risk is bounded and cost is observable. Gartner has been explicit that single-vendor AI strategies are now a diligence flag.
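The mechanic behind "bounded vendor risk" is ordered fallback. A minimal sketch, assuming stub providers rather than any real gateway's API:

```python
def call_with_fallback(prompt, providers):
    """Route a request to the first provider that succeeds.

    `providers` is an ordered list of callables — an illustrative stand-in
    for a gateway's routing table, not a real vendor interface.
    """
    errors = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # fall through to the next provider
            errors.append(f"{provider.__name__}: {exc}")
    raise RuntimeError(f"All providers failed: {errors}")

# Stub providers standing in for real model APIs.
def primary(prompt):
    raise TimeoutError("primary vendor unavailable")

def secondary(prompt):
    return f"answer to: {prompt}"
```

With this shape, swapping a vendor means editing the provider list, not every workflow.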
Orchestration — and why it should mostly be workflows
A grown-up AI estate is mostly fixed workflows with a few true agents at the edges. The reverse — agents everywhere, workflows nowhere — signals an organisation that has confused capability with control. We cover the distinction in agentic AI for operations.
Observability — the layer that converts AI from claim to evidence
Tools like Langfuse, LangSmith and the major cloud observability vendors all converge on the same primitives: traces, prompts, outputs, costs, latency, eval scores. A buyer reading these dashboards can answer the only question they actually care about — does this work, reliably, at this cost — without having to take management's word for it.
Governance — the layer that bounds the downside
NIST's AI RMF four-function model — govern, map, measure, manage — is the standard buyers and their advisors are increasingly anchoring to. A two-page policy that maps to those four functions, with named owners and named controls, is enough at lower-middle-market scale.
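A two-page policy of that shape is small enough to represent as data. The skeleton below is hypothetical — the owners and controls are placeholders — but it shows what "named owners and named controls per function" looks like, and how trivially coverage can be checked:

```python
# Illustrative policy skeleton mapped to NIST AI RMF's four functions.
# Owners and controls are placeholders, not a recommended assignment.
AI_RMF_POLICY = {
    "govern":  {"owner": "COO",         "controls": ["AI use policy", "quarterly review"]},
    "map":     {"owner": "Head of Ops", "controls": ["use-case register", "risk classification"]},
    "measure": {"owner": "Eng Lead",    "controls": ["eval suite", "incident log"]},
    "manage":  {"owner": "COO",         "controls": ["human-in-the-loop rules", "model retirement"]},
}

def missing_functions(policy):
    """Return any of the four AI RMF functions the policy does not cover."""
    return [f for f in ("govern", "map", "measure", "manage") if f not in policy]
```

A diligence reviewer is effectively running `missing_functions` by hand; having the mapping written down is most of the work.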
How this moves the multiple
A buyer paying a multiple of EBITDA is paying for the probability that EBITDA persists. Each layer of the stack above is, ultimately, a piece of evidence that the AI-driven margin and throughput gains in your trailing twelve months are durable rather than accidental. That is the entire argument for multiple expansion from AI — and it stands or falls on whether the stack is real.
Frequently asked questions
- What does a production AI stack look like?
- Five layers: a model gateway (e.g. OpenRouter), models routed per task, observability and logging (e.g. Langfuse), evaluation/quality measurement, and governance aligned to NIST AI RMF or ISO/IEC 42001. The a16z 'emerging architectures' reference is the widely-used baseline.
- What does each AI stack layer signal to a buyer?
- Gateway → vendor diversification (no single-vendor concentration risk). Observability → control (every call logged, attributable, replayable). Evals → verifiability (output quality measured over time). Governance → compliance (aligned to a recognised framework).
- Do I need all five layers?
- For a business going to market in 12+ months, yes — even at small scale. The cost of running a thin version of all five is small; the cost of missing one in diligence is a discount applied to the multiple.
Want this for your business?
Start with a Diagnose. Two weeks. Written report. Honest fit assessment.
Send an enquiry