The AI stack buyers want to see (and what each layer signals)
There is now a recognisable reference architecture for production AI. It has five layers. Each layer sends a specific signal to a sophisticated acquirer about how grown-up your AI operation actually is.
The reference architecture
a16z's widely-cited write-up on emerging architectures for LLM applications captures the consensus shape of a production AI system [1]. Five layers, bottom to top:
- Models — the underlying LLMs (frontier and open-weight) doing the actual inference.
- Gateway — a router that sits in front of the models, choosing a model per request and handling fallback[2].
- Orchestration — workflows and agents that compose model calls and tool calls into useful work.
- Observability and evals — logging, tracing, and automated quality testing of every workflow[3].
- Governance — policy, risk, human-in-the-loop rules, and the audit trail that ties it all together[4].
What each layer signals to a buyer
Models — and why "we use the best model" is the wrong answer
A buyer wants to see a documented model policy: which models are approved for which use cases, how that decision was made, and how it is reviewed. "We use GPT-5 for everything" signals vendor concentration risk. "We use cheap models for classification, frontier models for reasoning, and the choice is logged per request" signals an operating system.
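A policy like that can be tiny. The sketch below is a minimal, hypothetical version — the model names, task categories, and owners are placeholders, not recommendations — but it shows the two properties a buyer looks for: the mapping is explicit, and every choice is logged per request.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model_policy")

# Hypothetical model policy: task category -> approved model.
# Names are illustrative stand-ins for "cheap model" and "frontier model".
MODEL_POLICY = {
    "classification": "small-cheap-model",
    "extraction": "small-cheap-model",
    "reasoning": "frontier-model",
}

def choose_model(task_category: str) -> str:
    """Return the approved model for a task category, logging the decision."""
    model = MODEL_POLICY.get(task_category)
    if model is None:
        # An unapproved use case fails loudly instead of silently defaulting.
        raise ValueError(f"No approved model for task: {task_category}")
    log.info("task=%s model=%s", task_category, model)
    return model
```

The point is not the code; it is that the mapping lives in one reviewable place rather than scattered across prompts.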
Gateway — and why its absence is now a red flag
The gateway is what makes the model layer swappable. Without one, every workflow is hard-coded to a vendor; with one, vendor risk is bounded and cost is observable. Gartner has been explicit that single-vendor AI strategies are now a diligence flag.
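The mechanic behind "bounded vendor risk" is ordered fallback. A minimal sketch, assuming stub providers rather than any real gateway's API:

```python
def call_with_fallback(prompt, providers):
    """Route a request to the first provider that succeeds.

    `providers` is an ordered list of callables — an illustrative stand-in
    for a gateway's routing table, not a real vendor interface.
    """
    errors = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # fall through to the next provider
            errors.append(f"{provider.__name__}: {exc}")
    raise RuntimeError(f"All providers failed: {errors}")

# Stub providers standing in for real model APIs.
def primary(prompt):
    raise TimeoutError("primary vendor unavailable")

def secondary(prompt):
    return f"answer to: {prompt}"
```

With this shape, swapping a vendor means editing the provider list, not every workflow.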
Orchestration — and why it should mostly be workflows
A grown-up AI estate is mostly fixed workflows with a few true agents at the edges. The reverse — agents everywhere, workflows nowhere — signals an organisation that has confused capability with control. We cover the distinction in agentic AI for operations.
Observability — the layer that converts AI from claim to evidence
Tools like Langfuse, LangSmith and the major cloud observability vendors all converge on the same primitives: traces, prompts, outputs, costs, latency, eval scores. A buyer reading these dashboards can answer the only question they actually care about — does this work, reliably, at this cost — without having to take management's word for it.
Governance — the layer that bounds the downside
NIST's AI RMF four-function model — govern, map, measure, manage — is the standard buyers and their advisors are increasingly anchoring to. A two-page policy that maps to those four functions, with named owners and named controls, is enough at lower-middle-market scale.
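A two-page policy of that shape is small enough to represent as data. The skeleton below is hypothetical — the owners and controls are placeholders — but it shows what "named owners and named controls per function" looks like, and how trivially coverage can be checked:

```python
# Illustrative policy skeleton mapped to NIST AI RMF's four functions.
# Owners and controls are placeholders, not a recommended assignment.
AI_RMF_POLICY = {
    "govern":  {"owner": "COO",         "controls": ["AI use policy", "quarterly review"]},
    "map":     {"owner": "Head of Ops", "controls": ["use-case register", "risk classification"]},
    "measure": {"owner": "Eng Lead",    "controls": ["eval suite", "incident log"]},
    "manage":  {"owner": "COO",         "controls": ["human-in-the-loop rules", "model retirement"]},
}

def missing_functions(policy):
    """Return any of the four AI RMF functions the policy does not cover."""
    return [f for f in ("govern", "map", "measure", "manage") if f not in policy]
```

A diligence reviewer is effectively running `missing_functions` by hand; having the mapping written down is most of the work.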
How this moves the multiple
A buyer paying a multiple of EBITDA is paying for the probability that EBITDA persists. Each layer of the stack above is, ultimately, a piece of evidence that the AI-driven margin and throughput gains in your trailing twelve months are durable rather than accidental. That is the entire argument for multiple expansion from AI — and it stands or falls on whether the stack is real.
Frequently asked questions
- What does a production AI stack look like?
- Five layers: a model gateway (e.g. OpenRouter), models routed per task, observability and logging (e.g. Langfuse), evaluation/quality measurement, and governance aligned to NIST AI RMF or ISO/IEC 42001. The a16z 'emerging architectures' reference is the widely-used baseline.
- What does each AI stack layer signal to a buyer?
- Gateway → vendor diversification (no single-vendor concentration risk). Observability → control (every call logged, attributable, replayable). Evals → verifiability (output quality measured over time). Governance → compliance (aligned to a recognised framework).
- Do I need all five layers?
- For a business going to market in 12+ months, yes — even at small scale. The cost of running a thin version of all five is small; the cost of missing one in diligence is a discount applied to the multiple.
Want this for your business?
Start with a Diagnose. Two weeks. Written report. Honest fit assessment.
Send an enquiry