Explainers·7 min read

Training vs inference: what AI actually does when you use it

There are two completely different things going on inside AI, and owners tend to worry about, and think they are paying for, the wrong one. Training is how a model gets built. Inference is what happens every time you use it. Once you have the difference straight, the cost question, the data question and the speed question all stop being murky.

Two jobs, not one

When people say “AI”, they are usually picturing a single thing. It is actually two:

Training is the one-off process of building the model. The vendor feeds it an enormous amount of text and runs months of computation across thousands of specialised chips. The output is a finished model - a fixed set of numbers that knows how to predict text. This happens once per model version.
Inference is running that finished model. You send a prompt, the model processes it and writes an answer. This happens every single time anyone uses the AI, millions of times a day across all users.

The cleanest analogy: training is writing and printing a reference book. Inference is someone opening that book to look something up. Writing the book is a huge, one-time effort. Looking things up in it is cheap and happens constantly.
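To make the split concrete, here is a toy sketch in Python. It is not how a real language model works internally; it is a tiny next-word counter, and the names `train` and `infer` are ours, chosen for illustration. What it does show is the shape of the two jobs: training reads the whole text once and produces a fixed set of numbers, and inference only reads those numbers.

```python
from collections import Counter
from typing import Dict, Optional
import copy


def train(corpus: str) -> Dict[str, Counter]:
    """Training: read the whole corpus once and count which word follows which."""
    words = corpus.lower().split()
    model: Dict[str, Counter] = {}
    for current, following in zip(words, words[1:]):
        model.setdefault(current, Counter())[following] += 1
    return model  # the "finished model": a fixed set of numbers


def infer(model: Dict[str, Counter], prompt_word: str) -> Optional[str]:
    """Inference: look up the most likely next word. Reads the model, never changes it."""
    counts = model.get(prompt_word.lower())
    if not counts:
        return None  # the model has never seen this word
    return counts.most_common(1)[0][0]


# Training happens once.
corpus = "the press prints the book and the reader opens the book"
model = train(corpus)
snapshot = copy.deepcopy(model)

# Inference happens as often as you like.
answers = [infer(model, word) for word in ["the", "reader", "unknown"]]

# The model is identical before and after being used.
model_unchanged = model == snapshot
```

However many times `infer` is called, `model` stays exactly as `train` left it. That is the sense in which a prompt is an input rather than a lesson.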

Why an owner should care: you are buying inference

Here is the line that matters. As a business, you are almost never training anything. You are buying inference.

When you pay for a Claude or ChatGPT seat, or call an API from your own software, you are running a model the vendor already trained. You are not paying for the training - the vendor absorbed that, once, and spread it across everyone who uses the model. You are paying to run it, per use.

That single fact clears up three things owners get tangled in, and the rest of this article takes them one at a time:

Cost. Your bill is an inference bill, measured in tokens, not a training bill.
Data. Your prompts are inputs to inference. On paid business and API tiers they are not, by default, used to retrain the model.
Speed. How fast you get an answer is an inference question, decided long after training is done.

How it works in practice: the cost split

Training a frontier model is one of the most expensive things in technology. Reported costs for building a single leading model run into the tens of millions of dollars and beyond, according to Stanford’s AI Index, which tracks these figures year on year. That is thousands of chips running for months before a single customer prompt is served.

You pay none of that directly. Instead, inference is billed in tiny units - per token, where a token is roughly three-quarters of an English word - so one short prompt and one answer typically cost a fraction of a cent. The economics are those of a printing press: the press and the first run cost a fortune, but each extra copy is cheap. You are buying copies, not the press.

This is why “how much does AI cost” has such a different answer for you than for a vendor. Your number is driven by how much inference you run - how many prompts, how long, how often - not by the eye-watering training figures in the headlines.
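A rough way to size your own number is to turn words into tokens and tokens into dollars. The sketch below uses the rule of thumb above (one token is about 0.75 of a word). The two prices are hypothetical placeholders, not any vendor's actual rates, and the function names are ours; swap in the per-million-token prices from your vendor's current price list.

```python
WORDS_PER_TOKEN = 0.75  # rough rule of thumb for English text

# Hypothetical prices in dollars per million tokens. Replace with your vendor's rates.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00


def words_to_tokens(words: int) -> int:
    """Estimate the token count for a given number of words."""
    return round(words / WORDS_PER_TOKEN)


def request_cost(prompt_words: int, answer_words: int) -> float:
    """Estimated dollar cost of one prompt and one answer."""
    input_tokens = words_to_tokens(prompt_words)
    output_tokens = words_to_tokens(answer_words)
    total = input_tokens * PRICE_PER_M_INPUT + output_tokens * PRICE_PER_M_OUTPUT
    return total / 1_000_000


def monthly_cost(prompt_words: int, answer_words: int,
                 requests_per_day: int, days: int = 30) -> float:
    """Estimated dollar cost of running the same request many times a month."""
    return request_cost(prompt_words, answer_words) * requests_per_day * days


single = request_cost(prompt_words=300, answer_words=150)
month = monthly_cost(prompt_words=300, answer_words=150, requests_per_day=200)
```

Under these assumed prices, a 300-word prompt with a 150-word answer comes to about $0.0042, well under a cent, and 200 such requests a day for 30 days comes to about $25.20. The figures move with your volume and prompt length, not with anyone's training budget.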

The data question: your prompts are inference inputs

This is the part that calms a lot of nerves. A common worry is “if I paste our client data into AI, am I training it on our secrets, so it leaks to a competitor later?”

On paid business and API tiers, the answer from the major vendors is no by default. Your prompt is an input to inference - the model reads it, answers, and the interaction is not fed back to retrain the underlying model. Anthropic, for example, states in its commercial terms that it does not train on your inputs and outputs by default on its business and API offerings.

Two honest caveats. Consumer free tiers can differ - some personal plans may use conversations to improve future models unless you opt out, so read the tier you are actually on. And default is not the same as forever; policies and toggles change, so confirm the current terms for your plan.

The takeaway: the training-versus-inference distinction is exactly what makes “is my data safe” answerable. You are running inference, not contributing training data - on the right tier.

When it matters: speed, and “do I need to train?”

Speed and latency are inference questions. Once a model is trained, how fast it answers depends on the model’s size, how much you send it, how long an answer you ask for, and how busy the provider is. Bigger models generally reason better, but they also tend to answer more slowly and cost more per token. For high-volume or time-sensitive jobs, putting a smaller, faster model on the simple steps is often the right move. None of that touches training.
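As an illustration of putting a smaller model on the simple steps, here is a minimal routing sketch. The model labels, the response times and the list of "simple" steps are all assumptions made up for the example, not real products or measurements; in practice you would time your own workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str               # hypothetical label, not a real product name
    typical_seconds: float  # assumed response time, for illustration only


SMALL_FAST = ModelOption("small-fast-model", 1.0)
LARGE_CAPABLE = ModelOption("large-capable-model", 6.0)

# Steps we assume a smaller model handles well enough.
SIMPLE_STEPS = {"classify", "extract", "tag"}


def pick_model(step: str) -> ModelOption:
    """Send simple steps to the small model and everything else to the large one."""
    return SMALL_FAST if step in SIMPLE_STEPS else LARGE_CAPABLE


workflow = ["classify", "extract", "draft_proposal"]
plan = {step: pick_model(step).name for step in workflow}

# Compare the routed workflow with running every step on the large model.
routed_seconds = sum(pick_model(step).typical_seconds for step in workflow)
all_large_seconds = LARGE_CAPABLE.typical_seconds * len(workflow)
```

With these made-up timings the routed workflow takes 8 seconds against 18 for the all-large version. The real gain depends on your steps, but the decision is made entirely at inference time.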

And the question we get most: do I need to train my own model? For almost every Australian SMB, no. What people usually mean by “train it on our business” is one of three lighter things, all of which still run on inference:

Context in the prompt - giving the model the right background each time.
Retrieval - connecting it to your documents so it fetches the relevant facts per question.
Fine-tuning - a much smaller adjustment of an existing model, not a build from scratch.

Genuine from-scratch training is a vendor-scale undertaking, rarely the right answer to a business problem. In most cases we would talk you out of it.
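The first two of those lighter options, context in the prompt and retrieval, can be sketched in a few lines. This is a simplified illustration: the documents, the business context and the function names are invented, and the keyword-overlap matching is a stand-in for the more capable search a production retrieval setup would normally use. No model is called here; the sketch only builds the prompt that would be sent for inference.

```python
import re
from typing import Dict, List, Set

# Invented example documents standing in for your own files.
DOCUMENTS: Dict[str, str] = {
    "refund-policy": "Refunds are available within 30 days of purchase with a receipt.",
    "opening-hours": "The office is open 9am to 5pm Monday to Friday.",
    "delivery": "Standard delivery takes 3 to 5 business days within Australia.",
}

# Context in the prompt: background the model is given every time.
BUSINESS_CONTEXT = "You are answering on behalf of a small Australian retailer."


def keywords(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: Dict[str, str], top_k: int = 1) -> List[str]:
    """Retrieval: return the documents sharing the most keywords with the question."""
    wanted = keywords(question)
    ranked = sorted(
        documents.values(),
        key=lambda text: len(wanted & keywords(text)),
        reverse=True,
    )
    return [text for text in ranked[:top_k] if wanted & keywords(text)]


def build_prompt(question: str) -> str:
    """Combine fixed context, retrieved facts and the question into one prompt."""
    facts = retrieve(question, DOCUMENTS)
    fact_lines = [f"- {fact}" for fact in facts] if facts else ["- (none found)"]
    return "\n".join(
        [BUSINESS_CONTEXT, "Relevant facts:", *fact_lines, f"Question: {question}"]
    )


prompt = build_prompt("How many days does delivery take?")
```

The model itself never changes in either case. It is simply handed better inputs on each request, which is why both options are priced and tuned as inference.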

Honest limits

The line can blur. Fine-tuning and some “continual learning” setups sit between the two. They are far closer to inference in cost and effort than to real training, but they are not nothing.
Default policies still need checking. “Not used for training by default” depends on your tier and the vendor’s current terms. Verify, do not assume.
Inference is not free at scale. Cheap per request, but a high-volume automation still adds up. The point is that it is a running cost you can size, not a training cost you have to fund.

The one-line version

Training builds the model once, at enormous expense, and the vendor does it. Inference runs the finished model on your prompt, every time, and it is what you pay for - per token. You are buying inference, not training. That is why your prompts are inputs rather than training data by default on paid business and API tiers, why your bill scales with usage not headlines, and why speed is something you tune at inference time.

Frequently asked questions

What is the difference between training and inference?

Training is how an AI model is built. The vendor feeds it enormous amounts of text and runs months of computation across thousands of specialised chips to produce a finished model. It happens once per model version. Inference is what happens when you use that finished model: you send a prompt, the model processes it and writes an answer. Inference runs every single time anyone uses the AI. A useful analogy: training is writing and printing a reference book; inference is someone opening it to look something up.

Am I training the AI when I use ChatGPT or Claude?

Almost certainly not. When you type into a chat box or call an API, you are running inference on a model that was already trained. Your prompt is an input to that finished model, not a lesson that rewrites it. Whether your conversations are later used to improve future models is a separate question and depends on the tier and the vendor's policy. On paid business and API tiers, the major vendors state that, by default, they do not train on your inputs. Consumer free tiers can differ, so check the setting.

Why is training so expensive but each answer is cheap?

Training is a single, massive upfront build: thousands of chips running for months, with reported costs for frontier models running into the tens of millions of dollars and beyond. The vendor absorbs that once. Inference is tiny by comparison - one prompt and one answer - so it is billed in fractions of a cent per token. The economics are like a printing press: the press and the first print run cost a fortune, but each additional copy is cheap. You are buying copies, not the press.

Do I ever need to train my own model?

For the overwhelming majority of Australian SMBs, no. Training a model from scratch is a vendor-scale undertaking. What people often mean by training is actually one of three cheaper things: giving the model the right context in the prompt, connecting it to your documents through retrieval, or in some cases fine-tuning - a much lighter adjustment of an existing model. All of those still run on inference. Genuine from-scratch training is rarely the right answer for a business problem, and we would talk you out of it in most cases.

Does inference affect speed and latency?

Yes - speed is an inference question, not a training one. Once a model is trained, how fast it answers depends on the size of the model, how much you send it, how long an answer you ask for, and how busy the provider is. Larger models generally reason better, but they also tend to respond more slowly and cost more per token. For high-volume or time-sensitive workflows, picking a faster, smaller model for the simple steps is often the right call. None of that touches training; it is all about how inference is run.
