What is fine-tuning in simple terms?

Fine-tuning is taking a general-purpose AI model and training it a bit further on your own examples, so it leans towards the style, format or behaviour you want on a specific task. You show it hundreds of examples of the input and the ideal output, and it adjusts to match that pattern. It does not give the model new knowledge about your business - it changes how it responds, not what it knows. For knowledge, you use RAG instead.

When does fine-tuning actually help a business?

When the task is narrow, repetitive and format-specific, you run it at real volume, and you have a large set of clean examples of the right output. Classic cases are classifying support tickets into your exact categories, extracting fields into a fixed structure, or matching a very particular house writing style consistently. If the task is varied, low-volume, or really about answering from your documents, fine-tuning is the wrong tool.

Should I fine-tune or just write a better prompt?

Try the prompt first, almost always. OpenAI's own guidance is that prompt engineering is the best place to start and is often all you need. A clear prompt with a few examples in it (called few-shot prompting) gets you most of the way on most tasks, costs nothing to change, and gives you a baseline. Only consider fine-tuning once you have a solid prompt you genuinely cannot push further and the task is high-volume enough to justify the effort.

Is fine-tuning the same as training my own AI?

No. Training a model from scratch costs millions and is the work of frontier labs. Fine-tuning starts from an existing foundation model someone else trained and nudges it on your examples - far cheaper, but still a real project. You need curated data, a training and evaluation loop, and somewhere to host or call the resulting model. It also has to be redone when the base model improves, which it does often.

What does fine-tuning cost an SMB?

More in effort than in compute. The compute to run a fine-tuning job is modest. The cost is the work around it: collecting and cleaning hundreds of high-quality examples, building evals to prove the fine-tune is actually better than a good prompt, and maintaining it as models change. For most SMBs that adds up to a custom build's worth of effort for a gain a prompt or RAG would have delivered, which is why we rarely start here.

← All insights

Explainers·8 min read

What is fine-tuning, and does your business actually need it?

Fine-tuning is one of the most over-reached-for ideas in AI. Owners hear it and picture "our own AI, trained on our business" - something proprietary and powerful. The reality is narrower and, for most SMBs, beside the point. Fine-tuning changes how a model behaves on a specific task. It does not teach the model about your business, and it is almost never the cheapest way to get the result you actually want.

Here is the plain-English version: what fine-tuning is, the few cases where it genuinely pays back, and why prompting or RAG usually wins.

Fine-tuning in 30 words

Fine-tuning is further-training an existing foundation model on your own example inputs and outputs, so it leans towards a style, format or behaviour you want on a specific, repetitive task. It shifts behaviour, not knowledge.

What it actually does

Start with a foundation model - a large language model someone like OpenAI or Anthropic has already trained on a huge slice of the internet. Fine-tuning takes that model and trains it a little further on a set of examples you provide: hundreds of pairs of “here is the input, here is the ideal output.” The model adjusts its internal settings to match that pattern more closely.

The key thing to hold onto: fine-tuning teaches a pattern of response, not a set of facts. If you fine-tune a model on your support tickets, it gets better at sounding like your support team and slotting things into your categories. It does not learn the contents of your product manual. Feed it a question whose answer lives in a document, and a fine-tune is no better off than the base model. That job belongs to RAG.

When fine-tuning actually helps

There is a real sweet spot. Fine-tuning earns its cost when all of these are true:

The task is narrow and repetitive. One well-defined job done the same way thousands of times - not a grab-bag of varied requests.
It is format- or style-specific. You need output in a very particular shape or voice, consistently, and prompting gets you close but not reliably there.
You run it at volume. The effort only amortises if the task happens a lot.
You already have clean examples. Hundreds of high-quality input-output pairs you can train on.

Typical good fits: classifying incoming messages into your exact internal categories, extracting fields into a fixed structure at scale, or holding a very specific house style across high-volume drafting. Narrow, repetitive, high-volume, well-exampled.

Why prompting or RAG usually wins

For most of what SMBs want from AI, fine-tuning is the long way round.

OpenAI’s own guidance is blunt about the order: start with prompt engineering, because it is often all you need, and only move to fine-tuning once you have a baseline you cannot beat. A clear prompt with a few worked examples inside it (few-shot prompting) handles a remarkable range of tasks. It costs nothing to change - you edit text, not retrain a model - and you can iterate in an afternoon.

When the real need is “answer using our documents,” the right tool is RAG, not fine-tuning. RAG retrieves the relevant passages from your material and hands them to the model per question, stays current as documents change, and cites its sources. Fine-tuning cannot do any of that, and you would be retraining every time a policy changed.

So the honest decision tree is short:

Can a better prompt get you there? Usually yes. Do that.
Is it really about your documents and facts? Use RAG.
Is it a narrow, high-volume, format-specific task that prompting genuinely cannot nail, and you have the example data? Now fine-tuning is on the table.

The cost most owners underestimate

The compute to run a fine-tuning job is cheap. The cost is everything around it. You need to collect and clean hundreds of high-quality examples, build evals to prove the fine-tune actually beats a good prompt (otherwise you have spent money to go sideways), and then maintain it. That last point matters: base models improve constantly, and a fine-tune is tied to the version it was built on, so it has to be redone to ride the upgrades.

It is also a more deliberate, engineering-led path than people expect. Fine-tuning Anthropic’s Claude, for instance, is offered through Amazon Bedrock on Claude 3 Haiku - not a button inside the everyday Claude apps, but a proper build with data limits and a training pipeline. That is the norm, not the exception.

The owner’s takeaway

Fine-tuning is a real tool with a narrow, legitimate job. It is not “our own AI,” and it is not how you get answers from your documents. For the vast majority of SMB use cases, a sharp prompt or a solid RAG build delivers the same outcome faster, cheaper, and with far less to maintain.

If a vendor’s first move is to fine-tune, ask why a prompt and RAG will not do it. Most of the time, the honest answer is that they will - and you have just saved yourself a project. Reach for fine-tuning when, and only when, you have a narrow high-volume task, the example data to back it, and a baseline you have genuinely exhausted. That is rarely where an SMB should start.

Sources

[1]OpenAI - Optimizing LLM accuracy - OpenAI's guidance that prompt engineering is the place to start, with fine-tuning reserved for once you have a proven baseline and a clear reason to go further.
[2]OpenAI - Fine-tuning best practices - Practical vendor guidance on what a fine-tune needs - a baseline from prompting, curated example pairs, and the instructions that worked best included in every training example.
[3]AWS - Fine-tuning for Anthropic's Claude 3 Haiku in Amazon Bedrock - Confirms Claude fine-tuning is delivered through Amazon Bedrock on Claude 3 Haiku, with limits on training records and context length - context for why it is a deliberate engineering project, not a toggle.

Citations are to publicly available reports, advisor publications and practitioner research. Inclusion does not imply endorsement of XLev by any cited organisation.

Frequently asked questions

What is fine-tuning in simple terms?: Fine-tuning is taking a general-purpose AI model and training it a bit further on your own examples, so it leans towards the style, format or behaviour you want on a specific task. You show it hundreds of examples of the input and the ideal output, and it adjusts to match that pattern. It does not give the model new knowledge about your business - it changes how it responds, not what it knows. For knowledge, you use RAG instead.
When does fine-tuning actually help a business?: When the task is narrow, repetitive and format-specific, you run it at real volume, and you have a large set of clean examples of the right output. Classic cases are classifying support tickets into your exact categories, extracting fields into a fixed structure, or matching a very particular house writing style consistently. If the task is varied, low-volume, or really about answering from your documents, fine-tuning is the wrong tool.
Should I fine-tune or just write a better prompt?: Try the prompt first, almost always. OpenAI's own guidance is that prompt engineering is the best place to start and is often all you need. A clear prompt with a few examples in it (called few-shot prompting) gets you most of the way on most tasks, costs nothing to change, and gives you a baseline. Only consider fine-tuning once you have a solid prompt you genuinely cannot push further and the task is high-volume enough to justify the effort.
Is fine-tuning the same as training my own AI?: No. Training a model from scratch costs millions and is the work of frontier labs. Fine-tuning starts from an existing foundation model someone else trained and nudges it on your examples - far cheaper, but still a real project. You need curated data, a training and evaluation loop, and somewhere to host or call the resulting model. It also has to be redone when the base model improves, which it does often.
What does fine-tuning cost an SMB?: More in effort than in compute. The compute to run a fine-tuning job is modest. The cost is the work around it: collecting and cleaning hundreds of high-quality examples, building evals to prove the fine-tune is actually better than a good prompt, and maintaining it as models change. For most SMBs that adds up to a custom build's worth of effort for a gain a prompt or RAG would have delivered, which is why we rarely start here.

Where this fits

Custom Builds

Bespoke web apps, internal tools and AI products built on Claude and the Anthropic SDK.

See Custom Builds →Book a free 30-min discovery call