
What is a token? How AI reads, writes and bills your business

When people say an AI model "reads" your document or "writes" a reply, what it is actually processing is tokens. A token is the small chunk of text a model reads and generates - sometimes a whole short word, often a piece of a longer one. Understand tokens and the thing that confuses most owners suddenly makes sense: why AI has a meter running, and why some jobs cost more than others.

A token is a chunk of a word

A token is the unit AI uses to handle text. On average, in English, one token is about 0.75 of a word. Flip that around and you get a handy rule: roughly 1.3 tokens per word.

So:

  • 1,000 words is around 1,300 tokens
  • A two-page memo of about 800 words is around 1,000 tokens
  • A 20-page report of about 8,000 words is around 10,000 tokens
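If you ever want a quick number, the rule of thumb is a single multiplication. Here is a minimal Python sketch that assumes plain English prose. `estimate_tokens_from_words` is a name made up for this example and is not part of any library:

```python
# Rule of thumb for English prose: roughly 1.3 tokens per word.
TOKENS_PER_WORD = 1.3


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate for English text. A ballpark, not an exact count."""
    return round(word_count * TOKENS_PER_WORD)


examples = [("1,000 words", 1_000), ("Two-page memo", 800), ("20-page report", 8_000)]
for label, words in examples:
    print(f"{label}: ~{estimate_tokens_from_words(words):,} tokens")
```

This gives 1,300, 1,040 and 10,400 tokens. The list above rounds the last two to "around 1,000" and "around 10,000", which is all the precision a rule of thumb deserves.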

Short, common words such as “the” or “and” are often a single token; longer or unusual words get split into several pieces. You do not need to count tokens by hand - just know they exist, because they are what AI reads, writes and bills.

Why an owner should care: tokens are the meter

Here is the part that matters for your wallet. AI is metered, and the meter counts tokens. Most AI providers charge per token, and they charge in two directions:

  • Input tokens - everything you send in. Your question, your instructions, any documents you paste, plus the running history of a long chat.
  • Output tokens - everything the model writes back. The answer, the draft, the summary.
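The running history matters more than it looks. With a metered API, each new turn of a chat is typically sent along with the whole conversation so far, so earlier messages are billed as input again on every turn. The sketch below is a minimal illustration that makes a hypothetical simplification: every question and every reply is the same length.

```python
def cumulative_input_tokens(turns: int, tokens_per_message: int) -> int:
    """Total input tokens billed across a chat that resends its full history each turn.

    Hypothetical simplification: every user message and every model reply
    is exactly `tokens_per_message` long.
    """
    total_input = 0
    history = 0  # tokens already in the conversation
    for _ in range(turns):
        history += tokens_per_message   # the new question joins the history
        total_input += history          # the whole history is sent in as input
        history += tokens_per_message   # the model's reply joins the history too
    return total_input


for turns in (1, 10, 50):
    print(f"{turns} turns: {cumulative_input_tokens(turns, 100):,} input tokens")
```

Under these assumptions, 10 turns of 100-token messages bill 10,000 input tokens rather than 1,000, and 50 turns bill 250,000. The total grows with the square of the chat's length because every early message is re-read on each later turn.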

That means tokens are cost. A short question with a short answer is cheap. A long document in, followed by a long answer out, is not. Once you see AI as a metered service rather than a flat tool, the rest of the picture falls into place.

How it works in practice: input, output and the price gap

The two directions are not priced the same. Generating new text takes more computing work per token than reading existing text, so output tokens usually cost several times more than input tokens.

Using Claude’s published API rates in 2026 as an illustration:

  • Claude Sonnet 4.6: about US$3 per million input tokens and US$15 per million output tokens. At an exchange rate of A$1 = US$0.65, that is roughly A$4.60 in and A$23 out per million tokens.
  • Claude Opus 4.8: about US$5 per million input tokens and US$25 per million output tokens - roughly A$7.70 in and A$38 out per million tokens.

Two things to take from that. First, a million tokens is a lot of text - around 750,000 words - so a single request usually costs anywhere from a fraction of a cent to a few cents. The bill adds up through volume, not through any single request. Second, output is the expensive direction: with Sonnet, each output token costs five times as much as an input token, so a tight summary is often cheaper than a full essay off the same input.
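Running the rates above through a calculation shows both points. The sketch below is minimal and uses the article's Sonnet 4.6 and Opus 4.8 rates and its A$1 = US$0.65 exchange rate. Those rates will change, and the function names are for illustration only:

```python
# Per-million-token rates quoted above (US$); check current pricing before relying on them.
RATES_USD_PER_MILLION = {
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00},
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},
}
USD_PER_AUD = 0.65  # the article's exchange rate: A$1 = US$0.65


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: input and output are billed separately, per million tokens."""
    rate = RATES_USD_PER_MILLION[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


def usd_to_aud(usd: float) -> float:
    """Convert US$ to A$ at the fixed illustrative rate."""
    return usd / USD_PER_AUD


# Same 10,000-token report in (about 20 pages); a short summary versus a long essay out.
for label, out_tokens in [("tight summary", 300), ("full essay", 3_000)]:
    usd = request_cost_usd("Claude Sonnet 4.6", 10_000, out_tokens)
    print(f"{label}: US${usd:.4f} (about A${usd_to_aud(usd):.4f})")
```

On Sonnet 4.6 the tight summary costs about 3.5 US cents and the full essay 7.5 US cents. The input is identical in both cases, so the entire difference comes from output tokens.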

To estimate before you commit, multiply your word count by 1.3, or divide your character count by about four. It is a ballpark, not a quote - the real count shifts with punctuation, code and other languages - but close enough to budget with.
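Both rules of thumb are easy to apply to real text. The sketch below assumes English prose. Splitting on whitespace only approximates a word count, and neither figure is what a provider's tokeniser would actually report:

```python
def estimate_tokens(text: str) -> dict[str, int]:
    """Two ballpark token estimates for English text; real counts will differ."""
    word_count = len(text.split())   # whitespace-separated words
    char_count = len(text)           # characters, including spaces and punctuation
    return {
        "by_words": round(word_count * 1.3),  # word count x 1.3
        "by_chars": round(char_count / 4),    # character count / 4
    }


def budget_tokens(text: str) -> int:
    """Cautious budgeting figure: take the larger of the two estimates."""
    return max(estimate_tokens(text).values())


sample = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens(sample), budget_tokens(sample))
```

For that sample sentence the two methods give 12 and 11 tokens. Agreement within a token or two is about as good as these rules get. Budgeting on the larger figure is a cautious choice, not a requirement.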

When it matters: chat seats hide it, APIs bill it

Here is the catch that trips owners up. If your only experience of AI is a paid chat seat - the monthly subscription where someone types into a chat box - you have never seen a token charge. That is by design.

  • Chat seats wrap all of this in a flat monthly fee. You pay per seat, per month, with fair-use or rate limits rather than a per-message bill. Tokens are still counted in the background, but you never see them. This is the right tool when a person is doing the work by hand.
  • The API is where tokens hit your invoice directly. Wire AI into your own software, an automation or a custom build, and you are billed per input and output token with no flat cap. Send ten thousand documents through overnight and you pay for every token of all of them.

This is the single biggest reason a custom build needs a cost estimate that a chat subscription does not. A team on chat seats is a predictable monthly line. An automation processing thousands of items is a usage line that scales with volume - great when it is doing the work of several staff, but only if you have sized it.
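Sizing a usage line means taking the per-request arithmetic and multiplying it by volume. This minimal sketch models an overnight batch of ten thousand documents. The document sizes, the 200-token summaries and the 22 working nights a month are all hypothetical assumptions, and the rates are the Sonnet 4.6 figures quoted earlier:

```python
def batch_cost_usd(items: int, input_tokens_each: int, output_tokens_each: int,
                   input_rate: float, output_rate: float) -> float:
    """Token cost of a batch job; rates are US$ per million tokens."""
    per_item = (input_tokens_each * input_rate + output_tokens_each * output_rate) / 1_000_000
    return items * per_item


# Hypothetical overnight run at the Sonnet 4.6 rates quoted earlier:
# 10,000 two-page documents (~1,000 tokens each), a ~200-token summary back for each.
nightly = batch_cost_usd(10_000, 1_000, 200, input_rate=3.00, output_rate=15.00)
monthly = nightly * 22  # assumption: 22 working nights a month
print(f"per night: US${nightly:,.2f}; per month: US${monthly:,.2f}")
```

Under those assumptions the run costs about US$60 a night, or around US$1,320 a month. Double the volume and the bill doubles. That linear scaling is why an automation needs this estimate before it goes live.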

Honest limits

  • Token estimates are rough. The 1.3-per-word rule is an average. Code, tables, spreadsheets, images and non-English text all tokenise differently, sometimes a lot.
  • Tokens are not the whole bill. On a custom build, model usage is often a smaller line than the build and maintenance around it. Tokens tell you the running cost, not the project cost.
  • Cheaper is not always better. A smaller model such as Claude Haiku is great for simple, high-volume jobs. For work that needs careful reasoning, paying more per token for a stronger model can be the cheaper outcome overall, because you redo it less.
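The "redo it less" argument can be put into numbers. In the sketch below every figure is hypothetical: the per-attempt costs, the success rates and the dollar value of the staff time spent catching and fixing a bad result. It also assumes each attempt succeeds or fails independently:

```python
def cost_per_usable_result(cost_per_attempt: float, success_rate: float,
                           rework_cost_per_failure: float = 0.0) -> float:
    """Expected cost of one acceptable result when failed attempts are redone.

    Assumes each attempt succeeds independently with probability `success_rate`,
    so on average it takes 1 / success_rate attempts, all but one of which fail.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be above 0 and at most 1")
    attempts = 1 / success_rate
    failures = attempts - 1
    return attempts * cost_per_attempt + failures * rework_cost_per_failure


# Hypothetical: a cheap model that gets a tricky task right half the time, versus a
# stronger model at five times the price that gets it right 95% of the time, with
# $2 of staff time spent catching and fixing each bad result.
cheap = cost_per_usable_result(0.01, 0.50, rework_cost_per_failure=2.00)
strong = cost_per_usable_result(0.05, 0.95, rework_cost_per_failure=2.00)
print(f"cheap model: ${cheap:.2f} per usable result; stronger model: ${strong:.2f}")
```

With these made-up numbers the cheaper model works out at about $2.02 per usable result and the stronger one at about $0.16. Next to the cost of redoing the work, the token price barely registers. If both models handle a simple job equally well, the cheaper one wins, which is what matching the model to the task looks like in practice.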

What to do about it

  • For people doing hands-on work, buy chat seats and stop thinking about tokens. The flat fee is the point.
  • Before any automation or custom build, get a token-based cost estimate at your expected volume, not just a per-message figure.
  • Where you reuse the same large context again and again - the same policy set, the same product catalogue - ask whether prompt caching applies. It stores repeated input so you do not pay full price to re-process it every time, and can cut that portion of the cost by a large margin.
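Here is a rough sense of what caching can save on a repeated context. In this minimal sketch, the first request pays a premium to store the context and later requests pay a small fraction of the normal input rate to reuse it. The default multipliers (1.25x to write, 0.1x to read) mirror what Anthropic's prompt-caching documentation has described for its short-lived cache, but treat them as assumptions and check current pricing. The sketch also assumes requests arrive close enough together that the cache never expires, since cached content typically lapses after a period without use:

```python
def shared_context_cost_usd(requests: int, context_tokens: int, input_rate: float,
                            cached: bool = True, write_multiplier: float = 1.25,
                            read_multiplier: float = 0.10) -> float:
    """Input cost (US$) of sending the same large context with every request.

    `input_rate` is US$ per million input tokens. With caching, the first request
    pays `write_multiplier` x the normal rate to store the context and every later
    request pays `read_multiplier` x to reuse it. The multipliers are assumptions.
    """
    if requests <= 0:
        return 0.0
    full_price = context_tokens * input_rate / 1_000_000  # one uncached read of the context
    if not cached:
        return requests * full_price
    return full_price * write_multiplier + (requests - 1) * full_price * read_multiplier


# Hypothetical: a 50,000-token policy set reused across 1,000 questions at US$3 per million.
without_cache = shared_context_cost_usd(1_000, 50_000, 3.00, cached=False)
with_cache = shared_context_cost_usd(1_000, 50_000, 3.00)
print(f"without caching: US${without_cache:,.2f}; with caching: US${with_cache:,.2f}")
```

On these assumptions the repeated context drops from about US$150 to about US$15, roughly a 90% cut on that portion of the bill. The questions themselves and the answers are still billed as normal.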
  • Match the model to the task. Do not run a simple classification job on your most expensive model.

The one-line version

A token is the small chunk of text AI reads and writes - about 0.75 of a word, or roughly 1.3 tokens per word. AI is metered in tokens, billed on input plus output, and output costs the most. Paid chat seats hide this behind a flat fee; APIs bill it directly. So long documents and long answers cost more, chat seats are for people, and any automation needs a token-based estimate before you switch it on.

Frequently asked questions

What is a token in AI?
A token is the unit an AI model uses to read and write text. It is often a piece of a word rather than a whole word - on average about 0.75 of a word in English, so a rough rule is 1.3 tokens per word, or 1,000 tokens to about 750 words. Common short words are often one token, while longer or unusual words get split into several. Models count tokens both on the way in (your prompt and any documents) and on the way out (the answer they generate).
How do I estimate how many tokens my text is?
For English, take your word count and multiply by about 1.3, or take your character count and divide by roughly four. So a 1,000-word document is around 1,300 tokens, and a two-page memo of about 800 words is around 1,000 tokens. It is an estimate, not an exact figure, because tokenisation varies with the exact words, punctuation, code and other languages. For budgeting, the estimate is close enough to reason about cost before you commit.
Why does AI charge separately for input and output tokens?
Generating new text is more computationally expensive than reading existing text, so providers price the two differently. Output tokens - the words the model writes back - usually cost several times more than input tokens. With Claude Sonnet 4.6 in 2026, input is about US$3 per million tokens and output is US$15, a five-times gap. The practical lesson: a long answer can cost more than a long prompt, so asking for a tight summary is often cheaper than asking for a full essay.
Do tokens cost money in ChatGPT or Claude chat seats?
Tokens are always being counted, but a paid chat seat hides the per-token cost behind a flat monthly fee, usually with fair-use or rate limits rather than a per-message bill. That is fine for a person typing in a chat box. The place tokens hit your invoice directly is the API, where automations and custom builds are billed per input and output token with no flat cap. So chat seats feel free per message; API usage is metered.
How can I reduce my token costs?
Four levers help. Send less input - do not paste a whole 100-page document when a relevant section answers the question. Ask for shorter output when you do not need length, since output tokens cost the most. Use a smaller, cheaper model such as Claude Haiku for simple tasks instead of always reaching for the largest one. And use prompt caching when you reuse the same large context repeatedly, which can cut the cost of that repeated input by a large margin.
