The economics of AI: tokens, cost, and when it's worth it

Every time someone tells me AI is "too expensive" or "basically free," I've learned to ask the same question back: priced against what? Tokens are cheap. Human attention is not. The whole game of getting value out of AI is matching the two up correctly — and it's surprisingly easy to get the matching wrong in both directions.

I manage a data team, not a research lab, so my interest in AI economics is blunt: where does a dollar of spend buy more than a dollar of saved time, and where does it quietly drain the budget while feeling productive? To answer that you first have to understand what you're actually paying for. So let's start with the unit.

What a token is, and why you pay twice

Models don't read words; they read tokens, which are chunks of text. A token is roughly four characters of English, so a token is often a short word or part of a longer one. By that rule of thumb, a paragraph like this one comes to somewhere under a hundred tokens. You can think of a token as the model's smallest billable heartbeat.
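If you want a number before you send anything, the four-characters rule turns into a one-line estimate. This is a rough sketch only: `estimate_tokens` is a name I made up, and real tokenizers differ by model and by language, so treat the result as a ballpark rather than what the provider will bill.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Ballpark token count from the 'about four characters per token' rule.

    This is an approximation for English text, not a real tokenizer.
    """
    return round(len(text) / chars_per_token)


# A 2,000-character pasted excerpt is on the order of 500 tokens.
excerpt = "x" * 2_000
print(estimate_tokens(excerpt))
```

For anything that matters, use the token counts your provider reports back with each response.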

The part people miss is that you pay for tokens twice, on two separate meters. There's the input — everything you send in: your prompt, the instructions, the documents you pasted, the conversation history dragged along behind you. And there's the output — everything the model writes back. The two are usually priced differently, and output is typically the more expensive of the two, because generating text is the harder work. A short question that produces a long, detailed answer can cost more on the output meter than the question ever cost on the input one.
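The two meters are easy to write down. Here is a minimal sketch, assuming hypothetical prices of $1 per million input tokens and $5 per million output tokens; the function name and the numbers are mine, chosen only to show a short question being out-billed by its long answer.

```python
def call_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return (input_cost, output_cost) in dollars for one model call.

    Prices are expressed per million tokens, one price per meter.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost, output_cost


# Hypothetical call: a 200-token question that produces a 1,500-token answer.
question_cost, answer_cost = call_cost(200, 1_500, 1.00, 5.00)
print(f"input meter:  ${question_cost:.4f}")
print(f"output meter: ${answer_cost:.4f}")
```

With these made-up prices the answer costs far more than the question, even though both are fractions of a cent.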

Three dials set the bill: token count, model tier, and how many times the work runs. All three are under your control.

The other two dials: model tier and how many times you run

Token count is only half the story. The second dial is the model tier. Providers offer a ladder — small and fast at the bottom, a frontier model at the top — and the price per token between rungs can differ by an order of magnitude or more. A bigger model is smarter, but you're renting that intelligence by the token whether the task needs it or not. Using a frontier model to reformat a list is like couriering a postcard by chartered jet.

The third dial is the one that ambushes people: how many times the work runs. A single chat message is one round trip. But an agent — a model that calls tools, reads results, and decides what to do next — loops. Each loop re-sends the growing context as fresh input, generates more output, and does it again. Add automatic retries when something fails, and a task you imagined as "one request" can quietly become forty. The token count per step looks tiny; the multiplied total does not.
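To see why the loop count bites, here is a small sketch of the input side of an agent run. It assumes the whole context is re-sent on every step and grows by a fixed amount each time; the function name and the token figures are invented, and it ignores retries and any caching discount.

```python
def agent_loop_input_tokens(base_context, tokens_added_per_step, steps):
    """Total input tokens billed when every step re-sends the context so far."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context                  # the full context is billed as input again
        context += tokens_added_per_step  # tool results and model output pile up
    return total


single = agent_loop_input_tokens(2_000, 1_500, steps=1)
looped = agent_loop_input_tokens(2_000, 1_500, steps=12)
print(single, looped, looped / single)
```

Under these assumptions a 12-step loop bills roughly sixty times the input of a single pass, not twelve times, because each step pays again for everything that came before.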

The scary line on an AI bill is almost never the price per token. It's the number of times you paid it without noticing.

An illustrative bill (numbers invented to show the shape)

Real prices change constantly and vary by provider, so treat everything below as a worked example, not a quote. Imagine a small model at $0.20 per million input tokens and a frontier model at $5 per million, a deliberately rounded, illustrative gap. Now look at the same 50,000-token task three ways, counting input tokens only to keep the arithmetic simple:

~$0.01: small model, one pass
~$0.25: frontier model, one pass
~$3+: frontier model, agent loop, ~12 runs

None of those figures are real, and that's the point — the ratios are what matter. The same underlying task spans more than two orders of magnitude depending purely on the model tier you pick and how many times the loop fires. Get those two choices right and cost is a rounding error. Get them wrong on a job that runs thousands of times a day and you've built an expensive habit.
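The arithmetic behind those three figures fits in a few lines. This sketch uses the same invented prices, counts input tokens only, and assumes each agent run costs the same as a single pass, which is the floor rather than the ceiling since real loops tend to grow their context.

```python
TASK_TOKENS = 50_000
SMALL_PRICE_PER_M = 0.20      # illustrative, dollars per million input tokens
FRONTIER_PRICE_PER_M = 5.00   # illustrative, dollars per million input tokens
AGENT_RUNS = 12


def input_cost(tokens, price_per_m, runs=1):
    """Input-side cost in dollars for `runs` passes over `tokens` tokens."""
    return tokens / 1_000_000 * price_per_m * runs


small_one_pass = input_cost(TASK_TOKENS, SMALL_PRICE_PER_M)
frontier_one_pass = input_cost(TASK_TOKENS, FRONTIER_PRICE_PER_M)
frontier_agent = input_cost(TASK_TOKENS, FRONTIER_PRICE_PER_M, runs=AGENT_RUNS)

print(f"small, one pass:      ${small_one_pass:.2f}")     # $0.01
print(f"frontier, one pass:   ${frontier_one_pass:.2f}")  # $0.25
print(f"frontier, agent loop: ${frontier_agent:.2f}")     # $3.00
print(f"spread: {frontier_agent / small_one_pass:.0f}x")  # 300x
```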

The framing I actually use: AI is cheap relative to the human time it replaces, but only when it's pointed at the right task and verified efficiently. A few dollars of tokens to save an analyst two hours is a trade I take every time. A few dollars to generate something nobody needed, that someone then spends two hours checking, is a loss dressed up as innovation.

So when is it worth it?

I run a quick mental sum before automating anything with AI. It isn't precise and it doesn't need to be — the answer is usually obvious once you write the terms down.

1. Estimate the human time saved. How long does this task take a person today, and how often does it happen? That's the prize, in hours, and hours convert to money.
2. Estimate the token spend. Roughly how big is the input, how long the output, which tier, how many runs? Multiply. Be honest about agent loops.
3. Add the hidden costs. Verification time, the cost of acting on a wrong answer, and the engineering to keep it running. These are real and people forget them.
4. Compare, then decide the tier. If saved time dwarfs the spend, ship it, and start on the cheapest model that's good enough, not the biggest.

Step three is where good intentions go to die. The token bill is visible and small; the verification bill is invisible and often large. If a human has to carefully re-read every AI output to trust it, you may have moved the work rather than removed it. The wins that hold up are the ones where verification is cheap — because the task is low-stakes, or because the output is easy to spot-check, or because being wrong occasionally simply doesn't hurt.
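Written out, the mental sum looks something like the sketch below. Every name and number here is hypothetical; the point is only that verification time sits on the cost side at the same hourly rate as the time you saved, so it can cancel the prize entirely.

```python
def net_monthly_value(minutes_saved_per_task, tasks_per_month, hourly_rate,
                      token_cost_per_task, verify_minutes_per_task,
                      expected_error_cost_per_task=0.0, monthly_upkeep_cost=0.0):
    """Rough monthly value in dollars: time saved minus visible and hidden costs."""
    saved = minutes_saved_per_task / 60 * hourly_rate * tasks_per_month
    tokens = token_cost_per_task * tasks_per_month
    verification = verify_minutes_per_task / 60 * hourly_rate * tasks_per_month
    errors = expected_error_cost_per_task * tasks_per_month
    return saved - tokens - verification - errors - monthly_upkeep_cost


# Invented case: 200 tasks a month, 30 minutes saved each, at $60 an hour.
cheap_to_check = net_monthly_value(30, 200, 60, token_cost_per_task=0.25,
                                   verify_minutes_per_task=2,
                                   monthly_upkeep_cost=500)
costly_to_check = net_monthly_value(30, 200, 60, token_cost_per_task=0.25,
                                    verify_minutes_per_task=30,
                                    monthly_upkeep_cost=500)
print(round(cheap_to_check), round(costly_to_check))
```

In this made-up case the token bill is $50 a month either way. What flips the result from clearly positive to negative is the jump from two minutes of checking per task to thirty.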

The practical levers

Once you've decided a task is worth doing, a handful of levers keep the cost sane without you ever needing a finance meeting about it:

Right-size the model. Default to a smaller tier and only climb when quality genuinely demands it. Most tasks don't need the frontier.
Keep context tight. Every token you send is a token you pay for, on every loop. Stop pasting whole documents when a relevant paragraph will do.
Cache and reuse. Many providers let you reuse a fixed chunk of context cheaply. If the same instructions ride along on every call, don't pay full price for them each time.
Batch the patient work. Jobs that don't need an instant answer can often run at a discount. Overnight is cheaper than on-demand.
Cap the loops. Put a hard ceiling on agent steps and retries. A runaway loop is the single most expensive failure mode I've seen.
Measure value, not just spend. Track hours saved beside tokens burned. A dashboard of cost with no view of value will only ever tell you to do less.
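Of those levers, capping the loops is the easiest to show in code. This is a minimal sketch, not any particular framework's API: `run_capped`, `BudgetExceeded`, and the convention that a hypothetical `step` function returns `None` to mean "keep going" are all assumptions of mine.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the agent hits its step ceiling without finishing."""


def run_capped(step, max_steps=10, max_retries=2):
    """Call step(i) until it returns a non-None result, within hard ceilings.

    Each step is retried at most `max_retries` times if it raises; the final
    failure is re-raised rather than retried forever.
    """
    for i in range(max_steps):
        for attempt in range(max_retries + 1):
            try:
                result = step(i)
                break
            except Exception:  # a sketch; real code should catch specific errors
                if attempt == max_retries:
                    raise
        if result is not None:
            return result
    raise BudgetExceeded(f"no result after {max_steps} steps")


# Hypothetical step that finishes on its fourth call.
print(run_capped(lambda i: "done" if i == 3 else None))
```

The worst case is then bounded at `max_steps * (max_retries + 1)` calls, which is a number you can price before the job ever runs.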

For the mechanics of pricing — how input, output and cached tokens are billed, and how tiers differ — the cleanest source is whichever provider you use; Anthropic publishes its model pricing openly, and the patterns there generalise well across the market.

The honest summary is that AI economics isn't really about AI. It's the same question every operations decision asks: is this dollar buying me more than a dollar of something I value? Tokens just make the dollar small enough that people stop asking. The teams getting real leverage are the ones who kept asking — pointing cheap, fast models at high-frequency drudgery, reserving the expensive ones for the genuinely hard calls, and watching the loop count like a hawk. That's not a hype strategy. It's just good management, applied to a new line item.