How to Save AI Tokens, Manage Licences, and Keep Your AI Spend Under Control

AI tools are genuinely useful. They are also genuinely expensive if nobody is paying attention.

A team of 20 people using ChatGPT Team, Microsoft Copilot, and the OpenAI API without any governance in place can quietly spend £3,000–£8,000 per month — and get far less value than they should, because most of those tokens are wasted on vague prompts, the wrong models, and duplication.

This article gives you a complete, practical system to stop that waste. Whether you are an individual trying to stay on the free tier, an IT admin managing AI tools for a team, or a developer building applications on top of AI APIs — every section here has something you can act on today.

First: What Is a Token and Why Does It Cost Money?

Before controlling costs, you need to understand what you are actually paying for.

Think of tokens like coins in a taxi meter. The meter starts running the moment you start talking to the AI — every word you type clicks the meter, and every word the AI types back clicks it again. When you run out of coins, you either top up or the journey stops.

More precisely:

A token is approximately 4 characters of text — roughly ¾ of an English word
1,000 tokens ≈ 750 words ≈ about one and a half pages of text
Every request has two token counts: input tokens (what you send) and output tokens (what the AI sends back)
You pay for both

Here is the important insight most people miss: you are paying for every word in the conversation — including the system prompt at the top, the entire chat history, and any documents you paste in. Not just the question you just typed.

That is where most waste hides.
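To make that concrete, here is a minimal sketch that estimates the input tokens billed on each turn of a conversation, using the 4-characters-per-token rule of thumb from above. The function names, the fixed reply size, and the sample messages are assumptions made for illustration; a real tokenizer will give somewhat different counts.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / 4))


def billed_input_tokens(system_prompt: str, user_turns: list[str], reply_tokens: int) -> list[int]:
    """Input tokens billed on each turn when the full history is resent.

    reply_tokens is an assumed, fixed size for every AI reply.
    """
    history = estimate_tokens(system_prompt)
    billed = []
    for turn in user_turns:
        history += estimate_tokens(turn)  # the new question joins the history
        billed.append(history)            # the whole history is sent as input
        history += reply_tokens           # the reply is resent on later turns
    return billed


turns = ["x" * 80] * 3  # three questions of about 20 tokens each
per_turn = billed_input_tokens("y" * 400, turns, reply_tokens=200)
print(per_turn)       # [120, 340, 560]
print(sum(per_turn))  # 1020
```

The three questions total only about 60 tokens, yet roughly 1,000 input tokens are billed, because the system prompt and earlier replies are resent on every turn.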

How Much Do Tokens Actually Cost?

Costs vary enormously depending on which model you use. This is one of the most powerful levers you have.

API Pricing (Pay-as-You-Go)

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best for |
| --- | --- | --- | --- |
| GPT-4o mini | ~$0.15 | ~$0.60 | Simple tasks, high volume |
| GPT-4o | ~$2.50 | ~$10.00 | Complex reasoning, important tasks |
| Claude 3.5 Haiku | ~$0.80 | ~$4.00 | Fast, simple, high volume |
| Claude Sonnet 4 | ~$3.00 | ~$15.00 | Balanced quality and cost |
| Claude Opus 4 | ~$15.00 | ~$75.00 | Hardest problems only |
| Gemini 1.5 Flash | ~$0.075 | ~$0.30 | Very cheap, large context |
| Gemini 1.5 Pro | ~$1.25 | ~$5.00 | Quality at reasonable cost |

The ratio matters: output tokens cost 4–5× as much as input tokens on every model listed above. Shorter, more focused answers save significant money at scale.
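As a rough illustration of how the two prices combine, the sketch below computes the cost of a single request from the per-million-token figures in the table. The dictionary keys are labels chosen for this example, not necessarily the exact API model identifiers, and the token counts are made up.

```python
# USD per 1M tokens (input, output), copied from the table above; approximate.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-opus-4": (15.00, 75.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed per-million-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Same request, two answer lengths: the output side dominates the bill.
long_answer = request_cost("gpt-4o", input_tokens=1_000, output_tokens=800)
short_answer = request_cost("gpt-4o", input_tokens=1_000, output_tokens=200)
print(f"{long_answer:.4f} vs {short_answer:.4f}")  # 0.0105 vs 0.0045
```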

Subscription Plans

| Plan | Cost | Who it is for | Key limit |
| --- | --- | --- | --- |
| ChatGPT Free | £0/month | Occasional personal use | Limited GPT-4o access |
| ChatGPT Plus | ~£20/month per user | Individual power users | Higher limits, all models |
| ChatGPT Team | ~£25/month per user | Teams, shared workspace | Admin controls, no training on data |
| ChatGPT Enterprise | Custom pricing | Large organisations | SSO, compliance, unlimited |
| Microsoft Copilot for M365 | ~£25–30/month per user | Office 365 users | Per-user, admin managed |
| Anthropic Claude.ai Pro | ~£18/month per user | Individual Claude users | Priority access |

Subscription vs API — which applies to you?

If you use ChatGPT, Claude.ai, or Copilot through a browser or app, you are on a subscription plan — a flat monthly fee. If you are a developer calling OpenAI, Anthropic, or Google through code, you are on the API and paying per token. The saving strategies differ slightly, but most principles in this guide apply to both.

Part 1: Smart Prompting — Use Fewer Tokens Per Request

This is the highest-impact area. The average person wastes 30–50% of their tokens before the AI even starts answering — through unfocused prompts, unnecessary context, and asking for longer answers than they need.

Technique 1: Be Specific, Not Conversational

Most people write to AI as if they were texting a friend. A friendly, chatty style costs tokens.

Instead of this (47 tokens):

Hi, I was wondering if you could maybe help me understand how to write 
a PowerShell script? I need one that does something with Active Directory. 
I am not sure where to start.

Write this (22 tokens):

Write a PowerShell script to list all disabled AD user accounts. 
Output: CSV with Name, UPN, LastLogonDate.

Fewer words, better answer, less cost.

Technique 2: Tell the AI How Long to Be

AI models default to thorough — they add context, caveats, and explanations you often do not need. One short instruction at the end of your prompt cuts output tokens dramatically.

Add one of these to the end of your prompts:

| Instruction | When to use it |
| --- | --- |
| `Be concise. Max 3 sentences.` | Quick factual answers |
| `Bullet points only. No intro.` | Lists and summaries |
| `Output the code only. No explanation.` | When you just need the script |
| `Summary only: 5 bullet points max.` | Summarising documents |
| `One paragraph. Plain language.` | Explaining concepts to non-technical users |

Technique 3: Do Not Paste What You Do Not Need

When summarising a document, people often paste the entire thing — including headers, footers, disclaimers, and boilerplate. Strip all of that before pasting.

If you need to analyse a 40-page PDF, do not paste all 40 pages at once. Paste the relevant sections only. Or paste page by page and ask for a running summary.
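One simple way to paste only the relevant sections is to filter the document by keyword before sending it. The sketch below is one possible approach under stated assumptions: paragraphs are separated by blank lines, and the function name and sample document are invented for illustration.

```python
def relevant_paragraphs(document: str, keywords: list[str]) -> str:
    """Keep only paragraphs that mention at least one keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    kept = [p for p in paragraphs if any(k in p.lower() for k in wanted)]
    return "\n\n".join(kept)


doc = (
    "CONFIDENTIAL - Page 1 of 40\n\n"
    "Section 4: The notice period for termination is 90 days.\n\n"
    "Section 9: Office plants are watered weekly.\n\n"
    "This document is subject to the standard disclaimer."
)
excerpt = relevant_paragraphs(doc, ["termination", "notice period"])
print(excerpt)  # Section 4: The notice period for termination is 90 days.
print(len(doc), "->", len(excerpt), "characters")
```

Keyword matching is crude and can miss relevant passages, so it suits documents whose sections you already know how to name.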

Technique 4: Use a Shorter System Prompt

If you are using the API or a custom GPT, your system prompt runs on every single request. A 500-word system prompt that runs 1,000 times costs roughly 670,000 input tokens (500 words is about 670 tokens) before a single user has typed a word.

Audit your system prompts ruthlessly. Cut anything that is not doing active work. A system prompt should be instructions, not an essay.

Rule of thumb: Keep system prompts under 200 words unless you have a specific reason to go longer.
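The arithmetic behind a system prompt's running cost can be sketched as follows, using the earlier rule of thumb that one token is about ¾ of a word. The function names are illustrative, and the $2.50 input price is the approximate GPT-4o figure from the pricing table.

```python
def system_prompt_tokens(words: int, requests: int) -> int:
    """Total input tokens spent on the system prompt alone (1 token ~ 0.75 words)."""
    tokens_per_request = round(words / 0.75)
    return tokens_per_request * requests


def system_prompt_cost(words: int, requests: int, usd_per_million_input: float) -> float:
    """USD spent resending the system prompt, at a given input price."""
    return system_prompt_tokens(words, requests) * usd_per_million_input / 1_000_000


print(system_prompt_tokens(500, 1_000))                # 667000
print(system_prompt_tokens(200, 1_000))                # 267000
print(round(system_prompt_cost(500, 1_000, 2.50), 2))  # 1.67
```

Trimming the prompt from 500 to 200 words removes about 400,000 input tokens per 1,000 requests in this estimate.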

Technique 5: Ask for Drafts, Not Finals

When writing something — an email, a policy document, a script — do not ask for a full finished version on the first pass. Ask for an outline or bullet points first. Review it. Only then ask for the full draft.

Example prompt:

First, give me a bullet-point outline only. I will tell you which sections to expand.

This two-step approach often saves 60–70% of output tokens, because you avoid the AI writing sections you will delete anyway.

Technique 6: Avoid Repeating Context

In a long conversation, people often re-explain the context they already gave. "As I mentioned earlier, I am working with Windows 11..." — the AI remembers everything in the conversation window. Do not repeat what is already there.

Technique 7: Use Code Instead of English for Structured Tasks

If you need data in a specific format, give the AI an example of the format rather than describing it in words.

Wordy (uses more tokens, less reliable output):

Can you give me a table with three columns showing the first column as the 
name of the country, the second column as the capital city, and the third 
column as the population? Please format it nicely.

Precise (uses fewer tokens, more reliable output):

List 5 countries as JSON: [{"country":"","capital":"","population":0}]
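A side benefit of asking for a fixed JSON shape is that the reply can be checked by code. The sketch below parses a reply and confirms each item has the keys from the template; the function name is hypothetical and the sample reply uses illustrative, rounded values.

```python
import json

TEMPLATE_KEYS = {"country", "capital", "population"}


def parse_countries(reply: str) -> list[dict]:
    """Parse the model's reply and check every item has exactly the template's keys."""
    items = json.loads(reply)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    for item in items:
        if not isinstance(item, dict) or set(item) != TEMPLATE_KEYS:
            raise ValueError(f"unexpected item shape: {item!r}")
    return items


# A reply of the kind the prompt asks for (illustrative values, shortened to two items).
sample_reply = (
    '[{"country":"France","capital":"Paris","population":68000000},'
    '{"country":"Japan","capital":"Tokyo","population":124000000}]'
)
for row in parse_countries(sample_reply):
    print(row["country"], row["capital"])
```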

Technique 8: One Question at a Time

Asking three questions in one message often results in the AI answering all three at length — three times the output tokens. Ask one question, get the answer, then ask the next.

Unless you genuinely need all three answers together, this single habit significantly reduces output length.

Part 2: Use the Right Model for Each Task

This is the fastest way to cut costs without changing how you work at all. Most teams default to the most expensive model for everything — and that is pure waste.

Think of it like hiring contractors:

You do not hire a senior architect to hang a picture frame
You do not hire an apprentice to design your building's structure

AI models work the same way.

The Three-Tier Model Strategy

Tier 1 — The Fast and Cheap Model

Use for: simple questions, formatting, short summaries, classification, quick lookups, drafting short emails.

Models: GPT-4o mini, Claude 3.5 Haiku, Gemini 1.5 Flash

Cost: 10–50× cheaper than top-tier models. The quality for simple tasks is nearly identical to the expensive models.

Tier 2 — The Balanced Model

Use for: writing longer documents, reviewing code, analysing reports, multi-step tasks, technical explanations.

Models: GPT-4o, Claude Sonnet 4, Gemini 1.5 Pro

Cost: Mid-range. Good quality at a reasonable price. This should be your default for most professional work.

Tier 3 — The Premium Model

Use sparingly for: complex reasoning, critical decisions, long document analysis, debugging difficult problems, legal or financial review.

Models: Claude Opus 4, o1, o3

Cost: 5–10× more than Tier 2. Only justified when the task genuinely needs it.

Quick Model Selection Guide

Is this a simple, routine task? (format text, short email, basic question)

YES → Use Tier 1 (cheap and fast). Save 80–90% vs using top-tier.

NO → Does it involve complex reasoning, long analysis, or important decisions?

YES → Use Tier 3 sparingly. Document why Tier 2 was not enough.

NO → Use Tier 2. This covers the majority of professional work.
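The selection guide above reduces to two yes/no questions, which can be written as a small function. This is only a sketch of the decision logic; the function and parameter names are made up for illustration.

```python
def pick_tier(is_simple_routine: bool, needs_complex_reasoning: bool) -> int:
    """Return 1, 2, or 3 following the two questions in the guide above."""
    if is_simple_routine:
        return 1  # cheap and fast
    if needs_complex_reasoning:
        return 3  # premium, use sparingly
    return 2      # balanced default


print(pick_tier(True, False))   # 1
print(pick_tier(False, True))   # 3
print(pick_tier(False, False))  # 2
```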

Real Savings Example

Say your team sends 500 requests per day:

| Scenario | Model used | Monthly cost (estimate) |
| --- | --- | --- |
| All requests on GPT-4o | Tier 2 for everything | ~£400/month |
| Simple tasks on GPT-4o mini, complex on GPT-4o | Smart tiering | ~£80/month |
| Saving | | ~£320/month |

The work output quality is nearly identical. The cost is 80% lower.
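For readers who want to run their own numbers, here is a hedged sketch of the same kind of comparison. The average token counts and the 80% share of simple requests are assumptions invented for this example, so the result is in dollars and will not match the pound figures in the table; only the prices come from the API pricing table earlier.

```python
# USD per 1M tokens (input, output), taken from the API pricing table earlier.
GPT_4O = (2.50, 10.00)
GPT_4O_MINI = (0.15, 0.60)


def monthly_cost(requests: int, price: tuple[float, float],
                 input_tokens: int, output_tokens: int) -> float:
    """USD per month for `requests` identical requests at the given prices."""
    per_request = (input_tokens * price[0] + output_tokens * price[1]) / 1_000_000
    return requests * per_request


REQUESTS = 500 * 30           # 500 requests per day over a 30-day month
SIMPLE_SHARE = 0.8            # assumption: 80% of requests are simple
IN_TOK, OUT_TOK = 1_000, 500  # assumption: average tokens per request

simple = round(REQUESTS * SIMPLE_SHARE)
everything_on_4o = monthly_cost(REQUESTS, GPT_4O, IN_TOK, OUT_TOK)
tiered = (monthly_cost(simple, GPT_4O_MINI, IN_TOK, OUT_TOK)
          + monthly_cost(REQUESTS - simple, GPT_4O, IN_TOK, OUT_TOK))

print(f"all GPT-4o: ${everything_on_4o:.2f}")              # all GPT-4o: $112.50
print(f"tiered:     ${tiered:.2f}")                        # tiered:     $27.90
print(f"saving:     {1 - tiered / everything_on_4o:.0%}")  # saving:     75%
```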

Not sure which tier your task needs? Start cheap.

Always try the cheapest model first. If the output is not good enough, move up one tier. You will be surprised how often GPT-4o mini or Claude Haiku handles a task just as well as the expensive model.
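In an application, "start cheap" can be automated as an escalation loop. The sketch below assumes you supply two functions of your own: `call_model`, which sends a prompt to a given tier, and `good_enough`, which decides whether an answer is acceptable. Both are hypothetical placeholders, and the tier names are labels rather than real model identifiers.

```python
from typing import Callable

# Cheapest first; the names are labels for this example, not exact API identifiers.
TIERS = ["tier-1-cheap", "tier-2-balanced", "tier-3-premium"]


def answer_with_escalation(prompt: str,
                           call_model: Callable[[str, str], str],
                           good_enough: Callable[[str], bool]) -> tuple[str, str]:
    """Try each tier from cheapest up; return (tier, answer) for the first acceptable one."""
    answer = ""
    for tier in TIERS:
        answer = call_model(tier, prompt)
        if good_enough(answer):
            return tier, answer
    return TIERS[-1], answer  # nothing passed the check; keep the top-tier answer


# Stand-in for a real API call so the sketch runs offline.
def fake_call(tier: str, prompt: str) -> str:
    return "" if tier == "tier-1-cheap" else f"{tier} answer to: {prompt}"


tier, answer = answer_with_escalation("Summarise this report", fake_call, lambda a: len(a) > 0)
print(tier)  # tier-2-balanced
```

Each failed attempt still costs tokens, so escalation pays off only when the cheap tier succeeds most of the time.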

Part 3: Licence Management — Paying Only for What You Actually Use

Licences are where organisations waste the most money — not through heavy usage, but through sheer inertia. Licences assigned to people who barely use them. Teams on Enterprise plans that do not need Enterprise features.

Audit Who Is Actually Using What

Before your next renewal, pull usage data:

Microsoft Copilot: Admin Centre → Reports → Usage → Copilot activity
ChatGPT Team/Enterprise: Settings → Usage in the admin console
OpenAI API: Platform → Usage dashboard — shows per-API-key consumption

The typical finding: 20–30% of licensed users account for 80%+ of usage. The remaining users log in once a month or less.

Action: Remove or downgrade licences for low-usage users. Reassign them to people who will actually benefit.

Right-Size the Plan for Each User Type

Not everyone needs the same tier. Segment your users honestly:

| User type | What they need | Right plan |
| --- | --- | --- |
| Occasional user (1–2 questions/day) | Basic AI access | Free tier or shared team account |
| Regular user (drafting, summarising, Q&A) | Reliable access, good models | Plus or Team |
| Power user (complex analysis, API, integrations) | High limits, all models, priority | Team or Enterprise |
| Developer (building apps on the API) | API access, fine control | Pay-as-you-go API |

Consolidate Overlapping Tools

Many organisations are unknowingly paying for multiple AI tools that do the same thing:

ChatGPT Plus for some users
Microsoft Copilot for the same users
Gemini Advanced through Google Workspace
An independent AI writing tool

Do a quick audit. If Copilot already covers writing and summarisation for Microsoft 365 users, they may not need a separate ChatGPT Plus subscription. Picking one primary tool for most users and reserving specialist tools for specific roles can cut AI licence spend by 30–50%.

Part 4: Setting Hard Spend Caps and Token Limits

Unlimited usage is a liability. Every major AI platform now gives you tools to set hard limits — use them. Here is how on each platform:

OpenAI API — Spend Limits

Set a Monthly Spend Limit

Log in to platform.openai.com → Settings → Limits. Set a hard limit (usage stops at this amount) and a soft limit (you get an email warning when you approach it).

Start conservative — for a small team just beginning with the API, £50–100/month is a sensible hard limit until you understand your actual usage patterns.

Use Separate API Keys Per Project

In platform.openai.com → API Keys, create a different key for each project or use case. You can monitor spending per key in the Usage dashboard. This tells you exactly which application or team is consuming the most.

Set Per-Key Rate Limits

For each API key, you can set requests-per-minute and tokens-per-minute limits. This prevents a runaway script from consuming your entire monthly budget in an hour.
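As a client-side complement to platform limits, a script can also refuse to send more than a set number of tokens per minute. The class below is a minimal sketch of that idea using a sliding 60-second window; the class name and the 10,000-token budget are assumptions for illustration, and it is not a substitute for the platform's own settings.

```python
import time
from collections import deque


class TokenRateGuard:
    """Refuses requests once a tokens-per-minute budget is used up (sliding 60 s window)."""

    def __init__(self, tokens_per_minute: int, clock=time.monotonic):
        self.limit = tokens_per_minute
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) pairs inside the window

    def allow(self, tokens: int) -> bool:
        now = self.clock()
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()  # forget usage older than one minute
        used = sum(t for _, t in self.events)
        if used + tokens > self.limit:
            return False
        self.events.append((now, tokens))
        return True


guard = TokenRateGuard(tokens_per_minute=10_000)
print(guard.allow(6_000))  # True
print(guard.allow(6_000))  # False: would exceed 10,000 tokens in the same minute
```

Calling `allow` with the estimated token count before each request lets a runaway loop fail fast instead of spending the budget.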

Microsoft Copilot — Admin Controls

If your organisation uses Microsoft Copilot for Microsoft 365:

Manage Licence Assignment in the Microsoft 365 Admin Centre

In the Microsoft 365 Admin Centre → Users → Active Users, assign Copilot licences only to users who will actively use it. Unassign licences from users who have not opened Copilot in 30 days.

Review Usage Reports Monthly

Admin Centre → Reports → Microsoft 365 Usage → Copilot. Review the Active Users and Feature Usage report. Filter to users with fewer than 5 interactions in the last 30 days — those are candidates for licence removal.
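If you export the usage report to CSV, the filter can be scripted. The sketch below assumes a hypothetical export with `user` and `interactions_30d` columns; a real report will use different column names, so adjust accordingly.

```python
import csv
import io


def inactive_users(report_csv: str, min_interactions: int = 5) -> list[str]:
    """Return users whose 30-day interaction count is below the threshold."""
    reader = csv.DictReader(io.StringIO(report_csv))
    return [row["user"] for row in reader
            if int(row["interactions_30d"]) < min_interactions]


# Hypothetical export; a real usage report will use different column names.
report = """user,interactions_30d
alice@example.com,142
bob@example.com,3
carol@example.com,0
"""
print(inactive_users(report))  # ['bob@example.com', 'carol@example.com']
```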

Use Microsoft Purview for Data Controls

If your organisation handles sensitive data, enable Copilot sensitivity labels in Microsoft Purview. This prevents Copilot from referencing files above a certain classification level — both a governance control and a token saver (less context being pulled in automatically).

Anthropic API — Usage Controls

In console.anthropic.com → Settings → Limits:

Set a monthly spend limit — usage halts when the limit is reached
Create separate API keys per project with individual rate limits
Monitor per-key usage in the Usage tab to see which projects are most expensive

Azure OpenAI — Quota Management

If you are using Azure OpenAI (common in enterprises):

In the Azure Portal → your Azure OpenAI resource → Quotas, set TPM (Tokens Per Minute) limits per deployment
Create separate deployments for different teams or applications with individual quotas
Set Azure Cost Management budgets with email alerts at 80% and 100% of your monthly target
Use Azure Policy to restrict which resource groups can create OpenAI deployments

Set alerts at 80%, not 100%

A spend alert at 100% tells you when the damage is already done. Set your first alert at 50% of your monthly budget (a sanity check) and a second at 80% (time to investigate). If you hit 80% by mid-month, you have time to act before the bill lands.
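The 50% and 80% checks, plus a simple month-end projection, can be sketched in a few lines. The function names are illustrative, and the projection assumes spending continues at the same daily rate, which is a simplification.

```python
def spend_alerts(spent: float, budget: float, thresholds=(0.5, 0.8)) -> list[str]:
    """Return one message per threshold that current spend has crossed."""
    return [f"{round(t * 100)}% of budget reached"
            for t in thresholds if spent >= t * budget]


def projected_month_end(spent: float, day_of_month: int, days_in_month: int = 30) -> float:
    """Linear projection: assume the daily rate so far continues all month."""
    return spent * days_in_month / day_of_month


print(spend_alerts(spent=85, budget=100))  # ['50% of budget reached', '80% of budget reached']
print(spend_alerts(spent=55, budget=100))  # ['50% of budget reached']
print(projected_month_end(spent=80, day_of_month=15))  # 160.0
```

In the last line, reaching 80 of a 100 budget by day 15 projects to 160 by month end, which is the mid-month warning sign described above.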

Part 5: Monitor Usage and Catch Waste Early

Controls without monitoring are not controls — they are just hopes. Build a simple monthly review habit.

What to Review Each Month

| What to check | Where to find it | What to do with it |
| --- | --- | --- |
| Total spend vs last month | Platform billing dashboard | If growing, investigate which project or user is driving it |
| Top 5 highest-spending API keys | OpenAI / Anthropic usage dashboard | Review whether the usage is legitimate and expected |
| Copilot inactive licences | M365 Admin Centre usage report | Remove licences from users with <5 interactions |
| Average tokens per request | API usage metrics | Spikes indicate runaway prompts or a broken script |
| Model distribution | API logs | Check you are not using premium models for simple tasks |
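Two of the checks above, average tokens per request and model distribution, are easy to compute if you keep your own request logs. The sketch below assumes a hypothetical log format with `key`, `model`, and `tokens` fields; real usage exports have their own schema.

```python
from collections import defaultdict

# Hypothetical log records; real usage exports will have their own schema.
LOGS = [
    {"key": "support-bot", "model": "gpt-4o-mini", "tokens": 900},
    {"key": "support-bot", "model": "gpt-4o-mini", "tokens": 1_100},
    {"key": "report-gen", "model": "gpt-4o", "tokens": 4_000},
    {"key": "report-gen", "model": "gpt-4o", "tokens": 36_000},
]


def summarise(logs: list[dict]) -> dict:
    """Per-key average tokens per request, plus request counts per model."""
    tokens, counts, models = defaultdict(int), defaultdict(int), defaultdict(int)
    for rec in logs:
        tokens[rec["key"]] += rec["tokens"]
        counts[rec["key"]] += 1
        models[rec["model"]] += 1
    averages = {k: tokens[k] / counts[k] for k in tokens}
    return {"avg_tokens_per_request": averages, "requests_per_model": dict(models)}


summary = summarise(LOGS)
print(summary["avg_tokens_per_request"])  # {'support-bot': 1000.0, 'report-gen': 20000.0}
print(summary["requests_per_model"])      # {'gpt-4o-mini': 2, 'gpt-4o': 2}
```

Here the `report-gen` key averages 20,000 tokens per request because of one 36,000-token call, the kind of spike worth investigating.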

Set Up Automated Alerts

Do not rely on remembering to check. Set email alerts so you are notified before a problem becomes a bill:

OpenAI: platform.openai.com → Settings → Limits → Soft limit email notification
Azure: Cost Management → Budgets → add email alert at 80% of target
Anthropic: console.anthropic.com → Settings → Billing → usage alert threshold
Google (Gemini API): Cloud Console → Billing → Budgets and Alerts

The Complete Cost-Control Checklist

Use this as your quarterly AI spend audit:

Prompting habits:

Are users asking focused, specific questions rather than long conversational ones?
Are output length instructions included in reusable prompts?
Are people pasting full documents when only sections are needed?

Model choices:

Is the team defaulting to the most expensive model out of habit?
Are simple tasks being routed to cheap models (Haiku, GPT-4o mini, Flash)?
Are premium models reserved for genuinely complex tasks?

Licences:

Have inactive users been identified and their licences removed?
Are users on the right tier for their actual usage level?
Are you paying for overlapping tools that do the same job?

Spend controls:

Are hard spend limits set on all API accounts?
Are separate API keys used per project?
Are email alerts configured at 80% of your monthly target?

Monitoring:

Is there a monthly usage review in the calendar?
Does someone own the AI cost review — or does it fall through the cracks?

Frequently Asked Questions

How much should a team of 10 realistically spend on AI per month? Highly variable — but as a benchmark, a 10-person team using AI for writing, summarisation, and code assistance with good cost hygiene typically lands at £100–250/month in total. Teams without controls in place often spend 3–5× that for the same output.

Is it worth switching between providers to save money? Sometimes, but do not underestimate the switching cost — learning curves, integration changes, different prompt styles. Better to optimise on one or two platforms than to constantly chase the cheapest per-token rate.

Can I use prompt caching to save on API costs? Yes — both Anthropic and OpenAI support prompt caching, which discounts repeated context (like a long system prompt) on subsequent requests. If you have a fixed system prompt or document that appears in every request, caching can reduce input token costs by up to 90% on those cached portions. Worth implementing for any application that runs at volume.
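A rough way to estimate what caching is worth: bill the repeated part of each request at a discounted rate and compare. The sketch below is a simplified model under stated assumptions: every request is treated as a cache hit, any extra charge for writing the cache is ignored, and the 90% discount, token counts, and $3.00 input price are example inputs rather than quotes from a provider.

```python
def input_cost_with_caching(requests: int, cached_tokens: int, fresh_tokens: int,
                            usd_per_million: float, cache_discount: float) -> float:
    """USD input cost when `cached_tokens` per request are billed at a discount.

    cache_discount=0.9 means cached tokens cost 10% of the normal input price.
    Simplification: every request is a cache hit and cache-write charges are ignored.
    """
    cached_rate = usd_per_million * (1 - cache_discount)
    per_request = cached_tokens * cached_rate + fresh_tokens * usd_per_million
    return requests * per_request / 1_000_000


without = input_cost_with_caching(10_000, 2_000, 200, 3.00, cache_discount=0.0)
with_cache = input_cost_with_caching(10_000, 2_000, 200, 3.00, cache_discount=0.9)
print(f"{without:.2f} -> {with_cache:.2f}")  # 66.00 -> 12.00
```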

Does ChatGPT Plus have token limits? ChatGPT Plus subscriptions have usage caps rather than hard token limits — you get a certain number of GPT-4o requests per period (the exact number changes based on demand). When you hit the cap, you are temporarily dropped to GPT-4o mini until the window resets. Monitoring your usage in Settings → Data Controls shows how close you are to the cap.

What is the single biggest waste of tokens that most people do not realise? Pasting entire documents when you only need a section. A 10,000-word PDF pasted into ChatGPT to answer a question about one paragraph costs roughly 13,000 input tokens (at about ¾ of a word per token) when you needed roughly 300. Extract the relevant section first, every time.

Should I use streaming in my API applications? Streaming (receiving the response token by token as it is generated, rather than waiting for the full reply) does not affect your token cost — you pay the same either way. It improves perceived speed for users but does not reduce the bill.

Conclusion: Control Is a Feature, Not a Restriction

The goal of all of this is not to make AI less useful — it is to make it sustainably useful. Unlimited, unmonitored AI spend creates two problems: bills that grow out of control and a culture where nobody thinks carefully about how they use AI.

The best-run AI programmes treat token budgets like time budgets. You would not let a team book unlimited travel with no approval. The same discipline applies here.

Start with the quick wins:

Add output-length instructions to your most-used prompts today
Set a spend alert on your API accounts this week
Pull your Copilot usage report and identify inactive licences this month

Each of these takes under 15 minutes and can save meaningfully on your next bill. Build from there — model tiering, system prompt audits, and monthly reviews — and you will find that better AI cost management often also means better AI quality. Focused prompts get better answers. Right-sized models respond faster. Less waste in means better signal out.

Chetan Yamger
