How to Reduce Claude Code Token Usage (9 Tactics That Work)

By LMaoAGI · Updated 2026-05-20

Claude Code bills add up quietly because of one fact most people forget: every turn re-sends the whole conversation. Your prompts, Claude's replies, and every tool result get re-processed on each message — so the 50th turn in a session is far more expensive than the 1st, even if you typed the same thing.

That's the lever. Almost everything below is about sending fewer tokens, fewer times. Ordered roughly by impact.

1. Match the model to the task

The single most common way to overspend is running Opus on work a cheaper model handles fine. Output tokens cost ~5x input across the board, and the tiers are far apart (as of 2026, per million tokens):

Model	Input	Output
Haiku	~$1	~$5
Sonnet	~$3	~$15
Opus	~$5	~$25

Routine edits, boilerplate, and refactors rarely need the frontier model. Reserve Opus for genuinely hard reasoning. This one habit often halves a bill. (More on picking a plan in Pro vs Max vs API.)

2. Start fresh — `/clear` and `/compact`

Stale context is billed on every later message, so don't let one session carry work it's done with.

/clear when you switch to unrelated work — start a clean context instead of dragging the old one along.
/compact at the end of a distinct phase (not when you notice slowdown — by then you've already paid). It summarises the session so far into a much smaller context.

3. See what's in your context with `/context`

/context lists everything occupying your window with per-item token counts. It's the fastest way to spot the 8,000-token file you forgot was loaded, or an MCP server eating thousands just by being connected. You can't trim what you can't see.

4. Right-size your `CLAUDE.md`

Your CLAUDE.md is re-sent on every single turn, so bloat there is the most expensive bloat there is. A well-structured one for a real project is usually 300–600 tokens. Split it per-package in a monorepo rather than one giant file. A claude-token-efficient-style rules file can also trim Claude's verbose output — covered in the skills roundup.

5. Prefer CLI tools over MCP servers

Connected MCP servers load their full tool schemas into context at the start of every session — sometimes thousands of tokens just sitting there before you've done anything. CLI tools like gh, aws, and gcloud don't carry that per-session overhead. Use MCP where it genuinely earns its keep; reach for the CLI for the rest.

6. Preprocess big outputs with hooks

Instead of letting Claude read a 10,000-line log file into context, a hook can grep for the relevant lines first and hand back a few hundred tokens. The same idea applies to test output, build logs, and large JSON: filter before Claude sees it, not after.

7. Be surgical with file reads

Pulling a 500-line file in when you need 20 lines wastes hundreds of tokens instantly — and again on every subsequent turn that keeps it in context. Point Claude at the specific function or line range; close files you're done with.

8. Compress codebase dumps

When you do need to feed a whole repo to a model, don't paste it raw. Tools like Repomix (--compress) and code2prompt keep the structure and drop implementation detail — roughly 70% fewer tokens — and count the cost before you send. Both are in the skills roundup. For the work that doesn't need a frontier model at all, route it to a cheaper backend with a claude-code-router.

9. Measure, so you know what actually worked

Every tactic above is a guess until you check the number. Run ccusage for a baseline, or use Tokipet for a passive per-repo, per-model view from your menu bar. Cut one thing, look at the next week, keep what moved the bill. The full cost guide covers how billing works end to end.

FAQ

Why is Claude Code using so many tokens? Because context is re-sent every turn. Long sessions, large files left in context, big CLAUDE.md files, and connected MCP servers all get re-billed on each message — not once.

Does /compact save tokens? Yes. It replaces a long session history with a shorter summary, so subsequent turns re-send far less. Use it at phase boundaries rather than waiting for the context to fill.

What's the biggest single way to cut Claude Code cost? Usually model choice — not running Opus on work Sonnet or Haiku does fine. After that, keeping context lean with /clear and /compact.

Do MCP servers cost tokens even when idle? Effectively yes — their tool schemas load into context at session start, so they consume tokens just by being connected. Disable the ones you're not using.

You can't cut what you can't see. Tokipet reads your Claude Code logs locally and shows real spend per repo and model, so you know which of these tactics actually worked — install it here.