Claude Code Skills & Tools That Cut Your Token Bill (Curated)

By LMaoAGI · Updated 2026-06-02

There are hundreds of Claude Code skills, plugins, and tools out there, and most "awesome" lists just pile them all up by category. This one has a single filter: does it lower what you actually pay? Everything below is free, open-source, and earns its place by saving you tokens or money — not by being neat. (Star counts are live from GitHub.)

It's organized around the three levers that move a Claude Code bill, in order. You can't cut what you can't see, so start at the top.

How cutting Claude Code cost actually works

Three levers, and they stack:

1. See where the money goes: you can't optimize a number you don't have. (full breakdown of how billing works)
2. Spend less per token: route the easy work to cheaper or local models.
3. Use fewer tokens: keep context lean so every turn re-sends less.

Most people jump straight to lever 2 or 3 and guess. Measure first, then the rest gets concrete.
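To make "they stack" concrete, here is a small sketch of the arithmetic. It assumes the bill is simply tokens times price per token, so a cut in each multiplies through; the percentages are illustrative inputs, not measurements.

```python
def combined_saving(price_cut: float, token_cut: float) -> float:
    """Fraction of the bill removed when both cuts apply.

    Assumes cost = tokens * price, so the remaining fractions multiply.
    """
    remaining = (1.0 - price_cut) * (1.0 - token_cut)
    return 1.0 - remaining


if __name__ == "__main__":
    # Illustrative inputs: 40% cheaper per token, 30% fewer tokens.
    saving = combined_saving(price_cut=0.40, token_cut=0.30)
    print(f"combined saving: {saving:.0%}")
```

The two cuts do not add up to 70%: the second one applies to a bill the first has already shrunk, which is why re-checking the number after each change matters.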

1. Measure first — see where your tokens go

ccusage ★ 16.8k (MIT). The standard tool for this. It reads the local session logs Claude Code already writes and turns them into daily, weekly, monthly, and per-session token + cost reports — no account, no setup, runs entirely on your machine. One command:

```bash
npx ccusage@latest
```

It also covers Codex, Gemini CLI, Copilot CLI and other agents, so if you run more than one tool you get a combined picture.

codeburn ★ 8.4k (open-source). If you'd rather explore than read a report, codeburn is an interactive terminal dashboard for the same local logs — cost by project, model, and activity, daily burn charts, and cache-hit rates (which quietly drive a lot of your bill). Runs with npx codeburn.

ccusage and codeburn tell you that you spent $40 last week, but only when you remember to run them. For a passive, always-on view that breaks spend down per repo and per model, Tokipet reads the same logs from your menu bar. Whichever you pick, the point is the same: get the number in front of you before you start cutting. (Anthropic's own cost docs cover the official console view.)

2. Spend less per token — route to cheaper models

The single most common way to overspend is running Opus for work Haiku or a local model could do.

claude-code-router ★ 35.6k (MIT). Sits in front of Claude Code and routes each request to a model you choose — a cheap cloud model for background tasks, a local model for routine edits, a frontier model only when reasoning actually needs it. It supports OpenRouter, DeepSeek, Ollama, Gemini and more.

```bash
npm install -g @musistudio/claude-code-router
```

Users report savings of 60–80% on routed work that doesn't need a frontier model. The trade-off is an extra moving part in your setup and a routing config to maintain, so it pays off most for heavy daily users. (deeper guide on routing trade-offs)
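The routing idea can be sketched in a few lines. This is not claude-code-router's config format or API: the task kinds, tier names, and prices below are all hypothetical, chosen only to show how matching the model to the task changes the total.

```python
# Hypothetical task kinds mapped to hypothetical model tiers.
ROUTES = {
    "background": "cheap-cloud",
    "routine_edit": "local",
    "reasoning": "frontier",
}
DEFAULT_TIER = "frontier"  # unknown work falls back to the strongest model

# Made-up prices in USD per million tokens, for illustration only.
PRICE_PER_MTOK = {"cheap-cloud": 1.0, "local": 0.0, "frontier": 15.0}


def pick_tier(task_kind: str) -> str:
    """Choose a model tier for a task kind."""
    return ROUTES.get(task_kind, DEFAULT_TIER)


def routed_cost(tokens_by_kind: dict[str, int]) -> float:
    """Cost when each task kind goes to its own tier."""
    return sum(
        tokens / 1_000_000 * PRICE_PER_MTOK[pick_tier(kind)]
        for kind, tokens in tokens_by_kind.items()
    )


def unrouted_cost(tokens_by_kind: dict[str, int]) -> float:
    """Cost when everything runs on the frontier tier."""
    return sum(tokens_by_kind.values()) / 1_000_000 * PRICE_PER_MTOK["frontier"]


if __name__ == "__main__":
    workload = {"background": 2_000_000, "routine_edit": 3_000_000, "reasoning": 1_000_000}
    print("routed:", routed_cost(workload))
    print("unrouted:", unrouted_cost(workload))
```

Falling back to the frontier tier for unknown work is a deliberately conservative choice: a misrouted hard task usually costs more in retries than the routing saved.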

LiteLLM ★ 52.6k (open-source). The heavier-duty option: a gateway that puts one OpenAI-compatible API in front of 100+ providers, with load-balancing, fallbacks, spend caps, and audit logs. Overkill for a solo dev — reach for it when a team needs central control and per-user budgets across providers, not just personal routing.

claude-token-efficient ★ 5.8k (MIT). A single CLAUDE.md you drop in your project that strips Claude's verbose habits (opening pleasantries, restated questions, over-engineered answers), cutting output tokens by roughly 63% in the authors' own benchmark. The repo states the caveat itself: because that file is re-sent as input on every message, it only nets savings when your output volume is high enough to offset it. Great for chatty, high-throughput workflows; neutral at best for short sessions.
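That caveat is a break-even calculation, sketched below. The 63% reduction is the authors' benchmark figure quoted above; the file size and both prices are placeholder assumptions, and the sketch ignores any caching discount on re-sent input.

```python
def net_saving_per_message(
    baseline_output_tokens: float,
    file_tokens: float,
    input_price: float,
    output_price: float,
    output_reduction: float,
) -> float:
    """Money saved per message: output trimmed minus the cost of re-sending the file.

    Prices can be in any unit as long as input and output use the same one.
    """
    saved = baseline_output_tokens * output_reduction * output_price
    added = file_tokens * input_price
    return saved - added


def break_even_output_tokens(
    file_tokens: float,
    input_price: float,
    output_price: float,
    output_reduction: float,
) -> float:
    """Baseline output tokens per message above which the file pays for itself."""
    return (file_tokens * input_price) / (output_reduction * output_price)


if __name__ == "__main__":
    # Placeholder assumptions: a 500-token file, output priced 5x input.
    threshold = break_even_output_tokens(
        file_tokens=500, input_price=3.0, output_price=15.0, output_reduction=0.63
    )
    print(f"break-even at about {threshold:.0f} output tokens per message")
```

Below the threshold the file costs more than it saves, which is why short, terse sessions see little or no benefit.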

3. Use fewer tokens — keep context lean

Every turn re-sends your whole context, so bloat is charged repeatedly, not once.
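A short sketch shows why bloat is charged repeatedly. It assumes a simplified model in which every turn sends the full context as input and the context grows by a fixed amount per turn; real sessions vary, and caching discounts are ignored. All numbers are illustrative.

```python
def total_input_tokens(base_context: int, growth_per_turn: int, turns: int) -> int:
    """Sum the input tokens sent over a session under the simplified model."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context            # the whole context goes out as input this turn
        context += growth_per_turn  # the exchange is appended for the next turn
    return total


if __name__ == "__main__":
    lean = total_input_tokens(base_context=2_000, growth_per_turn=1_000, turns=40)
    bloated = total_input_tokens(base_context=7_000, growth_per_turn=1_000, turns=40)
    # 5,000 extra tokens of standing context, re-sent on each of 40 turns.
    print("extra input tokens:", bloated - lean)
```

In this model, a fixed 5,000 tokens of bloat over 40 turns adds 200,000 input tokens, because it is paid once per turn rather than once per session.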

Repomix ★ 26.8k (MIT). When you need to feed a whole codebase to a model, Repomix packs it into one AI-friendly file and counts the tokens first. Its --compress flag uses Tree-sitter to keep structure (classes, functions, signatures) and drop implementation detail — roughly 70% fewer tokens while staying understandable.

```bash
npx repomix@latest --compress
```
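To illustrate the idea behind structure-preserving compression, here is a minimal sketch that keeps class and function signatures and drops the bodies. It is not Repomix's implementation: it swaps in Python's standard `ast` module in place of Tree-sitter and handles Python source only. The `skeleton` function and the sample code are hypothetical.

```python
import ast


def skeleton(source: str) -> str:
    """Return class and function signatures from Python source, without bodies."""
    lines: list[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        indent = "    " * depth
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                lines.append(f"{indent}class {child.name}:")
                visit(child, depth + 1)  # descend to pick up methods
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                keyword = "async def" if isinstance(child, ast.AsyncFunctionDef) else "def"
                # Keep the signature, replace the body with a placeholder.
                lines.append(f"{indent}{keyword} {child.name}({ast.unparse(child.args)}): ...")

    visit(ast.parse(source), 0)
    return "\n".join(lines)


SAMPLE = '''
class Cart:
    def add(self, item: str, qty: int = 1):
        self.items[item] = self.items.get(item, 0) + qty

def total(cart, tax=0.0):
    return sum(cart.items.values()) * (1 + tax)
'''

if __name__ == "__main__":
    print(skeleton(SAMPLE))
```

The output keeps what a model needs to navigate the code (names, parameters, nesting) and omits the implementation lines, which is where most of the tokens are.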

code2prompt ★ 7.5k (MIT). A similar idea in a fast Rust CLI, with .gitignore-aware filtering, Handlebars templates, a token count, and a --price flag that estimates the dollar cost before you send. Use it when you want tight control over exactly which files become context.

Built-in moves that cost nothing. Before reaching for a tool, Claude Code already gives you the cheapest wins:

/context — see exactly what's eating your window, with token counts per item.
/compact at the end of a distinct phase, and /clear when you switch to unrelated work — stale context is billed on every later message.
Hooks to preprocess output — instead of letting Claude read a 10,000-line log, a hook can grep for the relevant lines and hand back hundreds of tokens, not tens of thousands.
Prefer CLI over MCP for big tools — connected MCP servers load their full tool schemas into context at session start; gh, aws, and friends don't.
Right-size your CLAUDE.md — 300–600 tokens is plenty; it's re-sent every turn.
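The log-preprocessing move in the list above can be sketched as a plain filter. This shows only the filtering step, not how Claude Code wires hooks; the function name, default pattern, and limit are assumptions for illustration.

```python
import re


def relevant_lines(
    log_text: str,
    pattern: str = r"ERROR|FAIL|Traceback",
    limit: int = 50,
) -> str:
    """Keep only lines matching the pattern, capped at the most recent `limit`."""
    if limit <= 0:
        return ""
    matcher = re.compile(pattern)
    hits = [line for line in log_text.splitlines() if matcher.search(line)]
    return "\n".join(hits[-limit:])


if __name__ == "__main__":
    # A synthetic 10,000-line log with one line that matters.
    log = "\n".join(f"INFO step {i} ok" for i in range(10_000)) + "\nERROR disk full\n"
    filtered = relevant_lines(log)
    print(f"{len(log.splitlines())} lines in, {len(filtered.splitlines())} lines out")
```

Capping the number of matches matters as much as the pattern: a log that is all errors would otherwise pass through the filter at full size.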

A realistic stack

You don't need all of these. A sane order:

1. Run ccusage or codeburn (or Tokipet) for a week and find your actual hotspots.
2. If one model is dominating an easy workload, add claude-code-router, or LiteLLM if a team needs central budgets.
3. If your sessions are long and chatty, add claude-token-efficient and lean on /compact + /clear.
4. Reach for Repomix or code2prompt only when you're dumping large codebases into context.

The savings compound, but so does the setup. Add one tool at a time and re-check the number — that's the whole point of starting with measurement.

FAQ

What are Claude Code skills? "Skills" loosely covers the ecosystem of add-ons for Claude Code — Agent Skills, plugins, commands, hooks, MCP servers, and standalone CLI tools. This list focuses on the subset that reduces token usage or cost.

How do I reduce Claude Code token usage? Measure first (ccusage / codeburn), match the model to the task (claude-code-router), keep context lean (/context, /compact, /clear, lean CLAUDE.md), and compress big codebase dumps (Repomix, code2prompt). The biggest single win is usually not running Opus on work a cheaper model handles fine.

Does claude-code-router actually save money? Yes, for work that doesn't need a frontier model — users report 60–80% on routed tasks — at the cost of an extra component and a routing config to maintain. It's worth it for heavy daily use, overkill for occasional use.

What's the difference between claude-code-router and LiteLLM? claude-code-router is purpose-built for one developer pointing Claude Code at different models. LiteLLM is a full gateway for teams that need load-balancing, fallbacks, and per-user spend caps across many providers. Solo: router. Team: LiteLLM.

What's the best way to track Claude Code cost? The Anthropic console shows aggregate spend; ccusage and codeburn break it down from local logs; a tool like Tokipet adds a passive per-repo and per-model view. See the Claude Code cost guide for the full picture.

Are these tools free? Every tool on this list is open-source and free to use. The only thing you pay for is the models themselves — which is the point of the list.

Where can I find a broader list? For the full ecosystem beyond cost (agents, frameworks, IDE integrations), awesome-claude-code ★ 458 is a reputable, human-curated directory.

Cutting your bill starts with seeing it. Tokipet reads your Claude Code logs locally and shows your real spend per repo and model — install it here.