Claude Code Skills & Tools That Cut Your Token Bill (Curated)
There are hundreds of Claude Code skills, plugins, and tools out there, and most "awesome" lists just pile them all up by category. This one has a single filter: does it lower what you actually pay? Everything below is free, open-source, and earns its place by saving you tokens or money — not by being neat. (Star counts are live from GitHub.)
It's organized around the three levers that move a Claude Code bill, in order. You can't cut what you can't see, so start at the top.
How cutting Claude Code cost actually works
Three levers, and they stack:
- See where the money goes — you can't optimize a number you don't have. (full breakdown of how billing works)
- Spend less per token — route the easy work to cheaper or local models.
- Use fewer tokens — keep context lean so every turn re-sends less.
Most people jump straight to lever 2 or 3 and guess. Measure first, then the rest gets concrete.
1. Measure first — see where your tokens go
ccusage ★ 16.1k (MIT). The standard tool for this. It reads the local session logs Claude Code already writes and turns them into daily, weekly, monthly, and per-session token + cost reports — no account, no setup, runs entirely on your machine. One command:
```bash
npx ccusage@latest
```
It also covers Codex, Gemini CLI, Copilot CLI and other agents, so if you run more than one tool you get a combined picture.
codeburn ★ 8k (open-source). If you'd rather explore than read a report, codeburn is an interactive terminal dashboard for the same local logs — cost by project, model, and activity, daily burn charts, and cache-hit rates (which quietly drive a lot of your bill). Runs with npx codeburn.
ccusage and codeburn tell you that you spent $40 last week. For a passive, always-on view that breaks it down per repo and per model without a command to remember, Tokipet reads the same logs from your menu bar. Whichever you pick, the point is the same: get the number in front of you before you start cutting. (Anthropic's own cost docs cover the official console view.)
2. Spend less per token — route to cheaper models
The single most common way to overspend is running Opus for work Haiku or a local model could do.
claude-code-router ★ 35k (MIT). Sits in front of Claude Code and routes each request to a model you choose — a cheap cloud model for background tasks, a local model for routine edits, a frontier model only when reasoning actually needs it. It supports OpenRouter, DeepSeek, Ollama, Gemini and more.
```bash
npm install -g @musistudio/claude-code-router
```
Real-world reports put savings in the 60–80% range for the work that doesn't need a frontier model — but it adds a moving part to your setup and a routing config to maintain, so it pays off most for heavy daily users. (deeper guide on routing trade-offs)
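To see how a per-task saving turns into a saving on the whole bill, here is a minimal sketch of the arithmetic. The function name, the token volume, and both prices are hypothetical placeholders rather than real provider rates; the cheap price is simply set 80% below the frontier price to match the upper end of the reported range.

```python
def blended_cost(total_tokens: int, routed_share: float,
                 frontier_price: float, cheap_price: float) -> float:
    """Dollar cost when `routed_share` of tokens go to the cheaper model.

    Prices are dollars per million tokens (hypothetical placeholders).
    """
    if not 0.0 <= routed_share <= 1.0:
        raise ValueError("routed_share must be between 0 and 1")
    routed = total_tokens * routed_share   # tokens sent to the cheap model
    kept = total_tokens - routed           # tokens still sent to the frontier model
    return (kept * frontier_price + routed * cheap_price) / 1_000_000


# Illustrative numbers only: 10M tokens, half of the work routed away.
baseline = blended_cost(10_000_000, 0.0, frontier_price=15.0, cheap_price=3.0)
routed = blended_cost(10_000_000, 0.5, frontier_price=15.0, cheap_price=3.0)
print(f"baseline ${baseline:.2f}, with routing ${routed:.2f}, "
      f"saved {1 - routed / baseline:.0%}")
```

Under these assumptions the routed half gets 80% cheaper, but the total bill drops by less, because the other half still runs on the frontier model. The share of work you can route matters as much as the price gap.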
LiteLLM ★ 50.3k (open-source). The heavier-duty option: a gateway that puts one OpenAI-compatible API in front of 100+ providers, with load-balancing, fallbacks, spend caps, and audit logs. Overkill for a solo dev — reach for it when a team needs central control and per-user budgets across providers, not just personal routing.
claude-token-efficient ★ 5.6k (MIT). A single CLAUDE.md you drop in your project that strips Claude's verbose habits — opening pleasantries, restated questions, over-engineered answers — cutting output tokens (the authors benchmark ~63%). Honest caveat the repo states itself: because that file is re-sent as input on every message, it only nets savings when your output volume is high enough to offset it. Great for chatty, high-throughput workflows; neutral for short sessions.
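The break-even behind that caveat can be sketched in a few lines. Everything here is an assumption for illustration: the file size, the two prices (dollars per million tokens), and the function name are hypothetical, the 63% figure is the authors' own benchmark quoted above, and prompt caching is ignored.

```python
def break_even_output_tokens(file_tokens: int, input_price: float,
                             output_price: float, output_reduction: float) -> float:
    """Baseline output tokens per message at which the file pays for itself.

    Both prices use the same unit (dollars per million tokens), so the
    unit cancels in the ratio.
    """
    if output_reduction <= 0 or output_price <= 0:
        raise ValueError("output_reduction and output_price must be positive")
    extra_input_cost = file_tokens * input_price            # paid on every message
    saving_per_output_token = output_reduction * output_price
    return extra_input_cost / saving_per_output_token


# Hypothetical inputs: a 500-token file, $3 input and $15 output per million tokens.
threshold = break_even_output_tokens(
    file_tokens=500, input_price=3.0, output_price=15.0, output_reduction=0.63
)
print(f"break-even at about {threshold:.0f} output tokens per message")
```

If your typical reply would have been shorter than that threshold, the file costs more than it saves; if replies are usually much longer, it pays off.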
3. Use fewer tokens — keep context lean
Every turn re-sends your whole context, so bloat is charged repeatedly, not once.
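A rough sketch of why this adds up: the function below totals the input tokens sent across a session when the full context goes out on every turn. The names and numbers are hypothetical, and the model is deliberately simplified (fixed growth per turn, no prompt caching, no compaction).

```python
def cumulative_input_tokens(base_context: int, tokens_added_per_turn: int,
                            turns: int) -> int:
    """Total input tokens sent over a session under the simplified model."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context                   # the whole context is sent this turn
        context += tokens_added_per_turn   # the conversation grows before the next turn
    return total


# Illustrative session: 20 turns, 1,000 tokens added per turn.
lean = cumulative_input_tokens(base_context=5_000, tokens_added_per_turn=1_000, turns=20)
bloated = cumulative_input_tokens(base_context=7_000, tokens_added_per_turn=1_000, turns=20)
print(f"lean: {lean:,}  bloated: {bloated:,}  extra: {bloated - lean:,}")
```

In this toy session, 2,000 tokens of unneeded context at the start become 40,000 extra input tokens by turn 20, because they ride along on every message.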
Repomix ★ 26.2k (MIT). When you need to feed a whole codebase to a model, Repomix packs it into one AI-friendly file and counts the tokens first. Its --compress flag uses Tree-sitter to keep structure (classes, functions, signatures) and drop implementation detail — roughly 70% fewer tokens while staying understandable.
```bash
npx repomix@latest --compress
```
code2prompt ★ 7.4k (MIT). A similar idea in a fast Rust CLI, with .gitignore-aware filtering, Handlebars templates, a token count, and a --price flag that estimates the dollar cost before you send. Use it when you want tight control over exactly which files become context.
Built-in moves that cost nothing. Before reaching for a tool, Claude Code already gives you the cheapest wins:
- /context: see exactly what's eating your window, with token counts per item.
- /compact at the end of a distinct phase, and /clear when you switch to unrelated work: stale context is billed on every later message.
- Hooks to preprocess output: instead of letting Claude read a 10,000-line log, a hook can grep for the relevant lines and hand back hundreds of tokens, not tens of thousands.
- Prefer CLI over MCP for big tools: connected MCP servers load their full tool schemas into context at session start; gh, aws, and friends don't.
- Right-size your CLAUDE.md: 300–600 tokens is plenty; it's re-sent every turn.
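The log-filtering idea from the hooks item can be sketched as a small filter. This shows only the filtering step, not Claude Code's hook configuration (see the official hooks docs for how to wire a script in); the function name, the pattern, and the synthetic log are all hypothetical.

```python
import re
from typing import Iterable, List


def filter_relevant(lines: Iterable[str], pattern: str,
                    max_lines: int = 200) -> List[str]:
    """Keep only lines matching `pattern`, capped at `max_lines`."""
    regex = re.compile(pattern, re.IGNORECASE)
    kept: List[str] = []
    for line in lines:
        if regex.search(line):
            kept.append(line.rstrip("\n"))
            if len(kept) >= max_lines:   # hard cap so a noisy log stays small
                break
    return kept


# Synthetic 10,000-line log with two interesting lines.
log = [f"INFO request {i} ok" for i in range(10_000)]
log[1234] = "ERROR request 1234 failed: timeout"
log[8765] = "WARN request 8765 slow"

summary = filter_relevant(log, r"error|warn")
print(len(log), "lines in,", len(summary), "lines out")
```

The cap is a design choice: even when the pattern matches far more than expected, the output handed back to the model stays bounded.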
A realistic stack
You don't need all of these. A sane order:
- Run ccusage or codeburn (or Tokipet) for a week and find your actual hotspots.
- If one model is dominating an easy workload, add claude-code-router — or LiteLLM if a team needs central budgets.
- If your sessions are long and chatty, add claude-token-efficient and lean on /compact + /clear.
- Reach for Repomix or code2prompt only when you're dumping large codebases into context.
The savings compound, but so does the setup. Add one tool at a time and re-check the number — that's the whole point of starting with measurement.
FAQ
What are Claude Code skills? "Skills" loosely covers the ecosystem of add-ons for Claude Code — Agent Skills, plugins, commands, hooks, MCP servers, and standalone CLI tools. This list focuses on the subset that reduces token usage or cost.
How do I reduce Claude Code token usage? Measure first (ccusage / codeburn), match the model to the task (claude-code-router), keep context lean (/context, /compact, /clear, lean CLAUDE.md), and compress big codebase dumps (Repomix, code2prompt). The biggest single win is usually not running Opus on work a cheaper model handles fine.
Does claude-code-router actually save money? Yes, for work that doesn't need a frontier model — users report 60–80% on routed tasks — at the cost of an extra component and a routing config to maintain. It's worth it for heavy daily use, overkill for occasional use.
What's the difference between claude-code-router and LiteLLM? claude-code-router is purpose-built for one developer pointing Claude Code at different models. LiteLLM is a full gateway for teams that need load-balancing, fallbacks, and per-user spend caps across many providers. Solo: router. Team: LiteLLM.
What's the best way to track Claude Code cost? The Anthropic console shows aggregate spend; ccusage and codeburn break it down from local logs; a tool like Tokipet adds a passive per-repo and per-model view. See the Claude Code cost guide for the full picture.
Are these tools free? Every tool on this list is open-source and free to use. The only thing you pay for is the models themselves — which is the point of the list.
Where can I find a broader list? For the full ecosystem beyond cost (agents, frameworks, IDE integrations), awesome-claude-code ★ 423 is a reputable, human-curated directory.
Cutting your bill starts with seeing it. Tokipet reads your Claude Code logs locally and shows your real spend per repo and model — install it here.