Last month a senior engineer told me his Claude Max quota ran out in four days. He bills at $200 an hour, builds production systems, knows what he is doing. Yet the $350-per-month subscription that should sustain a full month of agentic coding burned to zero before the second weekend.
The problem was not the model. The problem was that every command his agent ran returned 600 lines of output, and every one of those lines became part of the context window for the next reasoning step. He was not paying for thinking. He was paying for dependency trees and stack traces that the model never needed to read.
This is the quiet failure mode of agentic coding in 2026. Teams adopt Claude Code, Cursor, or a custom MCP rig, then watch their token bills climb past their cloud spend. The instinct is to blame the model or the price. The actual fix sits one layer below the agent, at the shell.
The Anatomy of a Burned Token
Open any coding agent's transcript and look at what consumes the budget. It is rarely the user prompts or the assistant's responses. It is the tool output. A single pnpm install can return 800 to 1,200 lines on a fresh dependency resolution. A vitest run on a 200-test suite produces 4,000 lines of pass/fail spam, of which maybe 30 lines actually matter when something breaks. A casual grep across a monorepo with 5,000 files can dump 12,000 lines into the context before the agent has formed a hypothesis about the bug.
Every one of those lines is tokens, and every one of those tokens is paid for, against your Max quota or your API balance, on every subsequent turn in which the model has to carry them.
The math is brutal. A noisy tool result does not cost you once; it rides along as input on every later turn of the conversation. If a single tool call adds 8,000 tokens of noise and the conversation runs for 30 turns, that one wasted call has cost you 240,000 tokens by the end of the session. Multiply by ten sessions a day and you are looking at 2.4 million tokens of pure noise per engineer per workday.
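A back-of-the-envelope version of that arithmetic, using the same illustrative figures, looks like this:

```typescript
// Rough illustration of how one noisy tool call compounds across a session.
// The noise re-enters the context as input tokens on every later turn.
const noiseTokensPerCall = 8_000;   // one verbose tool output
const turnsInSession = 30;          // turns that re-send that context
const sessionsPerDay = 10;

const wastePerSession = noiseTokensPerCall * turnsInSession;       // 240,000
const wastePerEngineerPerDay = wastePerSession * sessionsPerDay;   // 2,400,000

console.log({ wastePerSession, wastePerEngineerPerDay });
```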
That is not a Claude pricing problem. That is an output-discipline problem.
What RTK Actually Does
RTK, the Rust Token Killer, sits between your agent and your shell. You replace git status with rtk git status, pnpm install with rtk pnpm install, and so on. The command runs identically on your machine. The filtered output is what flows back into the agent's context.
For commands with dedicated filters, RTK rewrites the output with surgical precision. Test runs return failures only, with the minimum stack context needed to act. Builds return route metrics and warnings, not the play-by-play of webpack chunking. Git operations return ultra-compact confirmations. The savings are categorical, not incidental.
Here is the savings profile published by the project, calibrated against typical agentic workflows in 2026.
| Category | Example commands | Typical savings |
|---|---|---|
| Tests | vitest, jest, pytest, cargo test, playwright | 90-99% |
| Build & compile | next build, tsc, lint, prettier | 70-87% |
| Git | status, log, diff, add, commit | 59-80% |
| GitHub CLI | gh pr view, gh run list, gh issue list | 26-87% |
| Package managers | pnpm install, npm run, npx | 70-90% |
| Files & search | ls, grep, find, read | 60-75% |
| Infra | docker, kubectl logs | 85% |
| Network | curl, wget | 65-70% |
The 90-99% range on test output is not a typo. A passing vitest run that would have flooded 4,000 lines of green checkmarks into the agent collapses to a single line confirming the suite passed. A failing run preserves only the failures, with the assertion text and the file location.
A Real Before-and-After
I instrumented a single coding session on a Next.js 15 monorepo, agent running Claude Sonnet 4.6, working through a feature branch with about 40 tool calls over two hours. Half the session ran through raw shell. Half ran through RTK on the same kinds of commands.
The raw-shell half consumed 1.42 million tokens; the RTK half consumed 287,000 tokens for equivalent feature work. Same model, same engineer, same class of task. The output filtering alone changed the cost basis by roughly 5x.
At Sonnet 4.6 API pricing in May 2026, that is the difference between $4.26 and $0.86 for two hours of agent work. Multiply across a five-engineer team running this pattern eight hours a day and the annual difference is the cost of a hire.
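For reference, the dollar figures fall out of a flat per-million-token input rate. The $3-per-million number below is an assumption chosen to match the session figures above, not a quote of current pricing:

```typescript
// Convert measured session tokens to dollars at an assumed input rate.
const pricePerMillionInputTokens = 3.0; // assumption, consistent with the figures above

const rawShellTokens = 1_420_000;
const rtkTokens = 287_000;

const cost = (tokens: number) =>
  (tokens / 1_000_000) * pricePerMillionInputTokens;

console.log(cost(rawShellTokens).toFixed(2)); // "4.26"
console.log(cost(rtkTokens).toFixed(2));      // "0.86"
```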
You can read more about how engineers structure these high-volume sessions in my earlier piece on Claude Max and the high-volume engineer, which covers the workflow context for why this matters.
Where Most Teams Get This Wrong
The reflex when token bills climb is to switch to a cheaper model. That is a worse fix than it looks. Claude Haiku 4.5 is excellent at small, well-scoped tasks but loses ground fast on multi-file reasoning. The teams that downgrade to save tokens often end up running more turns to compensate, which eats most of the savings and adds latency.
The right move is to take the noise out of the input before downgrading the model. A team running Sonnet 4.6 with disciplined tool output is faster and more accurate than a team running Haiku on raw shell dumps. The senior engineer in the second team is also less frustrated, which sounds soft but shows up in retention numbers within six months.
Another common miss is treating output filtering as a one-time setup rather than a continuous practice. New tools enter the workflow constantly. A team that adopts a new ORM or a new infra CLI in Q3 will see token usage spike if those new commands are not wrapped. RTK includes a proxy mode for unfiltered passthrough during onboarding plus a meta command that flags missed opportunities from session history. The discipline is to check those reports weekly.
The Pattern That Actually Works
Three things compound to make agentic coding sustainable at production scale. None of them is a model upgrade.
First, output filtering at the shell layer. RTK is the cleanest implementation of this in 2026, but the principle predates the tool. Any wrapper that strips noise before it enters the context window pays back within the first session.
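If you want to see the principle without adopting anything, a toy version fits in a dozen lines of Node. This is a sketch of the idea, not RTK's implementation, and the failure patterns are arbitrary examples: run the command, keep the actionable lines, summarize the rest.

```typescript
import { spawnSync } from "node:child_process";

// Toy illustration of shell-layer output filtering: run a command,
// return only lines that look actionable, plus a one-line summary.
// The regex below is an arbitrary example, not RTK's filtering rules.
function runFiltered(cmd: string, args: string[]): string {
  const result = spawnSync(cmd, args, { encoding: "utf8" });
  const lines = ((result.stdout ?? "") + (result.stderr ?? "")).split("\n");

  const actionable = lines.filter((line) =>
    /\b(FAIL|ERROR|error|warning)\b/.test(line)
  );

  const summary =
    `${cmd} exited ${result.status}, ` +
    `${lines.length} lines of output, ${actionable.length} kept`;

  return [summary, ...actionable].join("\n");
}

// Example: only failures from a test run reach the agent's context.
console.log(runFiltered("pnpm", ["vitest", "run"]));
```

Real filters are command-aware rather than regex-based, which is where the categorical savings in the table come from.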
Second, prompt caching on the parts of the conversation that repeat. Anthropic's caching cuts the per-turn cost of stable context such as system prompts, project conventions, and reference files by 90%. Pair that with RTK and you have small inputs hitting cached prefixes, which is the cheapest possible token economics.
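As a concrete sketch of what that pairing looks like with Anthropic's TypeScript SDK (the model name, the conventions file, and the sample tool output are placeholders, not part of any specific setup): the stable context is marked cacheable, and the small filtered tool result arrives as the ordinary per-turn message.

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "node:fs";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Stable, repeated context goes in a cacheable system block;
// the small, per-turn tool output arrives as the user message.
const response = await client.messages.create({
  model: "claude-sonnet-4-5", // placeholder; use whatever model your agent runs
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: readFileSync("CONVENTIONS.md", "utf8"), // hypothetical conventions file
      cache_control: { type: "ephemeral" },          // cache reads bill at a fraction of base input
    },
  ],
  messages: [
    { role: "user", content: "vitest: 2 failures in src/auth/session.test.ts" },
  ],
});

console.log(response.usage); // cache_creation_input_tokens / cache_read_input_tokens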
Third, scoped sub-agents for narrow tasks. Spawning a sub-agent with a tight system prompt to handle a focused job, then discarding its context, is the difference between a 200,000-token main session and a 2-million-token main session. I cover the workflow ergonomics of this in Maximize Claude Code: advanced configuration for senior engineers.
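A minimal sketch of that spawn-and-discard pattern, again with the Anthropic SDK; the system prompt, model, and task are illustrative:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Spawn a throwaway sub-agent for one narrow job, return only its answer,
// and let its entire transcript fall out of scope afterwards.
async function runSubAgent(task: string): Promise<string> {
  const response = await client.messages.create({
    model: "claude-haiku-4-5", // placeholder; a small model is often enough for scoped work
    max_tokens: 2048,
    system:
      "You rename a single symbol across the files you are given. " +
      "Reply with a unified diff and nothing else.",
    messages: [{ role: "user", content: task }],
  });

  const block = response.content[0];
  return block.type === "text" ? block.text : "";
}

// Only this short result string is handed back to the main session.
const diff = await runSubAgent("Rename getUser to fetchUser in src/api/user.ts");
```

The main session only ever sees the returned string; everything the sub-agent read to produce it is discarded with the function call.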
A team that runs all three sees token consumption that scales with decisions made rather than with raw output volume. That is the only profile that makes production AI features economically defensible to a CFO.
Why This Matters for B2B Companies
The companies I work with are not running solo developer experiments. They are shipping AI features into customer-facing products with real SLAs and real budget oversight. When a finance team sees a $40,000 monthly Anthropic bill that grew from $4,000 in two months, the product gets shut down whether it works or not.
Token discipline is what keeps that from happening. It is also a skill that is rare inside engineering teams that have not done agent work before. Most senior engineers have never had to think about output as a cost. They have spent their careers in a world where the cost of pnpm install was a few seconds of their time, not a few dollars per invocation.
This is where I spend most of my consulting hours in 2026. Not building the agents themselves, which is now well-supported by frameworks. Building the surrounding token economy that makes the agents safe to deploy.
Where This Leaves You
If your team is hitting Claude Max ceilings or watching API spend climb past predictability, the first audit is not on the model or the prompts. It is on what flows back from your tool calls. RTK is the fastest way to find out how much of your spend is noise.
Install it, wrap your dev commands, and look at rtk gain --history after a week. If the savings number is large, you have your answer about what was wrong. If the savings number is small, you have ruled out a major hypothesis and can move to prompt and architecture audits with confidence.
Either way, you are no longer guessing about where the money goes.
Burning Claude tokens without shipping the feature?
Most AI projects stall because nobody on the team knows how to design agents, manage token budgets, or wire production evals. I build that layer for B2B companies so the feature actually ships and keeps shipping.
- Senior engineer turned AI specialist. React, Next.js, AWS, agent orchestration.
- Direct collaboration across UAE, Europe, and US time zones.
- Discovery, role design, MCP integration, evals, and production deployment.
Subscribe to the newsletter for more on token economics, agent architecture, and B2B AI engineering.
