The Receipt Under the Receipt
I use Claude Code every day. For three months — March through May 2026 — I tracked what my usage actually costs to serve using ccusage, a tool that calculates token costs at API rates.
The subscription cost $600. Three months at $200 per month.
The compute underneath it cost $5,883. That is what Anthropic would charge if I paid per token at published API rates. In April alone — my heaviest month — I consumed $4,164 worth of compute for a $200 payment.
That is a 9.8x subsidy. For every dollar I paid, I consumed nearly ten dollars of compute at API prices.
I am not complaining. The product is excellent. The model quality is worth the real cost. But $5,883 in compute served for $600 in revenue is not a business model. It is a customer acquisition strategy funded by $15 billion in venture capital.
The question is not whether this changes. The question is whether you have a plan for when it does.
How Much Is AI Subsidized? Are AI Tokens Subsidized Too?
AI is subsidized heavily, and not by accident. I tracked three months of Claude Code token usage at published API rates: $600 paid against $5,883 of compute consumed, a 9.8x subsidy. So yes, the tokens you consume are subsidized: for every dollar I paid, I used nearly ten dollars of compute at API prices. The pattern is industry-wide. Microsoft reportedly lost over $20 per user each month on GitHub Copilot in early 2023. Flat-rate AI pricing is funded by venture capital, not unit economics. As of Q2 2026.
The subsidy is easiest to see when you put the price you pay next to the compute it takes to serve you:
| Subscription | You pay / month | Compute cost to serve / month | Effective subsidy |
|---|---|---|---|
| Claude Code — my heaviest month (April 2026) | $200 | $4,164 | 20.8x |
| Claude Code — three-month average | $200 | ~$1,961 | 9.8x |
| GitHub Copilot — Microsoft, early 2023 | $10 | up to $80 | up to 8x |
The subsidy scales with how hard you use the product. The heavier the user, the wider the gap — which is exactly the user a metered pricing model comes for first.
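The multiples in the table are plain division. A minimal sketch that reproduces them from the figures above; the function name `subsidy_multiple` is mine, chosen for illustration:

```python
# Subsidy multiple = compute cost at API rates / price actually paid.
# Figures are the ones quoted in the table above.
rows = [
    ("Claude Code, April 2026", 200.0, 4164.0),
    ("Claude Code, three-month total", 600.0, 5883.0),
    ("GitHub Copilot, heaviest users (early 2023)", 10.0, 80.0),
]


def subsidy_multiple(paid: float, compute_cost: float) -> float:
    """Return dollars of compute served per dollar paid."""
    if paid <= 0:
        raise ValueError("paid must be positive")
    return compute_cost / paid


for label, paid, compute in rows:
    print(f"{label}: {subsidy_multiple(paid, compute):.1f}x")
# Claude Code, April 2026: 20.8x
# Claude Code, three-month total: 9.8x
# GitHub Copilot, heaviest users (early 2023): 8.0x
```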
The Correction Is Already Happening
This is not speculation. The pricing shifts started before I finished my analysis.
In April 2026, Anthropic restructured its enterprise pricing from bundled tokens to usage-based billing, a shift reported by The Register that licensing analysts estimated would double or triple costs for heavy users. GitHub announced that Copilot moves to usage-based pricing in June 2026. The pattern is the same everywhere: flat-rate subscriptions that made AI feel free are being replaced by metering that reflects actual compute costs.
Agentic AI usage consumes far more tokens than the chatbot interactions subscriptions were designed for. According to Wall Street Journal reporting (via The Register), Microsoft was losing over $20 per user per month on GitHub Copilot in early 2023, with some heavy users costing as much as $80 per month on a $10 subscription. As of Q2 2026, every major AI company is moving to usage-based billing — not a price increase, but the removal of a subsidy that was never meant to last.
The Spreadsheet Moment
I did not predict this. I read pricing pages and did division.
Claude Opus 4.6, the model I use most, costs $5 per million input tokens and $25 per million output tokens at API rates. Qwen 3.6 Plus, an open-weight model from Alibaba that you can own outright, is available through OpenRouter at $0.30 per million input tokens and $1.20 per million output tokens.
That is a 17x difference on input. A 21x difference on output.
| Model | Input ($/M tokens) | Output ($/M tokens) | Output cost vs Opus |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | baseline |
| Qwen 3.6 Plus (open-weight) | $0.30 | $1.20 | 21x cheaper |
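To see how the per-token gap plays out on a whole workload, here is a small sketch using the prices in the table. The dictionary keys are labels I made up for this example, not official API model identifiers, and the 10M input / 1M output split is a hypothetical workload, not a measurement:

```python
# Prices from the table above, in dollars per million tokens.
# Keys are illustrative labels, not real API model IDs.
PRICES = {
    "claude-opus-4.6": {"input": 5.00, "output": 25.00},
    "qwen-3.6-plus": {"input": 0.30, "output": 1.20},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at the listed per-million-token prices."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Hypothetical workload: 10M input tokens, 1M output tokens.
opus = workload_cost("claude-opus-4.6", 10_000_000, 1_000_000)
qwen = workload_cost("qwen-3.6-plus", 10_000_000, 1_000_000)
print(f"Opus: ${opus:.2f}, Qwen: ${qwen:.2f}, ratio: {opus / qwen:.1f}x")
# Opus: $75.00, Qwen: $4.20, ratio: 17.9x
```

The blended ratio always lands between the input ratio and the output ratio; where exactly depends on how input-heavy the workload is.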
The quality gap exists, but it is not 17x. On SWE-bench, a widely used benchmark for real-world software engineering tasks, models priced at $0.25–$1.20 per million tokens score within 5–10 percentage points of models priced at $5–$15. For many tasks (writing a bash script, modifying an existing file, running a verification check) the cheap model produces the same result.
Not all tasks need the strongest model. In a benchmark of 13 models across execution, review, planning, and triage, Qwen 3.6 Plus won overall at $0.30 per million input tokens, beating models that cost 50x more. Use premium models only where quality differences matter; route everything else to cheaper models whose price matches the task's complexity. As of Q2 2026.
What I Built
I built a system I call fleet. Claude stays as the orchestrator — it reads the codebase, makes architectural decisions, and coordinates the work. But when it needs to dispatch a task — write a script, run a review, verify a change — it routes that task to the cheapest model that can handle it.
The architecture is simple. A bridge server sits between Claude Code and external models. When a task comes in, the orchestrator checks a configuration file that maps task types to models. Planning goes to Claude Opus. Execution goes to Qwen 3.6 Plus. Triage goes to Gemini Flash. Each model does what it is best at, at its own price point.
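The routing step can be sketched in a few lines. This is not fleet's actual code: the real configuration file format is not shown here, so a Python dict stands in for it, the model labels are placeholders rather than real API identifiers, and the fallback rule is my assumption:

```python
from dataclasses import dataclass

# Hypothetical routing table: task type -> model label.
# Mirrors the mapping described above; labels are not real API model IDs.
ROUTES = {
    "planning": "claude-opus",
    "execution": "qwen-3.6-plus",
    "triage": "gemini-flash",
}

# Assumption: unknown task types fall back to the strongest model.
DEFAULT_MODEL = "claude-opus"


@dataclass
class Task:
    kind: str    # e.g. "planning", "execution", "triage"
    prompt: str  # text handed to the chosen model


def route(task: Task) -> str:
    """Pick the model label for a task, falling back to DEFAULT_MODEL."""
    return ROUTES.get(task.kind, DEFAULT_MODEL)


print(route(Task(kind="execution", prompt="write the migration script")))
# qwen-3.6-plus
```

Keeping the mapping in data rather than code is what makes a model swap a one-line change when prices move.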
In the first six days of May, fleet processed 160 million tokens through OpenRouter at a total cost of $44. The same work through Claude Sonnet would cost about $500. Through Claude Opus, about $840.
The costs keep dropping. On May 5 I enabled prompt caching on Qwen dispatches. Cache hit rate went from 18% to 75% in one day, cutting effective token cost by another 54%. The same infrastructure that makes Claude Code fast — caching repeated context across turns — works for cheap models too. You just have to build it.
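A rough model of why the hit rate matters so much. It assumes cached input tokens are billed at a fixed fraction of the normal input price; the 0.2 multiplier below is an illustrative assumption, not a published rate, and the model ignores output tokens, which caching does not discount:

```python
def effective_input_price(base_price: float, hit_rate: float, cached_multiplier: float) -> float:
    """Blended price per million input tokens at a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    uncached_share = 1.0 - hit_rate
    return base_price * (uncached_share + hit_rate * cached_multiplier)


BASE_PRICE = 0.30        # Qwen 3.6 Plus input price, $/M tokens (quoted earlier)
CACHED_MULTIPLIER = 0.2  # ASSUMPTION: cached reads cost 20% of the normal price

before = effective_input_price(BASE_PRICE, 0.18, CACHED_MULTIPLIER)
after = effective_input_price(BASE_PRICE, 0.75, CACHED_MULTIPLIER)
reduction = 1.0 - after / before

print(f"before: ${before:.4f}/M, after: ${after:.4f}/M, reduction: {reduction:.0%}")
# before: $0.2568/M, after: $0.1200/M, reduction: 53%
```

Under that assumed multiplier the model lands near the 54% I measured, but treat it as a sanity check, not an explanation of the exact figure.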
I run fleet across five projects now: the AI reception system I deploy to clinics, a warranty management platform, a chat widget, automation tools, and fleet itself. The total cost across all projects: $70 for 458 dispatched tasks.
Fleet processed 160 million tokens across five real projects in six days at a total cost of $44. The same work through Claude Sonnet would cost about $500; through Claude Opus, about $840. Enabling prompt caching cut effective token cost a further 54% within one day — cache hit rate jumped from 18% to 75%. As of Q2 2026.
The 87% That Became 100%
I did not ship a working system on the first try. The early development phase — soak testing, benchmarking, debugging — ran 1,019 dispatches with an 87% success rate. Thirteen percent of tasks failed or were killed.
Some failures were revealing. I discovered that angle brackets in prompts — like <city> or <slug> — cause certain models to hang silently. Not error, not crash. Infinite silence. I ran 13 controlled experiments to isolate the variable. The root cause was not the model, not the prompt length, not the MCP server configuration. It was the angle brackets. Every test with angle brackets hung. Every test without them completed.
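A guard for this can be as small as a regex check before dispatch. The sketch below is illustrative, not fleet's actual code; rewriting `<name>` as `{name}` is one possible convention I picked for the example:

```python
import re

# Matches placeholder-style tokens such as <city> or <slug>.
# Requires a letter or underscore right after "<", so "a < b" is left alone.
PLACEHOLDER = re.compile(r"<([A-Za-z_][A-Za-z0-9_-]*)>")


def find_placeholders(prompt: str) -> list[str]:
    """Return the names of all angle-bracket placeholders in the prompt."""
    return PLACEHOLDER.findall(prompt)


def strip_angle_placeholders(prompt: str) -> str:
    """Rewrite <name> as {name} so no placeholder angle brackets are dispatched."""
    return PLACEHOLDER.sub(r"{\1}", prompt)


prompt = "Build the page for <city> using <slug> as the URL."
print(find_placeholders(prompt))
# ['city', 'slug']
print(strip_angle_placeholders(prompt))
# Build the page for {city} using {slug} as the URL.
```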
Other failures were mundane. A bash script referenced a variable before it was assigned. A cleanup function tried to remove a directory that did not exist yet because the trap fired before the assignment line. The kind of bugs that any developer fixes in five minutes — but that a dispatched agent cannot recover from without a human looking at the log.
I fixed each one. Added preflight checks. Built silence watchdogs that kill stuck tasks after 90 seconds. Removed angle brackets from every dispatch prompt. Added structured output validation.
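A silence watchdog is simpler than it sounds. The sketch below shows the general idea under my own assumptions (the dispatched task is a subprocess that writes to stdout, and "silence" means no new output line); it is not fleet's actual implementation:

```python
import queue
import subprocess
import sys
import threading


def run_with_silence_watchdog(cmd: list[str], silence_limit: float = 90.0) -> tuple[str, bool]:
    """Run cmd and kill it if it prints nothing for silence_limit seconds.

    Returns (captured_output, killed_for_silence).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines: queue.Queue = queue.Queue()

    def pump() -> None:
        # Forward each output line to the queue; None marks end of stream.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    output: list[str] = []
    killed = False
    while True:
        try:
            # Block until the next line arrives or the silence limit expires.
            line = lines.get(timeout=silence_limit)
        except queue.Empty:
            proc.kill()
            killed = True
            break
        if line is None:
            break
        output.append(line)

    proc.wait()
    reader.join(timeout=1.0)
    return "".join(output), killed


# A task that sleeps without printing is killed once the limit passes.
_, was_killed = run_with_silence_watchdog(
    [sys.executable, "-c", "import time; time.sleep(30)"], silence_limit=0.5
)
print(was_killed)
# True
```

One caveat: a child process that buffers its output looks silent even while it is working, so the limit has to be generous or the child has to flush regularly.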
The current system — the one running across five real projects — has completed 458 dispatches with a 100% completion rate. Zero failures. Not because the system is perfect, but because I found the failure modes during development and built guards for each one.
The most counterintuitive failure: angle brackets in prompts caused certain models to hang silently — no error, no crash, just infinite silence. Thirteen controlled experiments to isolate one character. Other failures were mundane bash bugs. The path from 87% to 100% completion rate took two weeks of daily debugging. Every automation system has a messy middle; the question is whether you stay long enough to get through it.
The arc from 87% to 100% matters more than the 100% alone. Every automation system has a messy middle. The question is whether you stay in it long enough to get through it, or abandon the project when it does not work on the first try. I stayed, and it took about two weeks of daily debugging. That is a realistic timeline for any serious automation project — similar to what we see when working through our AI implementation lessons.
What This Means If You Automate With AI
You do not need to build a multi-model dispatch system. Most businesses do not need that level of infrastructure. But you need to understand three things about the AI you rely on.
Know what your AI actually costs. Not your subscription price — the compute cost underneath it. If you are on a $20 or $200 plan and using AI heavily, you are almost certainly consuming more in compute than you pay. That is fine today. It will not be fine forever.
Have a plan for when pricing changes. Usage-based billing is coming to every major AI platform. When your flat-rate $200 becomes metered at API rates, your bill could increase 5–10x for the same usage. The businesses that plan for this will switch models, optimize their pipelines, or negotiate enterprise agreements. The businesses that do not will be surprised by their next invoice.
Build cost tracking into AI automation from day one. Know what each call costs, how many calls each workflow makes, and your monthly burn at API rates versus subscription rates. The gap is your exposure. Design systems to swap models when pricing shifts — for many workloads, cheaper alternatives perform comparably. As of Q2 2026.
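Cost tracking does not need a platform. Here is a minimal sketch of a per-workflow ledger; the class name, workflow name, and token counts are hypothetical and only show the shape of the bookkeeping, with prices taken from the Opus figures quoted earlier:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CostLedger:
    """Accumulates API-rate cost per workflow from per-call token counts."""

    prices: dict  # model label -> (input $/M tokens, output $/M tokens)
    totals: dict = field(default_factory=lambda: defaultdict(float))
    calls: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, workflow: str, model: str, input_tokens: int, output_tokens: int) -> float:
        """Record one call and return its cost in dollars."""
        in_price, out_price = self.prices[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.totals[workflow] += cost
        self.calls[workflow] += 1
        return cost

    def exposure(self, subscription_price: float) -> float:
        """API-rate burn minus the subscription price: the gap metering would bill."""
        return sum(self.totals.values()) - subscription_price


# Hypothetical token counts, aggregated for illustration only.
ledger = CostLedger(prices={"claude-opus-4.6": (5.00, 25.00)})
ledger.record("code-review", "claude-opus-4.6", 30_000_000, 1_000_000)
ledger.record("code-review", "claude-opus-4.6", 10_000_000, 1_000_000)

print(f"burn: ${sum(ledger.totals.values()):.2f}, exposure: ${ledger.exposure(200.0):.2f}")
# burn: $250.00, exposure: $50.00
```

A positive exposure means you are consuming more at API rates than you pay; a negative one means the flat rate is costing you money.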
The businesses that treat AI cost as a rounding error today will face two surprises at once: the bill, and the realization that their workflows were never designed to be cost-efficient. Both problems are solvable — but they are easier to solve before metering arrives than after.
I built this while the subsidy was still paying for my development time. That window exists right now. It will not stay open.
That is not vision. It is arithmetic.
Mind Momentum builds AI automation systems where cost transparency is part of the design, not an afterthought. If you are evaluating how AI fits into your operations and want to understand the real numbers, get in touch.
