Your AI Subscription Is Subsidized. Here Is What I Did About It.


The Receipt Under the Receipt

I use Claude Code every day. For three months — March through May 2026 — I tracked what my usage actually cost to serve, using ccusage, a tool that calculates token costs at published API rates.

The subscription cost $600. Three months at $200 per month.

The compute underneath it cost $5,883. That is what Anthropic would charge if I paid per token at published API rates. In April alone — my heaviest month — I consumed $4,164 worth of compute for a $200 payment.

That is a 9.8x subsidy. For every dollar I paid, Anthropic spent nearly ten.
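
The arithmetic is simple enough to check. A quick sketch using only the numbers above:

```python
# Subsidy math from the three months of ccusage data above.
subscription = 3 * 200   # $600 paid, March through May
api_equivalent = 5_883   # what the same tokens cost at published API rates

subsidy_ratio = api_equivalent / subscription
print(f"overall subsidy: {subsidy_ratio:.1f}x")  # -> 9.8x

# April alone: $4,164 of compute against a $200 payment.
print(f"April subsidy: {4_164 / 200:.1f}x")      # -> 20.8x
```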

I am not complaining. The product is excellent. The model quality is worth the real cost. But $5,883 in compute served for $600 in revenue is not a business model. It is a customer acquisition strategy funded by $15 billion in venture capital.

The question is not whether this changes. The question is whether you have a plan for when it does.

The Correction Is Already Happening

This is not speculation. The pricing shifts started before I finished my analysis.

In April 2026, Anthropic restructured its enterprise pricing from bundled tokens to usage-based billing. GitHub announced that Copilot moves to usage-based pricing in June 2026. The pattern is the same everywhere: flat-rate subscriptions that made AI feel free are being replaced by metering that reflects actual compute costs.

The reason is straightforward. Agentic AI usage — coding assistants, automation pipelines, multi-step workflows — consumes far more tokens than the chatbot interactions these subscriptions were designed for. A single coding session can burn through what a casual user consumes in a week. The business models built for chat are collapsing under the weight of agents running for hours.

Microsoft reportedly lost over $20 per user per month on GitHub Copilot. Anthropic's power users generate compute costs reaching $90,000 annually against $2,400 in subscription revenue. Every major AI company is losing money on its most active customers, and they all know it. The shift to usage-based billing is not a price increase — it is the removal of a subsidy that was never meant to last. The companies that built their workflows around unlimited flat-rate AI access are the ones most exposed when metering arrives.

The Spreadsheet Moment

I did not predict this. I read pricing pages and did division.

Claude Opus 4.6 — the model I use most — costs $5 per million input tokens and $25 per million output tokens at API rates. Qwen 3.6 Plus, an open-weight model from Alibaba available through OpenRouter, costs $0.30 per million input tokens and $1.20 per million output tokens.

That is a 17x difference on input. A 21x difference on output.
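
The ratios fall straight out of the published rates. The 200k-in, 20k-out task below is a hypothetical workload, chosen only to make the per-task gap concrete:

```python
# Published per-million-token rates cited above (USD).
opus_in, opus_out = 5.00, 25.00   # Claude Opus 4.6
qwen_in, qwen_out = 0.30, 1.20    # Qwen 3.6 Plus via OpenRouter

print(f"input gap:  {opus_in / qwen_in:.0f}x")    # -> 17x
print(f"output gap: {opus_out / qwen_out:.0f}x")  # -> 21x

# Hypothetical task: read 200k tokens of context, write 20k tokens.
task_in, task_out = 0.2, 0.02  # millions of tokens
opus_cost = task_in * opus_in + task_out * opus_out
qwen_cost = task_in * qwen_in + task_out * qwen_out
print(f"Opus: ${opus_cost:.2f}  Qwen: ${qwen_cost:.2f}")  # -> $1.50 vs $0.08
```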

The quality gap exists, but it is not 17x. On SWE-bench — the industry benchmark for code generation — models in the $0.25–$1.20 range score within 5–10 percentage points of models in the $5–$15 range. For many tasks — writing a bash script, modifying an existing file, running a verification check — the cheap model produces the same result.

Not all tasks are equal. Planning and architecture decisions benefit from the strongest model available — getting a plan wrong wastes every execution that follows. But mechanical execution, code review, and verification checks do not need the most expensive model. They need a correct-enough model that runs fast and cheap.

I benchmarked 13 models across four disciplines — execution, review, planning, and triage — with a scoring system that weights quality, speed, and cost differently per discipline. Qwen 3.6 Plus won the overall championship with 21 points at $0.30 per million input tokens. It beat models costing 50x more. GPT-5.5, despite scoring top-3 on quality in every discipline, earned zero championship points because its $5 per million input price destroyed its value score.

The answer is not to replace premium models entirely. It is to use them only where quality differences actually matter — and route everything else to models where the price reflects the task complexity.
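
To make the per-discipline weighting concrete, here is the shape of that scoring. The weights below are illustrative assumptions, not the exact values from my benchmark:

```python
# Sketch of a per-discipline weighted score. Weights are illustrative.
from dataclasses import dataclass

@dataclass
class Result:
    model: str
    quality: float  # 0-1, benchmark pass rate
    speed: float    # 0-1, normalized (higher = faster)
    value: float    # 0-1, normalized inverse of $/task

# Each discipline weights the three axes differently.
WEIGHTS = {
    "planning":  {"quality": 0.8, "speed": 0.1, "value": 0.1},  # wrong plans are expensive
    "execution": {"quality": 0.4, "speed": 0.3, "value": 0.3},  # correct-enough, fast, cheap
    "review":    {"quality": 0.5, "speed": 0.2, "value": 0.3},
    "triage":    {"quality": 0.3, "speed": 0.4, "value": 0.3},  # latency matters most
}

def score(r: Result, discipline: str) -> float:
    w = WEIGHTS[discipline]
    return w["quality"] * r.quality + w["speed"] * r.speed + w["value"] * r.value

# Example: a cheap model with near-top quality dominates on execution.
print(score(Result("qwen-3.6-plus", 0.82, 0.90, 0.98), "execution"))  # -> 0.892
```

This is why a top-quality model can earn zero points: a heavy value weight in execution and triage punishes any model whose price is out of proportion to the task.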

What I Built

I built a system I call fleet. Claude stays as the orchestrator — it reads the codebase, makes architectural decisions, and coordinates the work. But when it needs to dispatch a task — write a script, run a review, verify a change — it routes that task to the cheapest model that can handle it.

The architecture is simple. A bridge server sits between Claude Code and external models. When a task comes in, the orchestrator checks a configuration file that maps task types to models. Planning goes to Claude Opus. Execution goes to Qwen 3.6 Plus. Triage goes to Gemini Flash. Each model does what it is best at, at its own price point.
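
A minimal sketch of that routing idea, using OpenRouter's OpenAI-compatible chat endpoint. The ROUTES table, model slugs, and function names are illustrative, not fleet's actual configuration:

```python
import os
import requests

# Task-type -> model routing table. Model slugs are illustrative.
ROUTES = {
    "planning":  "anthropic/claude-opus",  # strongest model: wrong plans are expensive
    "execution": "qwen/qwen-plus",         # mechanical work at cheap rates
    "triage":    "google/gemini-flash",    # fast, cheap classification
}

def dispatch(task_type: str, prompt: str) -> str:
    """Route a task to the cheapest model configured for its type."""
    model = ROUTES.get(task_type, ROUTES["execution"])  # default to the cheap lane
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

Because the mapping lives in configuration rather than code, swapping a model when pricing shifts is a one-line change.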

In the first six days of May, fleet processed 160 million tokens through OpenRouter at a total cost of $44. The same work through Claude Sonnet would have cost about $500. Through Claude Opus, about $840.
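
Dividing each total by the token volume gives the blended per-million rates behind those figures:

```python
# Blended per-million rates implied by the six-day numbers above.
tokens_m = 160  # million tokens dispatched
print(f"fleet:  ${44 / tokens_m:.3f}/M")   # -> $0.275/M
print(f"Sonnet: ${500 / tokens_m:.3f}/M")  # -> $3.125/M
print(f"Opus:   ${840 / tokens_m:.3f}/M")  # -> $5.250/M
```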

The costs keep dropping. On May 5 I enabled prompt caching on Qwen dispatches. Cache hit rate went from 18% to 75% in one day, cutting effective token cost by another 54%. The same infrastructure that makes Claude Code fast — caching repeated context across turns — works for cheap models too. You just have to build it.
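
Why a hit-rate jump cuts cost roughly in half: assuming cache reads bill at about 20% of the full input price (the exact discount varies by provider; 20% is an assumption that roughly reproduces the 54% figure above), the arithmetic looks like this:

```python
# Effective input cost as a function of cache hit rate, assuming
# cache reads bill at ~20% of the full input price (assumption).
def effective_cost_factor(hit_rate: float, cache_discount: float = 0.20) -> float:
    return hit_rate * cache_discount + (1 - hit_rate) * 1.0

before = effective_cost_factor(0.18)  # 18% hit rate -> 0.856
after = effective_cost_factor(0.75)   # 75% hit rate -> 0.400
print(f"cost reduction: {1 - after / before:.0%}")  # -> 53%
```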

I run fleet across five projects now: the AI reception system I deploy to clinics, a warranty management platform, a chat widget, automation tools, and fleet itself. The total cost across all projects: $70 for 458 dispatched tasks.

The 87% That Became 100%

I did not ship a working system on the first try. The early development phase — soak testing, benchmarking, debugging — ran 1,019 dispatches with an 87% success rate. Thirteen percent of tasks failed or were killed.

Some failures were revealing. I discovered that angle brackets in prompts — like <city> or <slug> — cause certain models to hang silently. Not error, not crash. Infinite silence. I ran 13 controlled experiments to isolate the variable. The root cause was not the model, not the prompt length, not the MCP server configuration. It was the angle brackets. Every test with angle brackets hung. Every test without them completed.

Other failures were mundane. A bash script referenced a variable before it was assigned. A cleanup function tried to remove a directory that did not exist yet because the trap fired before the assignment line. The kind of bugs that any developer fixes in five minutes — but that a dispatched agent cannot recover from without a human looking at the log.

I fixed each one. Added preflight checks. Built silence watchdogs that kill stuck tasks after 90 seconds. Removed angle brackets from every dispatch prompt. Added structured output validation.
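
A minimal sketch of two of those guards, assuming dispatched tasks run as local subprocesses; fleet's actual implementation may differ:

```python
import re
import select
import subprocess

SILENCE_LIMIT = 90  # seconds without output before a task is declared stuck

def sanitize(prompt: str) -> str:
    """Preflight: strip angle brackets from placeholders like <city>,
    which made some models hang silently."""
    return re.sub(r"<(\w+)>", r"\1", prompt)  # <city> -> city

def run_with_watchdog(cmd: list[str]) -> int:
    """Kill a dispatched task if it writes nothing for SILENCE_LIMIT seconds."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    while proc.poll() is None:
        # Wait up to SILENCE_LIMIT for the next output (POSIX only).
        ready, _, _ = select.select([proc.stdout], [], [], SILENCE_LIMIT)
        if not ready:
            proc.kill()  # silence watchdog fired: no output for 90s
            proc.wait()
            return -1
        if proc.stdout.read1(4096) == b"":  # EOF: process is finishing
            break
    return proc.wait()
```

The point is not these specific guards. It is that every guard exists because a specific failure happened first.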

The current system — the one running across five real projects — has completed 458 dispatches with a 100% completion rate. Zero failures. Not because the system is perfect, but because I found the failure modes during development and built guards for each one.

The arc from 87% to 100% matters more than the 100% alone. Every automation system has a messy middle. The question is whether you stay in it long enough to get through it, or abandon the project when it does not work on the first try. I stayed, and it took about two weeks of daily debugging. That is a realistic timeline for any serious automation project — similar to what we see when deploying AI systems for clinic operations.

What This Means If You Automate With AI

You do not need to build a multi-model dispatch system. Most businesses do not need that level of infrastructure. But you need to understand three things about the AI you rely on.

Know what your AI actually costs. Not your subscription price — the compute cost underneath it. If you are on a $20 or $200 plan and using AI heavily, you are almost certainly consuming more in compute than you pay. That is fine today. It will not be fine forever.

Have a plan for when pricing changes. Usage-based billing is coming to every major AI platform. When your flat-rate $200 becomes metered at API rates, your bill could increase 5–10x for the same usage. The businesses that plan for this will switch models, optimize their pipelines, or negotiate enterprise agreements. The businesses that do not will be surprised by their next invoice.

Do not build on a single vendor's subsidized pricing. The cheapest path today — one model, one vendor, unlimited usage on a flat subscription — is the most fragile path tomorrow. Design your systems so you can swap models when pricing shifts. Test whether cheaper alternatives handle your specific tasks — for many workloads, they will. Build cost tracking into your automation from day one, not after the bill arrives. Know what each AI call costs, how many calls each workflow makes, and what your monthly compute burn looks like at API rates instead of subscription rates. The difference between those two numbers is your exposure. The ROI calculations we do for clients include AI compute as a real cost line, not a rounding error — because we have seen what happens when you treat infrastructure costs as someone else's problem.
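
A minimal sketch of that day-one tracking: price every call at API rates regardless of what you actually pay, so the gap between subscription price and compute cost stays visible. The price table and log format here are illustrative:

```python
import csv
import time

# $/million tokens at API rates; fill in from your providers' pricing pages.
PRICES = {
    "anthropic/claude-opus": (5.00, 25.00),
    "qwen/qwen-plus": (0.30, 1.20),
}

def log_call(model: str, tokens_in: int, tokens_out: int,
             path: str = "ai_costs.csv") -> float:
    """Append one AI call's compute cost at API rates, whatever you actually pay."""
    p_in, p_out = PRICES[model]
    cost = tokens_in / 1e6 * p_in + tokens_out / 1e6 * p_out
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([time.time(), model, tokens_in, tokens_out, f"{cost:.6f}"])
    return cost
```

Sum that file monthly and compare it to your invoice. The difference between the two numbers is your exposure.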

Everyone can see that AI pricing will change. Most people are waiting. I built the alternative while the subsidized pricing was still paying for my development time.

That is not vision. It is arithmetic.

Mind Momentum builds AI automation systems where cost transparency is part of the design, not an afterthought. If you are evaluating how AI fits into your operations and want to understand the real numbers, get in touch.