Per-Token Pricing Changes the Documentation Format Conversation

When every token is billed at API rates, the format of your API documentation stops being an efficiency question and becomes a cost question.

What changed

Anthropic updated its enterprise plan. The seat fee now covers platform access only. Every token used across Claude, Claude Code, and Cowork is billed separately at standard API rates. From Anthropic's enterprise help center: "Usage isn't included in the seat fee. Every token your team uses, in chat, Claude Code, or Cowork, is billed at standard API rates on top of your seat cost."

Organizations on older seat-based plans with fixed usage allowances keep their current terms until their next contract renewal. After that, they migrate to usage-based billing.

Anthropic isn't alone in this. OpenAI shifted Codex from flat-message pricing to token metering on April 2. GitHub tightened Copilot limits on April 10. The direction is consistent across the industry: flat-fee subscriptions for AI coding tools are giving way to usage-based billing.

Why this matters for documentation

Under flat-fee pricing, a verbose API spec cost the same as a compact one. The tokens were consumed either way, but nobody saw them on a bill. The cost was real (it consumed context window space and competed for the model's attention), but it was invisible to Accounts Payable.

Under per-token pricing, those tokens have a price. Every time a developer uses an AI coding tool with your API documentation, the format of that documentation determines how many tokens get billed. A verbose format doesn't just waste context. It costs more money, and now there's a line item to prove it.

What the data shows

In the research for Tokens Not Jokin', we documented the same APIs in four formats and measured the token cost of each. The differences were not small.

YAML described the same API information using roughly 80% fewer tokens than OpenAPI 3.0. For one 10-endpoint API, the counts were 2,007 tokens versus 7,534 tokens, a reduction of about 73% in that case. Same endpoints, same parameters, same constraints. The only difference was how the information was structured.
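To see where the gap comes from, it helps to write one endpoint both ways. The sketch below is illustrative only: the endpoint, both layouts, and the `rough_token_count` helper are invented for this example and are not the formats or the tokenizer used in the research. The counter is a crude proxy (word runs plus punctuation marks), so treat its output as a direction, not a bill.

```python
import re

# Hypothetical endpoint, written two ways. Neither layout is taken from the
# research; both are invented here purely for illustration.
VERBOSE_OPENAPI = """
{
  "paths": {
    "/users/{id}": {
      "get": {
        "summary": "Get a user",
        "parameters": [
          {
            "name": "id",
            "in": "path",
            "required": true,
            "schema": {"type": "integer", "minimum": 1}
          }
        ],
        "responses": {
          "200": {"description": "The user"},
          "404": {"description": "User not found"}
        }
      }
    }
  }
}
"""

COMPACT_YAML = """
GET /users/{id}:
  summary: Get a user
  params:
    id: integer, path, required, min 1
  returns:
    200: The user
    404: User not found
"""


def rough_token_count(text: str) -> int:
    """Crude proxy: count word runs and individual punctuation marks.

    This is NOT a model tokenizer. For billing-grade numbers, use the
    tokenizer or token-counting endpoint your provider documents.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


if __name__ == "__main__":
    verbose = rough_token_count(VERBOSE_OPENAPI)
    compact = rough_token_count(COMPACT_YAML)
    print(f"verbose: {verbose}, compact: {compact}, saved: {1 - compact / verbose:.0%}")
```

Most of the verbose count comes from structural punctuation and repeated keys such as "schema" and "description", which carry no information the compact layout lacks.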

Under flat-fee pricing, that's an interesting efficiency finding. Under per-token pricing, it's arithmetic. At $3 per million input tokens and 100,000 requests per month, the difference between YAML and OpenAPI is roughly $8,000 per year, per API. That cost is paid by your customers, the developers whose AI tools read your docs when they generate code.
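The arithmetic is easy to rerun with your own numbers. This is a minimal sketch, assuming the full documentation is sent as input on every request and that no caching or batch discounts apply; the function names are hypothetical, and the price and volume are the illustrative figures from the paragraph above.

```python
def annual_input_cost(tokens_per_request: int,
                      requests_per_month: int,
                      usd_per_million_tokens: float) -> float:
    """Yearly input-token cost in USD for a document sent on every request.

    Assumes no prompt caching or batch discounts (an assumption of this sketch).
    """
    tokens_per_year = tokens_per_request * requests_per_month * 12
    return tokens_per_year / 1_000_000 * usd_per_million_tokens


def annual_savings(verbose_tokens: int,
                   compact_tokens: int,
                   requests_per_month: int,
                   usd_per_million_tokens: float) -> float:
    """Yearly difference in USD between a verbose and a compact format."""
    verbose = annual_input_cost(verbose_tokens, requests_per_month, usd_per_million_tokens)
    compact = annual_input_cost(compact_tokens, requests_per_month, usd_per_million_tokens)
    return verbose - compact


if __name__ == "__main__":
    # Token counts for the 10-endpoint API quoted earlier in this article.
    saved = annual_savings(7_534, 2_007, 100_000, 3.0)
    print(f"${saved:,.0f} per year")
```

The result scales linearly with the per-request token gap. With the 10-endpoint counts quoted earlier (7,534 versus 2,007 tokens), these inputs give roughly $19,900 per year; a gap of about 2,200 tokens per request gives roughly $8,000.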

The compounding effect

Developers don't use one API. A typical SaaS application might reference documentation for authentication, a database, payment processing, email, storage, and an internal API. That's six sets of documentation competing for context space in every coding session.

With YAML, six APIs consume about 8,000 tokens, roughly 6% of a 128K context window. With OpenAPI, the same six APIs consume about 30,000 tokens, nearly a quarter of the window. Under flat-fee pricing, that difference was invisible. Under per-token pricing, it shows up every month.
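The context-window shares can be checked the same way. A small sketch, assuming "128K" means 128,000 tokens (some models define it as 131,072, which shifts the percentages slightly); `context_share` is a hypothetical helper name.

```python
def context_share(doc_tokens: int, context_window: int = 128_000) -> float:
    """Fraction of the context window occupied by documentation tokens."""
    return doc_tokens / context_window


if __name__ == "__main__":
    # Approximate six-API totals from the paragraph above.
    for label, tokens in [("YAML", 8_000), ("OpenAPI", 30_000)]:
        share = context_share(tokens)
        print(f"{label}: {share:.1%} of the window, {128_000 - tokens:,} tokens left")
```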

Format isn't just about cost, either. The research showed that it affects more than token count. One smaller model, at 3.8B parameters, achieved a 100% pass rate with YAML and 0% with OpenAPI on a complex API. The verbose format didn't just cost more. It broke the model entirely.

What documentation teams should consider

If your API documentation is in OpenAPI format and your customers use AI coding tools (and they do: 72% of developers reported using AI tools in 2024), the billing shift means your format choice now has a direct cost implication for the people using your API.

They probably won't trace the expense back to your documentation. Token costs are aggregated across everything the model processes in a session. But the cost is real, and it compounds across every request, every developer, every day. And now it shows up where your customers feel it most: their bottom line.

A YAML version of your API documentation uses fewer tokens, produces equivalent or better code generation results, and now costs your customers less money per interaction.

The research behind these numbers, including the full methodology, pass rates across four models, and the format comparison data, is in Tokens Not Jokin'.

Get the full research:

Tokens Not Jokin' on Leanpub →

See what your docs cost with the free calculator →