Cost Evidence: The 208x Difference

Same Model. Same Workspace. Same Scenarios. Radically Different Cost.

We ran the exact same 5 architecture scenarios against the exact same workspace using two competing AI toolchains — both backed by Claude Opus 4.6. The cost difference is not theoretical. It comes from actual billing data.


The Head-to-Head Comparison

| | GitHub Copilot Pro+ | Roo Code + OpenRouter |
|---|---|---|
| AI Model | Claude Opus 4.6 | Claude Opus 4.6 |
| Cost per run | $0.48 | ~$100 |
| Monthly (38 runs) | $39 (fixed) | ~$507 (variable) |
| Pricing model | Fixed subscription | Pay-per-token |
| Cost at 50 runs | $39 | ~$667 |
| Cost at 100 runs | $39 | ~$1,334 |
| Infrastructure | None (SaaS) | Kong Gateway + vector DB |
**208x** cheaper per run: $0.48 vs ~$100, using the same underlying AI model.


Why the Difference Is So Large

The Architectural Difference: Indexing vs Recalculation

This is the biggest reason for the 208x cost difference.

GitHub Copilot and OpenRouter represent two fundamentally different architectural approaches to handling static context:

| | GitHub Copilot | OpenRouter |
|---|---|---|
| Architecture | Pre-indexes workspace (vector database + semantic retrieval) | Token-by-token calculation |
| Static Context Handling | Indexed once, reused across all queries | Recalculated and billed on every request |
| Cost Model | Fixed subscription (amortizes indexing cost) | Pay-per-token (includes all recalculation) |
| What You Pay For | User query that searches the index | Every single token for every single request |

Copilot's approach:

GitHub maintains a vector database index of your entire workspace: your specs, source code, decision history, standards. When you ask a question, Copilot:

1. Searches the indexed content (semantic retrieval)
2. Pulls back only the most relevant snippets
3. Sends a small, curated context window to Claude Opus (typically under 5K tokens)
4. Charges you once for your user query

The indexing infrastructure cost is amortized across the entire user base through the $39/month subscription.

OpenRouter's approach:

Every time you run an architecture session, OpenRouter:

1. Recalculates which context is relevant (no persistent index)
2. Sends all relevant context to Claude Opus on every turn
3. Bills you for every token in every direction
4. Charges you again for the full context on turn 2, turn 3, turn 4...

There's no amortization. No indexing. Each request starts from zero and includes all context recalculation costs.
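The contrast can be reduced to a toy sketch. The word-overlap scoring below is a deliberately naive stand-in for Copilot's vector search, and all helper names here are hypothetical illustrations, not real APIs from either product:

```python
# Toy sketch of the two context strategies. retrieve_top_k is a naive
# word-overlap ranking standing in for real semantic (vector) retrieval.

def retrieve_top_k(index, query, k=3):
    """Rank indexed snippets by shared-word overlap with the query."""
    words = set(query.lower().split())
    scored = sorted(index, key=lambda s: -len(words & set(s.lower().split())))
    return scored[:k]

def indexed_context(query, index, k=3):
    # Indexed model: a bounded handful of relevant snippets per query,
    # no matter how large the workspace is.
    return retrieve_top_k(index, query, k)

def recalculated_context(index, history):
    # No persistent index: all candidate context plus the full
    # conversation history is re-sent (and re-billed) on every turn.
    return list(index) + list(history)

workspace = ["auth service spec", "payment gateway spec", "logging standards"]
print(indexed_context("how does the auth service work", workspace, k=1))
print(len(recalculated_context(workspace, ["turn 1", "turn 2"])))
```

The point of the sketch is the shape of the two functions: one returns a fixed-size slice per query, the other returns something that grows with both workspace size and session length.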


Why This Matters

In our 5-scenario POC, architects ran multi-turn sessions (4-20 turns per scenario). With OpenRouter's per-token approach:

  • Turn 1: full workspace context billed
  • Turn 2: full workspace context plus the accumulated conversation, billed again
  • Turns 3-20: the same context re-billed every turn with a growing conversation window, so cumulative session cost grows quadratically

With Copilot's indexed approach, that entire session is one or two queries against a pre-indexed database.
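The re-billing pattern can be simulated with illustrative numbers: a 5K-token base context and roughly 4K tokens of new conversation per turn. These are assumptions for the sketch, not measured POC values:

```python
# Sketch: cumulative billed input tokens over a multi-turn session.
# Base context and per-turn growth are illustrative assumptions.

def per_token_billed(turns, base=5_000, growth=4_000):
    """Full history is re-sent every turn, so turn t bills
    base + growth * (t - 1) tokens; the running total is quadratic."""
    return sum(base + growth * (t - 1) for t in range(1, turns + 1))

def indexed_billed(turns, bounded=5_000):
    """Semantic retrieval keeps each query bounded, so the
    total grows only linearly with turn count."""
    return bounded * turns

for turns in (1, 4, 20):
    print(turns, per_token_billed(turns), indexed_billed(turns))
```

Under these assumptions a 4-turn session bills 44K input tokens on the per-token model (consistent with the 50K-100K range cited below, once output tokens are added), while the indexed model stays at 5K per query.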

The Billing Model: Intent-Based vs Token-Based

The architectural difference enables a different billing model:

| | Copilot | OpenRouter |
|---|:---:|:---:|
| User types a prompt | 1 premium request (fixed per query) | Tokens billed |
| AI reads a file from the indexed workspace | Included in query cost | Tokens billed |
| AI runs a terminal command | Included in query cost | Tokens billed |
| AI spawns a sub-agent | Included in query cost | Tokens billed |
| AI analyzes search results | Included in query cost | Tokens billed |
| Context re-transmission per turn | Included (server-side semantic retrieval) | Tokens billed |

A typical 4-prompt architecture session on Copilot (4 queries to the indexed database):

4 prompts x 3 (Claude multiplier) x $0.04 = $0.48

The same session on OpenRouter recalculates context on every turn and bills every token. Because the full history is re-sent each turn, cumulative billed tokens grow quadratically as the session progresses: a 4-turn session can easily generate 50K-100K tokens.
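The Copilot side of that arithmetic is simple enough to check directly:

```python
# The premium-request math from the text: a fixed cost per query,
# independent of how many tokens the session actually consumes.
prompts = 4
claude_multiplier = 3     # premium-request multiplier for Claude models
rate_per_request = 0.04   # dollars per premium request

session_cost = prompts * claude_multiplier * rate_per_request
print(f"${session_cost:.2f}")  # $0.48
```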


Why Indexing Matters: The Context Cost Explosion

Without workspace indexing (OpenRouter's model), context costs explode as sessions get longer; with indexed context (Copilot's model), semantic retrieval keeps context bounded regardless of session length:

| Turn | Context Size (OpenRouter) | Context Size (Copilot) |
|---|---|---|
| Turn 1 | ~5K tokens | ~5K tokens |
| Turn 5 | ~50K tokens | ~5K tokens |
| Turn 10 | ~120K tokens | ~5K tokens |
| Turn 20 | ~180K tokens | ~5K tokens |

Copilot uses server-side semantic retrieval — it selects only the most relevant context for each turn, keeping context bounded at roughly 5K tokens regardless of session length.

OpenRouter re-transmits the full conversation history on every turn, meaning you pay for the same tokens over and over — and the cost per turn grows as the session gets longer.

Measured Overhead

Our context window analysis found that Roo Code broadcasts 81 environment metadata blocks consuming 1,885 lines across a typical task — 16.3% of context wasted on metadata the model doesn't need.


Monthly Cost Projection

At the architecture practice's projected workload (38 runs/month — 26 base scenarios + 12 PROMOTE steps):

| Runs/Month | Copilot Pro+ | OpenRouter | Copilot Advantage |
|---|---|---|---|
| 10 | $39 | ~$133 | 3.4x |
| 20 | $39 | ~$267 | 6.8x |
| 38 | $39 | ~$507 | 13x |
| 50 | $39 | ~$667 | 17x |
| 100 | $39 | ~$1,334 | 34x |

Copilot's cost line is flat regardless of volume. OpenRouter's grows linearly — and that's the optimistic case, ignoring the quadratic context re-transmission within each session.
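The table above can be reproduced by extrapolating linearly from the measured 38-run month. The straight-line scaling is itself an assumption, and, as noted, the optimistic one, since it ignores within-session context growth:

```python
# Monthly projection: Copilot is a flat subscription; the OpenRouter
# column is a linear extrapolation from the measured ~$507 at 38 runs.
COPILOT_FLAT = 39.0
MEASURED_RUNS, MEASURED_COST = 38, 507.0  # from the POC billing data

def openrouter_monthly(runs):
    return runs * MEASURED_COST / MEASURED_RUNS

for runs in (10, 20, 38, 50, 100):
    cost = openrouter_monthly(runs)
    print(f"{runs:>3} runs: Copilot ${COPILOT_FLAT:.0f} "
          f"vs OpenRouter ~${cost:.0f} ({cost / COPILOT_FLAT:.1f}x)")
```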


Budget Predictability

| Risk | Copilot | OpenRouter |
|---|---|---|
| Monthly budget | $39 (known, fixed) | Variable (depends on usage) |
| Runaway costs | Impossible | Possible (long sessions, complex scenarios) |
| Budget approval | Single line item | Requires usage monitoring and alerts |
| Cost per new architect | +$39/month | +$133-667/month (depends on workload) |
| Infrastructure costs | $0 | Kong Gateway + Qdrant + monitoring (unquantified) |

The cost advantage is not close. At any realistic usage volume, Copilot Pro+ is between 3.4x and 34x cheaper than per-token alternatives — and that's before accounting for infrastructure overhead, monitoring, and the risk of runaway costs on complex sessions.

What did the AI actually produce?

Output Analysis: What Was Produced in 5 Scenarios