How do I reduce my Cursor API bill?

Cursor Pro's $20/mo covers 500 fast requests — past that you pay OpenAI / Anthropic per-call rates directly, which is why heavy users end up at $50-$200/mo. The fix is to point Cursor's Custom API at CodeRouter and set model to 'auto'. CodeRouter detects what phase of coding the request is (planning, implementation, debugging, test generation, docs) and routes to the cheapest capable model per phase — Opus only for planning, DeepSeek V3 for test generation, Haiku for docstrings. Same Cursor IDE, same keyboard shortcuts, 70-90% lower monthly bill. Setup takes 2 minutes — just change base_url to https://www.coderouter.io/api/v1 and paste your cr_ API key.

What is the cheapest API for Claude Code / Aider / Copilot?

There isn't a single 'cheapest API' — the cheapest model depends on what the coding agent is doing. For planning and architecture, you still want Claude Opus 4.7 or Sonnet 4.6. For implementation, DeepSeek V3.2 ($0.28/$0.42 per 1M) and Qwen 3 Coder are 30-50x cheaper than Opus with near-equivalent code quality. For test generation, DeepSeek V3 or Haiku 4.5 is 15-50x cheaper. For docstrings and simple formatting, Haiku 4.5 ($1/$5) or Gemini 2.5 Flash ($0.30/M output) is 15-250x cheaper. CodeRouter is the gateway that picks per request automatically — aim a single base_url at https://www.coderouter.io/api/v1 from Claude Code, Aider, Copilot (via LiteLLM), Cursor, Windsurf, or any OpenAI-compatible agent.

Does DeepSeek V3 work as well as Claude Sonnet for coding?

For implementation and test generation phases — yes, DeepSeek V3.2 matches Claude Sonnet 4.6 on HumanEval, MBPP, and LiveCodeBench within 1-3 points. For multi-file refactoring and architecture planning, Sonnet still has an edge on long-context reasoning. DeepSeek V3 costs $0.28 input / $0.42 output per 1M tokens, vs Sonnet's $3/$15 — roughly 30-50x cheaper. The right answer for most coding agents is not 'pick one forever' but 'use DeepSeek V3 for the implement/test phases and Sonnet for the plan/refactor phases.' That's phase-aware routing in practice — CodeRouter decides per request in ~10ms.

What is phase-aware LLM routing?

Phase-aware LLM routing classifies each coding-agent request by what phase of software work it represents — planning, implementation, debugging, testing, refactoring, or documentation — and routes it to the cheapest model that can handle that specific phase. A 'write unit tests for this function' request goes to DeepSeek V3 ($0.42/M). A 'refactor this multi-file feature and plan the migration' request goes to Claude Opus 4.7 ($75/M). This is different from picking one model for everything, and different from OpenRouter-style model-selection (which still requires you to choose manually). CodeRouter's classifier runs in ~10ms on the server, so the agent never notices the extra hop.

CodeRouter vs OpenRouter — which saves more money on coding?

OpenRouter is a model marketplace — it gives you access to 300+ models behind one API key, but you still pick which model to send each request to. Most Cursor / Aider / Claude Code users default to the premium model (Opus, GPT-5) for everything and end up paying full price. CodeRouter is a phase-aware router — set model to 'auto' and we pick the cheapest capable model per request based on the coding phase. CodeRouter also adds things OpenRouter doesn't: coding-specific capability scores per model (implementation, debug, test, refactor), per-end-user attribution for SaaS agent builders, and built-in quota + top-up billing. For pure coding workloads, typical CodeRouter savings are 70-90% vs picking one model on OpenRouter.

Will CodeRouter break my Cursor / Aider / Claude Code agent?

No. CodeRouter exposes a standard OpenAI-compatible chat completions endpoint (POST /api/v1/chat/completions) with the same request and response format your agent already uses — including streaming, tool use, and function calling. We implement the same JSON schema and stream format, so Cursor, Aider, Claude Code, Cline, Continue.dev, Windsurf, OpenClaw, and any LiteLLM-wrapped client work unmodified. If a routed model fails, the fallback chain tries up to 2 alternates automatically (on 429, 500-504, timeouts, missing keys). You can also pin an explicit model instead of 'auto' any time.

How do I set up CodeRouter with Cursor in 2 minutes?

1) Sign up free at https://www.coderouter.io/login and copy your API key (starts with cr_). 2) In Cursor, open Settings -> Models -> OpenAI API Key, and under 'Override OpenAI Base URL' paste https://www.coderouter.io/api/v1. Paste your cr_ key in the API Key field. 3) Add 'auto' to the Custom Models list and select it as your active model. That's it — phase-aware routing is live. Aider users set OPENAI_API_BASE and OPENAI_API_KEY env vars to the same values. Claude Code users set ANTHROPIC_BASE_URL to https://www.coderouter.io/api/v1 and ANTHROPIC_API_KEY to the cr_ key. Full guide at https://www.coderouter.io/setup.

Claude Code Cheap API Router Setup (2026 Guide)

TL;DR — Claude Code always calls Anthropic's API with whatever model you've configured. Configure it with model: auto pointed at CodeRouter, and a phase-aware router decides per-call which model runs — keeping Opus for genuine planning / hard debug phases, moving the 70% of bread-and-butter calls to Sonnet 4.6 or DeepSeek V3. Same Claude Code UX, 60–80% lower bill.

Why Claude Code is expensive

Claude Code is purpose-built for long agentic coding sessions — Plan mode, long tool chains, file-read/write, Bash execution. It's probably the most efficient coding agent at output quality per engineered result.

The problem is cost per token:

Claude Code defaults to Opus 4.7 — $15 input / $75 output per 1M tokens.
Agentic tool use means every step reships the growing context. A 50-step session with 80K accumulated context = ~$4M of context tokens re-processed at Opus rates.
Plan mode (super useful) runs on Opus too. Fine for the planning call. Wasteful for the subsequent implementation calls that follow the plan.

A typical full Claude Code day with 3–5 planning sessions and 100+ implementation steps easily hits $30–80/day direct to Anthropic. That's $700–2400/month for one heavy user.

How phase-aware routing reduces this

Claude Code's workflow actually maps beautifully onto phase detection:

| Claude Code mode | Phase detected | Router's pick | |---|---|---| | Plan mode (<plan_mode> tag) | plan (confidence 0.95) | Claude Opus 4.7 ← stays | | Tool call: Read followed by user asking a question | plan (multiple file reads = review) | GPT-5.2 / Sonnet 4.6 | | Tool call: Bash → output contains Error: / traceback | debug | Sonnet 4.6 / DeepSeek R1 | | Tool call: Edit / Write with implementation intent | implement | Sonnet 4.6 / DeepSeek V3 | | Tool call: Write to a *_test.py or test_*.ts file | test | DeepSeek V3 / Kimi K2.5 | | Tool call: Write to a .md file | document | Haiku 4.5 / Gemini Flash |

The router uses Claude Code's own system prompt to fingerprint it with 95% confidence, then uses Plan mode tag + tool-result inspection to pick the right phase. No config needed beyond model: auto.

Setup — Claude Code proxy configuration

Two ways to configure Claude Code: env vars (quickest, single-session) or ~/.claude/settings.json (persistent across launches). Both shown below.

Critical: the env-var name trap

Before you copy any setup snippet — the env var that authenticates Claude Code against a third-party gateway is ANTHROPIC_AUTH_TOKEN, not ANTHROPIC_API_KEY.

If you set ANTHROPIC_API_KEY=cr_xxx and run Claude Code:

The env var is silently ignored
Claude Code falls back to your existing OAuth session
The header at the top shows Claude Max instead of API Usage Billing
All requests still go to api.anthropic.com directly — bypassing your gateway entirely

Why: ANTHROPIC_API_KEY is hard-wired to authenticate against api.anthropic.com. When ANTHROPIC_BASE_URL points elsewhere, only ANTHROPIC_AUTH_TOKEN (Bearer token mode) is honored.

Sanity check: after launching, look at the badge in Claude Code's header. If it doesn't say API Usage Billing, your env var didn't take effect. Run claude logout and try again.

Env vars (one-shot)

# Install Claude Code if you haven't already
$ npm install -g @anthropic-ai/claude-code@latest

# Point Claude Code at CodeRouter — note AUTH_TOKEN, not API_KEY
$ export ANTHROPIC_BASE_URL=https://www.coderouter.io/api/v1
$ export ANTHROPIC_AUTH_TOKEN=cr_your_coderouter_key

# Force every model tier through smart routing. Without these, Claude Code's
# /model picker may try a hardcoded default that doesn't exist on our side.
$ export ANTHROPIC_MODEL=auto
$ export ANTHROPIC_DEFAULT_OPUS_MODEL=auto
$ export ANTHROPIC_DEFAULT_SONNET_MODEL=auto
$ export ANTHROPIC_DEFAULT_HAIKU_MODEL=auto

# v2.1.x had experimental beta features that some third-party gateways
# don't yet implement. Disabling them is harmless and avoids 400 errors.
$ export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1

$ claude

Persistent — `~/.claude/settings.json`

The persistent config lives at ~/.claude/settings.json (note: .claude/ in your home dir, not ~/.config/claude-code/). Format:

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://www.coderouter.io/api/v1",
    "ANTHROPIC_AUTH_TOKEN": "cr_your_coderouter_key",
    "ANTHROPIC_MODEL": "auto",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "auto",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "auto",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "auto",
    "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS": "1"
  }
}

When you next run claude, it picks up this env block automatically — no shell-level export needed.

When Claude Code connects, it speaks Anthropic's native /v1/messages protocol. CodeRouter translates that into whichever provider phase detection chose (Anthropic for Opus / Sonnet / Haiku calls, OpenAI / DeepSeek / Google for non-Anthropic picks) and translates the response back so Claude Code sees a consistent Anthropic-shaped reply regardless of which upstream actually served it.

Troubleshooting — the four errors you'll likely hit

These are the actual production failures we've seen from real testers in the past 48 hours. If you hit one, the fix is below.

"There's an issue with the selected model (auto). It may not exist..."

This is Claude Code's catch-all error for "model validation failed". Three independent causes can trigger it:

Wrong env var name — see the AUTH_TOKEN section above. Switch from ANTHROPIC_API_KEY to ANTHROPIC_AUTH_TOKEN and confirm the header shows "API Usage Billing".
Experimental betas (v2.1.x only) — Claude Code v2.1.x sends an anthropic-beta header for features that some gateways don't implement. Add:
```
export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
```
Or in settings.json, include "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS": "1" in the env block.
Older Claude Code without the auto model in /model picker — Claude Code v2.1.80 and earlier don't show a "Custom model" option in the /model slash command. Either pass it on the CLI:
```
claude --model auto
```
Or upgrade: npm install -g @anthropic-ai/claude-code@latest.

"Free plan requires your own provider API keys" (HTTP 403)

CodeRouter's Free trial requires Bring-Your-Own-Key (you supply Anthropic / OpenAI keys, we just route). Three options:

Add your own provider key at /dashboard/models — keep using Free with smart routing
Upgrade to a paid plan — Starter from $19/mo, Pro $99/mo gets you 30M tokens
(Internal teams only) Get whitelisted by your org's CodeRouter admin

Claude Code header still shows "Claude Max"

Existing OAuth session is winning over your env vars. Logout first:

claude logout
# then re-run with your env vars set

"Anthropic API error 400: Unable to download the file"

Caused by a base64-encoded image being forwarded to Anthropic as {type: "url"} instead of {type: "base64"}. Already fixed in CodeRouter — if you still hit this, the image MIME type may be unsupported (Anthropic only accepts image/png, image/jpeg, image/gif, image/webp).

Verifying routing is active

After your first Claude Code session:

Open the CodeRouter dashboard → Analytics.
Look at the Phase Distribution chart. You should see a mix — if 100% of requests show as one phase (e.g. all "implement"), detection isn't picking up your agent's patterns and you should check your Claude Code version.
Check the Models Used chart. A healthy distribution: ~10% Opus 4.7, ~40% Sonnet 4.6, ~25% DeepSeek V3, ~15% Haiku 4.5, ~10% others. All-Opus means the phase detector is treating everything as plan — bug, report to us.
Weekly Savings card shows what Opus-for-everything would have cost.

Real-world cost example

Claude Code daily session logs from an internal test (~200 tool calls, 4 hours of use):

| Setup | Cost for that day | |---|---| | Claude Code → Anthropic direct (Opus 4.7 default) | $42.18 | | Claude Code → Anthropic direct (Sonnet 4.6 model flag) | $8.94 | | Claude Code → CodeRouter | $3.81 (blended across phases) |

That's 91% vs. Opus-direct, or 57% vs. the already-optimized Sonnet-direct config. Plan mode calls still hit Opus (as they should), but the 160+ implementation/test/docstring/refactor calls that followed got distributed across cheaper models.

FAQ

Will Plan mode still work? Yes, and that's the whole point — CodeRouter recognizes Plan mode via the <plan_mode> system prompt marker and keeps routing it to Opus. What changes is the post-plan implementation calls that Claude Code makes by the dozen.

What about Claude Code's native MCP tool ecosystem? Unaffected. MCP servers are local; only the LLM calls route through CodeRouter.

Does routing to DeepSeek/Gemini break Claude Code's tool-call format? No — CodeRouter translates Anthropic's tool_use blocks into OpenAI's tool_calls format for non-Anthropic providers and back. Claude Code sees a consistent Anthropic-shaped response regardless of which upstream served it.

What about Claude Code on Bedrock / Vertex? If you're on Bedrock, keep your existing setup — no CodeRouter needed since Bedrock pricing differs. We integrate with native Anthropic API, not with cloud-provider resellers.

Full integration docs for every agent — Cursor, Aider, Cline, Continue, Windsurf, OpenClaw, Copilot via LiteLLM, raw SDKs
How to Cut Your Cursor Bill by 70–90%
Aider Cost Optimization 2026
Phase-Aware LLM Routing Explained
CodeRouter vs OpenRouter

Claude Code Cheap API Router Setup (2026 Guide)

Why Claude Code is expensive

How phase-aware routing reduces this

Setup — Claude Code proxy configuration

Critical: the env-var name trap

Env vars (one-shot)

Persistent — `~/.claude/settings.json`

Troubleshooting — the four errors you'll likely hit

"There's an issue with the selected model (auto). It may not exist..."

"Free plan requires your own provider API keys" (HTTP 403)

Claude Code header still shows "Claude Max"

"Anthropic API error 400: Unable to download the file"

Verifying routing is active

Real-world cost example

FAQ

Related

Ready to Reduce Your AI API Costs?

Claude Code Cheap API Router Setup (2026 Guide)

Why Claude Code is expensive

How phase-aware routing reduces this

Setup — Claude Code proxy configuration

Critical: the env-var name trap

Env vars (one-shot)

Persistent — ~/.claude/settings.json

Troubleshooting — the four errors you'll likely hit

"There's an issue with the selected model (auto). It may not exist..."

"Free plan requires your own provider API keys" (HTTP 403)

Claude Code header still shows "Claude Max"

"Anthropic API error 400: Unable to download the file"

Verifying routing is active

Real-world cost example

FAQ

Related

Ready to Reduce Your AI API Costs?

Related Articles

We Just Cut Our Own AI Coding Bill 83% in 12 Hours — Here's the Data

我们把自家 AI 编码账单砍掉 83% —— 12 小时复盘(2026 真实数据)

中转站 vs 任务路由:为什么 80% 的 AI 编码账单是浪费(2026)

Get weekly AI cost optimization tips

Persistent — `~/.claude/settings.json`