Finance teams that have never budgeted for AI spend have a consistent problem: the first few months look cheap, and then a pipeline scales, usage grows faster than expected, and Q3 comes in 40% over plan. Building a defensible 12-month AI cost projection isn’t complicated, but it requires thinking about usage in a way that’s different from SaaS subscriptions or headcount.
Why AI Costs Are Different from Other Software Costs
SaaS tools have fixed or predictable pricing: $X per seat per month, $Y per GB of storage, $Z per feature tier. You negotiate a contract, set up a PO, and you’re done. AI API costs are consumption-based: each call is billed for the tokens it uses, so spend rises in direct proportion to usage, and usage tied to your product’s growth rarely rises in a straight line.
There are three dynamics that make AI costs hard to budget without a framework:
Usage growth compounds. If you build a feature that calls GPT-4o once per user per day, and your user base grows 15% month-over-month, your token spend grows 15% month-over-month. That seems obvious, but teams frequently budget month 1 volume and extend it flat across the year.
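To see how much a flat budget understates a compounding one, here is a minimal Python sketch. It assumes spend grows at exactly the user-growth rate with nothing else changing, and the $1,000 month-1 figure is hypothetical.

```python
# Compare a "flat" budget (month-1 spend x 12) with a compounding one.
# Assumption: spend grows at the same monthly rate as the user base.

def flat_total(month1_spend: float, months: int = 12) -> float:
    """Budget that extends month-1 spend unchanged across the year."""
    return month1_spend * months


def compounding_total(month1_spend: float, monthly_growth: float, months: int = 12) -> float:
    """Sum of month1_spend * (1 + g)^(n - 1) for n = 1..months."""
    return sum(month1_spend * (1 + monthly_growth) ** (n - 1) for n in range(1, months + 1))


month1 = 1_000.0  # hypothetical month-1 spend in dollars
flat = flat_total(month1)
compound = compounding_total(month1, 0.15)
print(f"flat: ${flat:,.0f}  compounding: ${compound:,.0f}  ratio: {compound / flat:.2f}x")
```

At 15% monthly growth, month 12 runs about 4.65x month 1, and the year totals roughly 2.4x the flat estimate.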
Prompt length creeps. Engineers iterate on prompts. System prompts grow as you add edge case handling. Context windows fill up as you add retrieval. A prompt that was 800 tokens in January might be 1,400 tokens by September, simply from product iteration. If you don’t account for prompt bloat, your cost projections will be systematically low.
Model upgrades change the cost curve. Upgrading a feature from GPT-4o-mini to GPT-4o raises the per-token price roughly 17x for both input and output tokens ($0.15 to $2.50 per million input tokens, $0.60 to $10.00 per million output tokens, per the reference table below). Even if you’re confident you won’t upgrade for 12 months, your projection should model the cost if you do, because stakeholders will ask.
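A quick way to check the upgrade multiplier is to price the same call on both models. This sketch uses the list prices from the reference table in this article (verify them before relying on the result) and a hypothetical 1,200-input / 300-output token call.

```python
# List prices in USD per million tokens, copied from this article's reference table.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the model's input and output rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


mini = cost_per_call("gpt-4o-mini", 1_200, 300)
full = cost_per_call("gpt-4o", 1_200, 300)
print(f"mini: ${mini:.5f}/call  4o: ${full:.5f}/call  multiplier: {full / mini:.1f}x")
```

Because the input and output ratios are both about 16.7x, the multiplier is the same whatever your input/output split.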
The Four Inputs You Need Before You Budget
A reliable 12-month projection requires four numbers for each AI-powered workflow or feature:
- Average tokens per call (input and output counted separately, since providers bill them at different per-token rates)
- Call volume today (calls per day, week, or month)
- Expected volume growth rate (monthly percentage growth based on product roadmap or historical data)
- Target model and provider (determines per-token price)
You can get tokens per call by running your actual prompts through the AI Token Counter, which shows exact token counts per model. For current call volume and expected growth, pull from your analytics or engineering team. For the target model, use your current model or the one on your roadmap.
The Projection Model
For each workflow, your monthly cost formula is:
Monthly Cost = (Input tokens per call × Input price per million ÷ 1,000,000
+ Output tokens per call × Output price per million ÷ 1,000,000)
× Monthly calls
And monthly calls in month N are:
Calls(N) = Calls(Month 1) × (1 + growth_rate)^(N-1)
For a 12-month projection, calculate this for each month and sum across all workflows.
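The two formulas and the roll-up can be sketched in Python as below. The `Workflow` class holds the four inputs from the previous section; the class, the `rollup` helper, and the second "doc-summary" workflow are hypothetical illustrations, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """The four projection inputs for one AI-powered workflow."""
    name: str
    input_tokens: float    # average input tokens per call
    output_tokens: float   # average output tokens per call
    month1_calls: float    # calls in month 1
    monthly_growth: float  # e.g. 0.08 for 8% month-over-month
    input_price: float     # USD per million input tokens
    output_price: float    # USD per million output tokens

    def calls(self, n: int) -> float:
        """Calls(N) = Calls(Month 1) * (1 + growth_rate)^(N - 1)."""
        return self.month1_calls * (1 + self.monthly_growth) ** (n - 1)

    def monthly_cost(self, n: int) -> float:
        """Per-call cost at input/output rates, times calls in month n."""
        per_call = (self.input_tokens * self.input_price
                    + self.output_tokens * self.output_price) / 1_000_000
        return per_call * self.calls(n)


def rollup(workflows: list[Workflow], months: int = 12) -> list[float]:
    """Total spend per month across all workflows, for months 1..months."""
    return [sum(w.monthly_cost(n) for w in workflows) for n in range(1, months + 1)]


workflows = [
    Workflow("support-triage", 1_200, 300, 60_000, 0.08, 0.15, 0.60),
    Workflow("doc-summary", 3_000, 500, 5_000, 0.05, 2.50, 10.00),  # hypothetical
]
by_month = rollup(workflows)
print(f"month 1: ${by_month[0]:,.2f}  month 12: ${by_month[-1]:,.2f}  year: ${sum(by_month):,.2f}")
```

Each list entry is one row of the summary tab; adding a workflow is one more `Workflow(...)` line.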
Example: A customer support triage feature uses GPT-4o-mini (input: $0.15/M, output: $0.60/M). Average call is 1,200 input tokens and 300 output tokens. Current volume is 2,000 calls/day, growing 8% per month.
Month 1 cost (2,000 calls/day × 30 days = 60,000 calls): ((1,200 × 0.15) + (300 × 0.60)) ÷ 1,000,000 × 60,000 = (180 + 180) ÷ 1,000,000 × 60,000 = $21.60/month
Month 12 cost (with 8% monthly growth, call volume = 60,000 × 1.08^11 ≈ 139,900 calls/month, or about 4,660/day): ≈ $50.36/month
12-month total for this one feature: approximately $410.
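You can check the worked example without summing 12 rows by hand, because the monthly costs form a geometric series. This is a minimal sketch assuming a 30-day month, as the example’s 60,000 monthly calls imply.

```python
def annual_total(month1_cost: float, g: float, months: int = 12) -> float:
    """Closed form of sum over n = 1..months of month1_cost * (1 + g)^(n - 1)."""
    if g == 0:
        return month1_cost * months
    return month1_cost * ((1 + g) ** months - 1) / g


per_call = (1_200 * 0.15 + 300 * 0.60) / 1_000_000  # $0.00036 per call
month1_calls = 2_000 * 30                           # 30-day month assumption
month1_cost = per_call * month1_calls               # $21.60
month12_calls = month1_calls * 1.08 ** 11           # exponent is N - 1 = 11

print(f"month 1: ${month1_cost:.2f}")
print(f"month 12: {month12_calls:,.0f} calls, ${per_call * month12_calls:.2f}")
print(f"12-month total: ${annual_total(month1_cost, 0.08):.2f}")
```

This prints $21.60 for month 1, about 139,898 calls and $50.36 for month 12, and $409.91 for the year.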
Scale this across five such features at different growth rates and model tiers, and your total AI budget takes shape.
Model Cost Reference Table (Mid-2026)
For projection purposes, here are the input/output token prices for the most commonly used models as of mid-2026:
| Model | Input ($/M tokens) | Output ($/M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o-mini | $0.15 | $0.60 |
| GPT-4o (Batch) | $1.25 | $5.00 |
| GPT-4o-mini (Batch) | $0.075 | $0.30 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3.5 Haiku | $0.80 | $4.00 |
| Claude 3.5 Sonnet (Batch) | $1.50 | $7.50 |
| Claude 3.5 Haiku (Batch) | $0.40 | $2.00 |
| Gemini 1.5 Flash | $0.075 | $0.30 |
| Gemini 1.5 Pro | $1.25 | $5.00 |
Prices change — always verify against provider pricing pages before finalizing a budget. These figures are a rough benchmark for directional planning.
Building the Budget Spreadsheet
Structure your projection spreadsheet with one tab per workflow and a summary rollup. Each workflow tab should have:
- Inputs section: tokens/call (input and output separately), current daily calls, monthly growth rate, model selection, current per-token prices
- Monthly projection table: 12 rows, one per month. Columns: call volume, monthly cost, cumulative cost
- Scenario columns: Base case (current growth rate), conservative case (half the growth rate), aggressive case (2x growth rate)
The summary tab rolls up all workflows by month and shows total AI spend per month across the full projection window.
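The three scenario columns can be generated from one month-1 cost and one growth rate. This sketch assumes tokens per call and prices stay fixed, so only the growth rate differs between scenarios; it reuses the triage example’s numbers.

```python
from itertools import accumulate


def scenario_costs(month1_cost: float, base_growth: float, months: int = 12) -> dict[str, list[float]]:
    """Monthly cost per scenario: conservative = 0.5x growth, aggressive = 2x growth."""
    growth_multipliers = {"conservative": 0.5, "base": 1.0, "aggressive": 2.0}
    return {
        name: [month1_cost * (1 + base_growth * m) ** (n - 1) for n in range(1, months + 1)]
        for name, m in growth_multipliers.items()
    }


scenarios = scenario_costs(21.60, 0.08)  # month-1 cost and growth from the triage example
for name, monthly in scenarios.items():
    cumulative = list(accumulate(monthly))  # the spreadsheet's cumulative-cost column
    print(f"{name:>12}: month 12 ${monthly[-1]:.2f}, cumulative ${cumulative[-1]:.2f}")
```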
Add three line items that teams routinely forget:
- Prompt inflation buffer: Add 15-20% to your base token estimate to account for prompt growth over 12 months
- Model upgrade scenarios: Show what happens to total cost if you upgrade one tier (e.g., GPT-4o-mini to GPT-4o) on your highest-volume workflow
- Error and retry costs: Retries triggered by timeouts or by output you reject (for example, malformed JSON) still pay for the discarded attempt’s tokens. Budget 3-5% overhead for retries.
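One way to layer these line items onto a base estimate is sketched below. The defaults are placeholder midpoints of the 15-20% and 3-5% ranges, applied multiplicatively (an assumption; adding them is a close alternative), and the upgrade multiplier comes from this article’s price table.

```python
# Placeholder defaults: midpoints of the 15-20% and 3-5% ranges above.
PROMPT_INFLATION = 0.175
RETRY_OVERHEAD = 0.04

# GPT-4o-mini -> GPT-4o price ratio from the reference table. Input (2.50 / 0.15)
# and output (10.00 / 0.60) ratios are equal, so the whole bill scales by it.
UPGRADE_MULTIPLIER = 2.50 / 0.15


def with_buffers(base_cost: float,
                 prompt_inflation: float = PROMPT_INFLATION,
                 retry_overhead: float = RETRY_OVERHEAD) -> float:
    """Base estimate plus prompt-inflation buffer and retry overhead."""
    return base_cost * (1 + prompt_inflation) * (1 + retry_overhead)


def upgraded(base_cost: float) -> float:
    """Cost of the same workload if it moved from GPT-4o-mini to GPT-4o."""
    return base_cost * UPGRADE_MULTIPLIER


triage_year = 409.91  # roughly the 12-month total of the worked example above
print(f"buffered: ${with_buffers(triage_year):,.2f}")
print(f"buffered + upgraded: ${upgraded(with_buffers(triage_year)):,.2f}")
```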
Getting Your Token Baseline Right
The most common error in AI budgeting is using the wrong token count as the baseline. Teams often estimate tokens based on word count, then get surprised when the actual bill is 25-40% higher because they forgot the system prompt, didn’t account for conversation history in multi-turn features, or used a different model’s tokenizer as reference.
Use the AI Token Counter to measure your actual prompt with your actual model’s tokenizer — not an estimate. Paste your complete system prompt plus a representative user message and note the exact input token count. Do the same for 10-15 representative examples to get a realistic average, not just the median case.
That measured baseline, applied to your volume projections, produces forecasts that hold up when your CFO asks how you got the number.
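Once you have counts for your representative examples, budget with the mean rather than the median: total spend is calls × mean tokens, and a few long requests can pull the mean well above the typical case. The counts below are hypothetical stand-ins for numbers you would read off a token counter.

```python
from statistics import mean, median

# Hypothetical input-token counts for 12 representative requests, each measured
# with the target model's tokenizer (system prompt + user message + any history).
measured = [1_150, 1_180, 1_210, 1_190, 1_240, 1_170, 1_900, 1_205, 1_160, 2_350, 1_220, 1_185]

print(f"median: {median(measured):,.1f}  mean: {mean(measured):,.1f}")
```

Here two long requests lift the mean about 12% above the median, which is the gap a median-based budget would miss.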
How to Present AI Budget to Finance
Finance teams want three numbers: the base-case annual total, the upside scenario (if growth accelerates), and the efficiency levers available if costs run over. Structure your presentation around these three outputs:
Base case: Current model, current growth rate, projected prompt inflation. This is your “do nothing different” number.
Upside scenario: 1.5-2x growth rate, potential model upgrade on key features. This is the ceiling you need approval to spend up to without coming back for re-approval.
Efficiency levers: Moving X% of volume to batch API (50% savings on that volume), switching Y workflow from GPT-4o to GPT-4o-mini (about 17x lower per-token list price), or self-hosting Z workflow with a small model after month 6 (potential 60-70% cost reduction after break-even). Show these as scenarios, not commitments.
This framing makes the conversation productive: finance understands the range, knows what they’re approving, and knows what levers exist if costs run hot.
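The first two levers are simple enough to price directly. This sketch assumes the 50% batch discount applies to the whole shifted share and that a downgraded workflow keeps the same token counts per call; the $3,000/month GPT-4o workflow is hypothetical.

```python
def batch_savings(monthly_cost: float, batch_fraction: float, discount: float = 0.5) -> float:
    """Monthly dollars saved by moving batch_fraction of a workflow's volume to a batch API."""
    return monthly_cost * batch_fraction * discount


def downgrade_cost(monthly_cost: float, input_tokens: float, output_tokens: float,
                   old_prices: tuple[float, float], new_prices: tuple[float, float]) -> float:
    """Re-price a workflow on another model, assuming token counts per call don't change.

    Prices are (input, output) in USD per million tokens; only their ratio matters here.
    """
    old_per_call = input_tokens * old_prices[0] + output_tokens * old_prices[1]
    new_per_call = input_tokens * new_prices[0] + output_tokens * new_prices[1]
    return monthly_cost * new_per_call / old_per_call


# Hypothetical workflow spending $3,000/month on GPT-4o with 1,200 in / 300 out tokens per call.
print(f"batch 40%: save ${batch_savings(3_000, 0.40):,.2f}/month")
print(f"move to GPT-4o-mini: ${downgrade_cost(3_000, 1_200, 300, (2.50, 10.00), (0.15, 0.60)):,.2f}/month")
```

The levers don’t simply add: savings from batching shrink if the same volume also moves to a cheaper model, so model combined scenarios explicitly.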
Build Your Projection in 30 Seconds
Start with your token baseline. Paste your actual prompts into the AI Token Counter, get the exact token counts per model, then apply your volume and growth assumptions. The tool outputs per-model cost estimates that slot directly into your projection spreadsheet — no manual lookups against pricing tables required.
Frequently asked questions
How should I handle AI models that charge per request, not per token? Some providers and wrapper services charge flat per-request fees rather than per-token. For projection purposes, treat the per-request cost as an “effective token rate” by dividing the request cost by the average tokens consumed. This lets you model growth using the same framework. If the per-request model has caps (e.g., max 2,000 tokens per request), model your volume at the cap, not the average, to avoid underestimating.
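The effective-rate conversion is one line of arithmetic. This sketch assumes a hypothetical service charging $0.002 per request with an average of 1,600 tokens per request; the result is a single blended rate, not separate input and output prices.

```python
def effective_rate_per_million(fee_per_request: float, avg_tokens_per_request: float) -> float:
    """Convert a flat per-request fee into USD per million tokens."""
    return fee_per_request / avg_tokens_per_request * 1_000_000


# Hypothetical wrapper service: $0.002 per request, averaging 1,600 tokens per request.
rate = effective_rate_per_million(0.002, 1_600)
print(f"effective rate: ${rate:.2f} per million tokens")
```

That $1.25 per million can then sit in the same projection columns as per-token models.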
How do I account for context window costs in multi-turn conversations? In a chat feature, each request resends the conversation so far, so input tokens per call increase as conversations lengthen. To model this, calculate average conversation length in turns, estimate the tokens each turn adds, and use the average input tokens per call across all turns of a conversation. A 10-turn conversation where each turn adds 200 tokens means the final turn costs roughly 2,000 tokens of context, 10x the first turn. Summed over the conversation, that is 200 × (1 + 2 + ... + 10) = 11,000 input tokens, an average of 1,100 per call.
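A minimal sketch of that calculation, assuming the full history is resent on every request and every turn (user message plus reply) adds the same number of tokens:

```python
def turn_input_tokens(turns: int, tokens_per_turn: int, system_prompt_tokens: int = 0) -> list[int]:
    """Input tokens billed on each turn when the full history is resent with every request.

    Assumes every turn (user message + model reply) adds the same number of tokens.
    """
    return [system_prompt_tokens + tokens_per_turn * k for k in range(1, turns + 1)]


per_turn = turn_input_tokens(turns=10, tokens_per_turn=200)
total = sum(per_turn)
average = total / len(per_turn)
print(f"first: {per_turn[0]}, last: {per_turn[-1]}, total: {total}, average per call: {average:.0f}")
```

This prints first: 200, last: 2000, total: 11000, average per call: 1100. Pass your real system prompt size as `system_prompt_tokens`, since it is resent on every turn too.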
What growth rate should I assume if we’re pre-launch? Pre-launch, use your analogous product’s early growth rate if you have one, or build a bottom-up model from your user acquisition forecast: projected daily active users × estimated AI calls per active user per day. If you have no comparable data, use a conservative monthly growth rate of 20% for months 1-3 and 10% for months 4-12. Better to over-budget and return headroom than to run out of AI budget mid-year.
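The bottom-up approach with the fallback growth rates can be sketched as follows. Two assumptions to flag: the rate for month N is the growth that happens during month N (so months 2-4 each reflect a 20% step), and the 500 DAU and 3 calls per active user per day are hypothetical launch numbers.

```python
def growth_rate(month: int) -> float:
    """No-data fallback from the text: 20% in months 1-3, 10% in months 4-12."""
    return 0.20 if month <= 3 else 0.10


def prelaunch_monthly_calls(month1_dau: float, calls_per_dau_per_day: float,
                            months: int = 12, days_per_month: int = 30) -> list[float]:
    """Bottom-up call volume: DAU x calls per active user per day x days in month."""
    calls = []
    dau = month1_dau
    for month in range(1, months + 1):
        calls.append(dau * calls_per_dau_per_day * days_per_month)
        dau *= 1 + growth_rate(month)  # growth during this month sets next month's DAU
    return calls


volume = prelaunch_monthly_calls(500, 3)  # hypothetical: 500 DAU, 3 AI calls each per day
print([round(v) for v in volume])
```

The resulting monthly volumes drop straight into the call-volume column of a workflow tab.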
Should I budget for model price changes over 12 months? AI model prices have generally trended down over time; GPT-4o, for example, launched at $5.00/$15.00 per million input/output tokens and was repriced to $2.50/$10.00 a few months later. However, budgeting on the assumption that prices will fall is risky. Use current pricing for your base case and show a “price decrease scenario” separately. If prices drop, you’ll have budget headroom; if they don’t, you’re covered.
How do I track actual AI spend against my projections month to month? OpenAI’s API dashboard provides usage reports by model and date. Anthropic has similar reporting under “Usage” in the console. Export these monthly, map them to your projected totals by workflow (you’ll need to tag requests with workflow identifiers in your code), and flag any workflow whose actual spend exceeds its projection by more than 20%. That’s your early warning for a runaway pipeline.
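Once exports are mapped to workflow tags, the overrun check is a small comparison. This sketch assumes you already have projected and actual monthly spend per workflow as dictionaries (the figures are hypothetical), and it also flags spend under tags that have no projection, since untagged traffic is a common blind spot.

```python
def flag_overruns(projected: dict[str, float], actual: dict[str, float],
                  threshold: float = 0.20) -> list[str]:
    """Workflows whose actual spend exceeds projection by more than `threshold`,
    plus any spend under a tag that has no projection at all."""
    flagged = [
        name for name, plan in projected.items()
        if plan > 0 and actual.get(name, 0.0) > plan * (1 + threshold)
    ]
    flagged += [name for name, spend in actual.items() if name not in projected and spend > 0]
    return flagged


# Hypothetical month: projections from the spreadsheet, actuals from a usage export.
projected = {"support-triage": 21.60, "doc-summary": 62.50}
actual = {"support-triage": 24.10, "doc-summary": 81.00, "untagged": 5.00}
print(flag_overruns(projected, actual))
```

Here support-triage is about 12% over and stays unflagged, while doc-summary (about 30% over) and the untagged spend are flagged.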
Related reading
- AI Token Counter — measure token usage and model your monthly costs
- AI Batch API Discount Guide — cut projected costs by 50% on async workloads
- AI ROI Formula for Executives