Most teams building AI features get surprised by their first invoice. They tested with a few hundred requests, the numbers looked fine, then they hit 50,000 requests in month two and the bill tripled their projections. The formula is simple — the mistake is almost always not measuring actual token counts before estimating costs.

analyst with laptop showing cost calculations, office desk with financial reports — Photo by Unsplash photographer on Unsplash

The Core Formula for AI Cost Per Request

Every AI API call has two cost components: input tokens (everything you send to the model) and output tokens (what the model returns). The formula for cost per request is:

Cost per request = (Input tokens × Input price per token) + (Output tokens × Output price per token)

Since pricing is quoted per million tokens, you divide by 1,000,000:

Cost per request = (Input tokens / 1,000,000 × Input MTok price) + (Output tokens / 1,000,000 × Output MTok price)

Scaling to cost per 1,000 requests simply multiplies by 1,000:

Cost per 1K requests = [(Avg input tokens × Input MTok price) + (Avg output tokens × Output MTok price)] / 1,000

This is the number that goes into your product cost model. Run this calculation before you write the integration code, not after your first production invoice.

Worked Example: Customer Support Summarization

Imagine you’re building a feature that summarizes customer support tickets and suggests a resolution category. A typical prompt might look like:

System prompt: 500 tokens (instructions, category list, examples)
Customer message: 200 tokens (average ticket length)
Total input: 700 tokens

The model output — a summary plus a category — is typically around 150 tokens.

Running this on GPT-5 ($2.50 input / $15.00 output per MTok):

Input cost per request: 700 / 1,000,000 × $2.50 = $0.00175
Output cost per request: 150 / 1,000,000 × $15.00 = $0.00225
Total cost per request: $0.004
Cost per 1,000 requests: $4.00

At 10,000 tickets per month, that’s $40/month. Reasonable for a serious feature.

Now run the same calculation on GPT-4.1 Mini ($0.40 input / $1.60 output per MTok):

Input: 700 / 1,000,000 × $0.40 = $0.00028
Output: 150 / 1,000,000 × $1.60 = $0.00024
Total: $0.00052 per request
Cost per 1K requests: $0.52

The cheaper model handles the same task at roughly 1/8th the cost. For a classification task with well-structured inputs, the quality gap is often minimal. That’s the calculation worth doing before you default to a frontier model.

Use the free AI Token Counter to paste your actual system prompt and a representative message, get the exact token count, and run this formula with real numbers instead of guesses.

person analyzing dashboard metrics on screen, office with large monitor setup, API usage analytics and cost breakdown charts — Photo by Unsplash photographer on Unsplash

The Input-to-Output Ratio Changes Everything

The single biggest source of cost estimation error is misunderstanding the input-to-output ratio for your specific use case. Since output tokens cost 4–10x more than input tokens, a generation-heavy task is fundamentally different from an extraction task.

Extraction tasks (classify, tag, extract structured data): typically 85–95% input, 5–15% output. Input price dominates. Choose the cheapest model that achieves acceptable accuracy.

Summarization tasks (condense long documents): typically 80–90% input, 10–20% output. Still input-dominant, but output cost becomes meaningful when your model is verbose.

Generation tasks (write content, draft responses, create copy): typically 30–50% input, 50–70% output. Output price becomes the dominant factor. A model with cheap input but expensive output can surprise you here.

Conversation tasks (multi-turn chat): the ratio shifts each turn as conversation history grows. By turn 5, a chat session that started with a 200-token message might have 2,000 tokens of input just from accumulated history. Model costs can increase 3–5x over a long session compared to a fresh request.

Measuring the actual ratio for your task is worth doing once. Run 50–100 representative requests, log input and output token counts, and calculate your real ratio. Everything downstream of that — model selection, pricing estimates, budget forecasts — becomes more accurate.

Building a Monthly Cost Projection

Once you have cost per 1,000 requests, the monthly projection formula is:

Monthly cost = (Daily request volume × 30 × Cost per request)

Or equivalently:

Monthly cost = (Monthly requests / 1,000) × Cost per 1K requests

For a realistic annual budget, add three multipliers that experienced teams consistently find necessary:

Growth buffer (+25%): Usage grows as more users discover the feature. Plan for it.
Infrastructure overhead (+30%): Orchestration, monitoring, error handling, rate limiting logic — these add real API calls that your initial estimate doesn’t include.
Experimentation budget (+15%): You’ll test new models, optimize prompts, run A/B tests. Budget this as a line item rather than letting it appear as an unplanned overage.

The realistic annual budget is roughly 1.7× your base calculation. Teams that skip these multipliers consistently underestimate actual spend.

A rough benchmark from NMM student projects: a B2B SaaS feature handling 50,000 requests per month with 1,500 average input tokens and 400 average output tokens costs approximately $200–250/month on GPT-4.1 Mini, versus $1,800–2,100/month on GPT-5. Same feature, same quality for extraction work — 8–9x cost difference.

Five Factors That Inflate Real-World Costs

The formula gives you a floor, not a ceiling. Here’s what adds to the theoretical number:

1. System prompt size. A 2,000-token system prompt gets charged on every single request. On 100,000 monthly requests, that’s 200 million tokens of input just from your system prompt. Prompt caching makes this economical — cached input from OpenAI costs $0.25/MTok versus $2.50/MTok standard, a 90% reduction. If your system prompt is large and static, caching it is the highest-leverage cost optimization available.

2. Reasoning tokens. If you use a reasoning model like o3, o4-mini, or DeepSeek R1, the model generates internal “thinking” tokens that count toward output cost. These are invisible in the response but very visible on your bill. A reasoning call that returns 500 tokens of visible output might have generated 3,000 tokens of internal reasoning charged at output rates.

3. Retry logic. A 5% error rate with automatic retries means roughly 5% more API calls than your base estimate. A 15% error rate on a cheaper model might cost more in retries than the savings from lower per-token rates.

4. Context accumulation in conversations. Multi-turn applications where you include the full conversation history grow in cost with every turn. A conversation at turn 10 sends 9 turns of history as input on that call. Design truncation or summarization logic to cap context size.

5. Streaming overhead. Some implementations stream token-by-token for real-time UX. Streaming doesn’t change your token count, but if your implementation sends partial response confirmations or keeps connections open, check that your proxy layer isn’t adding overhead.

software developer at coding workstation, home office with multiple monitors, code editor with API integration code visible — Photo by Unsplash photographer on Unsplash

Count Your Tokens in 30 Seconds

The most common error in AI cost planning is estimating token counts instead of measuring them. “It’s probably about 500 tokens” is a rough guess that can be off by 3–4x depending on prompt structure, language, whitespace, and special characters.

The free AI Token Counter lets you paste your exact system prompt and a representative user message, then shows you the precise token count, word and character equivalents, and a side-by-side cost estimate across GPT-5, GPT-4.1 Mini, Claude Sonnet 4, Gemini 2.5 Flash, and others. Run it on your 10th-percentile, median, and 90th-percentile request sizes to understand your cost distribution — not just your average case.

Once you have real token counts, the formula above gives you a defensible cost projection you can actually take to a budget meeting or product roadmap discussion.

Frequently asked questions

How do I get my average token counts if I haven’t built the feature yet? Manually assemble 10–20 representative prompts the way your application would send them — system prompt plus realistic user inputs. Run them through a token counter to get counts. This takes 20–30 minutes and gives you a much better estimate than guessing. For output tokens, ask the model to complete a handful of sample requests and log what comes back.

Does the model temperature setting affect my token costs? No. Temperature controls randomness in the output but doesn’t change token counts. A higher temperature might produce slightly longer or shorter responses as a side effect of different word choices, but the effect is noise-level small compared to prompt design decisions.

Is batch processing always cheaper? Yes, if your use case tolerates latency. OpenAI’s Batch API processes requests asynchronously (results within 24 hours) at 50% off standard pricing. Anthropic offers a similar batch discount. For any non-real-time task — overnight report generation, background enrichment, scheduled summaries — batch processing halves your effective per-token cost.

How do I log token usage per request in production? Every major provider returns token usage in the API response. OpenAI returns usage.prompt_tokens and usage.completion_tokens in every response object. Log these to your analytics store (Datadog, Mixpanel, your own database) and you’ll have real cost attribution per feature, user, and request type within a week of deployment.

What’s a reasonable cost target per AI-assisted action for a B2B SaaS product? A rough benchmark from NMM experience: most B2B SaaS teams price their product so AI costs represent under 10–15% of revenue per user. If your plan charges $50/user/month, keeping AI costs under $5–7.50/user/month is a healthy target. That translates to roughly 1,000–3,000 AI actions per user per month at $0.002–0.005 per action, depending on model tier.

How to Calculate AI Cost Per 1,000 Requests (2026 Guide)

The Core Formula for AI Cost Per Request

Worked Example: Customer Support Summarization

The Input-to-Output Ratio Changes Everything

Building a Monthly Cost Projection

Five Factors That Inflate Real-World Costs

Count Your Tokens in 30 Seconds

Frequently asked questions

Continue learning

AI Batch API Discount Guide: Get 50% Off in 2026

AI Cost Projection: 12-Month Budgeting Framework 2026

AI for Accountants and CFOs: Close to Forecast in 2026

The Core Formula for AI Cost Per Request

Worked Example: Customer Support Summarization

The Input-to-Output Ratio Changes Everything

Building a Monthly Cost Projection

Five Factors That Inflate Real-World Costs

Count Your Tokens in 30 Seconds

Frequently asked questions

Related reading

Continue learning

AI Batch API Discount Guide: Get 50% Off in 2026

AI Cost Projection: 12-Month Budgeting Framework 2026

AI for Accountants and CFOs: Close to Forecast in 2026