Token pricing varies dramatically across providers and models. Choosing the right model for your workload can mean the difference between spending $10/month and $1,000/month. Here's a practical comparison of current pricing and how to pick the most cost-effective option.
Current Pricing Overview (Per Million Tokens)
Prices are listed as input / output per million tokens:
OpenAI
- GPT-4o: $2.50 input / $10.00 output
- GPT-4o mini: $0.15 input / $0.60 output
- GPT-4.1: $2.00 input / $8.00 output
- GPT-4.1 mini: $0.40 input / $1.60 output
- GPT-4.1 nano: $0.10 input / $0.40 output
- o3: $2.00 input / $8.00 output
- o4-mini: $1.10 input / $4.40 output
Anthropic
- Claude 3.5 Sonnet: $3.00 input / $15.00 output
- Claude 3.5 Haiku: $0.80 input / $4.00 output
- Claude 3 Opus: $15.00 input / $75.00 output
- Claude 4 Sonnet: $3.00 input / $15.00 output
Google
- Gemini 2.5 Pro: $1.25 input / $10.00 output (under 200K tokens)
- Gemini 2.5 Flash: $0.15 input / $0.60 output
- Gemini 2.0 Flash: $0.10 input / $0.40 output
- Gemini 1.5 Pro: $1.25 input / $5.00 output (under 128K tokens)
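The per-request arithmetic is simple enough to script. Here's a minimal sketch; the PRICES table is hard-coded from the list above (USD per million tokens) and will drift, and the model keys are just illustrative labels, not official API model IDs:

```python
# Illustrative pricing table: (input, output) in USD per million tokens,
# copied from the list above. These numbers change frequently.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3.5-haiku": (0.80, 4.00),
    "gemini-2.0-flash": (0.10, 0.40),
    "gemini-2.5-flash": (0.15, 0.60),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of a single request at the listed rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, `request_cost("gpt-4o", 500, 300)` works out to $0.00425, which is where the daily figures in the scenarios below come from.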
Real-World Cost Scenarios
Abstract per-million pricing is hard to reason about. Here's what common workloads actually cost:
Scenario 1: Customer Support Chatbot
Average conversation: 500 input tokens, 300 output tokens. 10,000 conversations/day.
- GPT-4o: $0.00125 + $0.003 = $0.00425/conversation → $42.50/day
- GPT-4o mini: $0.000075 + $0.00018 = $0.000255/conversation → $2.55/day
- Claude 3.5 Haiku: $0.0004 + $0.0012 = $0.0016/conversation → $16.00/day
- Gemini 2.0 Flash: $0.00005 + $0.00012 = $0.00017/conversation → $1.70/day
Scenario 2: Document Summarization
Average document: 10,000 input tokens, 500 output tokens. 1,000 documents/day.
- GPT-4o: $0.025 + $0.005 = $0.03/doc → $30/day
- Claude 3.5 Sonnet: $0.03 + $0.0075 = $0.0375/doc → $37.50/day
- Gemini 2.5 Flash: $0.0015 + $0.0003 = $0.0018/doc → $1.80/day
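Both scenarios reduce to the same formula: per-request cost times daily volume. A self-contained sketch, with the rates hard-coded from the pricing list above:

```python
def daily_cost(input_price: float, output_price: float,
               in_tokens: int, out_tokens: int, requests_per_day: int) -> float:
    """USD per day: per-request cost times request volume.
    Prices are USD per million tokens."""
    per_request = (in_tokens * input_price + out_tokens * output_price) / 1_000_000
    return per_request * requests_per_day

# Scenario 1: chatbot on GPT-4o mini ($0.15 / $0.60 per million tokens)
chatbot = daily_cost(0.15, 0.60, 500, 300, 10_000)      # → 2.55

# Scenario 2: summarization on Gemini 2.5 Flash ($0.15 / $0.60)
summarize = daily_cost(0.15, 0.60, 10_000, 500, 1_000)  # → 1.80
```

Note how the two workloads land on similar daily costs despite very different shapes: the chatbot is output-heavy at high volume, while summarization is input-heavy at lower volume.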
When to Use Which Model
Use Mini/Flash Models When:
- Tasks are straightforward (classification, extraction, simple Q&A)
- Volume is high and cost sensitivity is critical
- Latency matters more than maximum quality
- You're building prototypes or development environments
Use Flagship Models When:
- Tasks require complex reasoning or nuanced understanding
- Output quality directly impacts revenue or user experience
- You're processing code, legal documents, or technical content
- The cost per request is justified by the value of each response
Cost Optimization Tips
- Route by complexity: Use a cheap model to classify requests, then route complex ones to a flagship model. This can cut costs by 60–80%.
- Cache repeated prompts: Both OpenAI and Anthropic offer prompt caching that reduces input costs by 50–90% for repeated prefixes.
- Batch when possible: OpenAI's Batch API offers 50% off for non-time-sensitive requests.
- Optimize prompts: Reducing prompt length by 40% saves 40% on input costs for every single request.
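The complexity-routing tip can be sketched as a two-tier dispatcher. Everything here is hypothetical: the model names are placeholders, and `classify_complexity` is a stub keyword heuristic standing in for what would, in production, be a call to a cheap mini/flash model:

```python
# Hypothetical two-tier router: a cheap check decides whether a request
# needs the flagship model. Model names are placeholders, not API IDs.
CHEAP_MODEL = "gpt-4o-mini"
FLAGSHIP_MODEL = "gpt-4o"

# Stub signal words suggesting a request needs complex reasoning.
HARD_SIGNALS = ("prove", "refactor", "legal", "diagnose", "multi-step")

def classify_complexity(prompt: str) -> str:
    """Stub classifier: flag prompts containing a 'hard' signal word.
    A real router would ask a mini/flash model to make this call."""
    text = prompt.lower()
    return "complex" if any(word in text for word in HARD_SIGNALS) else "simple"

def choose_model(prompt: str) -> str:
    """Route simple requests to the cheap model, complex ones to the flagship."""
    return FLAGSHIP_MODEL if classify_complexity(prompt) == "complex" else CHEAP_MODEL

choose_model("What are your opening hours?")       # → "gpt-4o-mini"
choose_model("Please refactor this legal brief")   # → "gpt-4o"
```

The savings come from the asymmetry: the classification call costs a fraction of a cent, while each avoided flagship call saves ten to a hundred times that.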
Pricing changes frequently. Always check the provider's official pricing page before making architecture decisions. The numbers above reflect pricing as of early 2025.