Token pricing varies dramatically across providers and models. Choosing the right model for your workload can mean the difference between spending $10/month and $1,000/month. Here's a practical comparison of current pricing and how to pick the most cost-effective option.
Current Pricing Overview (Per Million Tokens)
Prices are listed as input / output per million tokens:
OpenAI
- GPT-4o: $2.50 input / $10.00 output
- GPT-4o mini: $0.15 input / $0.60 output
- GPT-4.1: $2.00 input / $8.00 output
- GPT-4.1 mini: $0.40 input / $1.60 output
- GPT-4.1 nano: $0.10 input / $0.40 output
- o3: $2.00 input / $8.00 output
- o4-mini: $1.10 input / $4.40 output
Anthropic
- Claude 3.5 Sonnet: $3.00 input / $15.00 output
- Claude 3.5 Haiku: $0.80 input / $4.00 output
- Claude 3 Opus: $15.00 input / $75.00 output
- Claude 4 Sonnet: $3.00 input / $15.00 output
Google
- Gemini 2.5 Pro: $1.25 input / $10.00 output (under 200K tokens)
- Gemini 2.5 Flash: $0.15 input / $0.60 output
- Gemini 2.0 Flash: $0.10 input / $0.40 output
- Gemini 1.5 Pro: $1.25 input / $5.00 output (under 128K tokens)
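The per-request arithmetic is simple enough to script. Here's a minimal sketch; the PRICES table is hard-coded from the list above (USD per million tokens) and will drift, and the model keys are just illustrative labels, not official API model IDs:

```python
# Illustrative pricing table: (input, output) in USD per million tokens,
# copied from the list above. These numbers change frequently.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3.5-haiku": (0.80, 4.00),
    "gemini-2.0-flash": (0.10, 0.40),
    "gemini-2.5-flash": (0.15, 0.60),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of a single request at the listed rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, `request_cost("gpt-4o", 500, 300)` works out to $0.00425, which is where the daily figures in the scenarios below come from.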
Real-World Cost Scenarios
Abstract per-million pricing is hard to reason about. Here's what common workloads actually cost:
Scenario 1: Customer Support Chatbot
Average conversation: 500 input tokens, 300 output tokens. 10,000 conversations/day.
- GPT-4o: $0.00125 + $0.003 = $0.00425/conversation → $42.50/day
- GPT-4o mini: $0.000075 + $0.00018 = $0.000255/conversation → $2.55/day
- Claude 3.5 Haiku: $0.0004 + $0.0012 = $0.0016/conversation → $16.00/day
- Gemini 2.0 Flash: $0.00005 + $0.00012 = $0.00017/conversation → $1.70/day
Scenario 2: Document Summarization
Average document: 10,000 input tokens, 500 output tokens. 1,000 documents/day.
- GPT-4o: $0.025 + $0.005 = $0.03/doc → $30/day
- Claude 3.5 Sonnet: $0.03 + $0.0075 = $0.0375/doc → $37.50/day
- Gemini 2.5 Flash: $0.0015 + $0.0003 = $0.0018/doc → $1.80/day
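Both scenarios reduce to the same formula: per-request cost times daily volume. A self-contained sketch, with the rates hard-coded from the pricing list above:

```python
def daily_cost(input_price: float, output_price: float,
               in_tokens: int, out_tokens: int, requests_per_day: int) -> float:
    """USD per day: per-request cost times request volume.
    Prices are USD per million tokens."""
    per_request = (in_tokens * input_price + out_tokens * output_price) / 1_000_000
    return per_request * requests_per_day

# Scenario 1: chatbot on GPT-4o mini ($0.15 / $0.60 per million tokens)
chatbot = daily_cost(0.15, 0.60, 500, 300, 10_000)      # → 2.55

# Scenario 2: summarization on Gemini 2.5 Flash ($0.15 / $0.60)
summarize = daily_cost(0.15, 0.60, 10_000, 500, 1_000)  # → 1.80
```

Note how the two workloads land on similar daily costs despite very different shapes: the chatbot is output-heavy at high volume, while summarization is input-heavy at lower volume.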
When to Use Which Model
Use Mini/Flash Models When:
- Tasks are straightforward (classification, extraction, simple Q&A)
- Volume is high and cost sensitivity is critical
- Latency matters more than maximum quality
- You're building prototypes or development environments
Use Flagship Models When:
- Tasks require complex reasoning or nuanced understanding
- Output quality directly impacts revenue or user experience
- You're processing code, legal documents, or technical content
- The cost per request is justified by the value of each response
Cost Optimization Tips
- Route by complexity: Use a cheap model to classify requests, then route complex ones to a flagship model. This can cut costs by 60–80%.
- Cache repeated prompts: Both OpenAI and Anthropic offer prompt caching that reduces input costs by 50–90% for repeated prefixes.
- Batch when possible: OpenAI's Batch API offers 50% off for non-time-sensitive requests.
- Optimize prompts: Reducing prompt length by 40% saves 40% on input costs for every single request.
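The complexity-routing tip can be sketched as a two-tier dispatcher. Everything here is hypothetical: the model names are placeholders, and `classify_complexity` is a stub keyword heuristic standing in for what would, in production, be a call to a cheap mini/flash model:

```python
# Hypothetical two-tier router: a cheap check decides whether a request
# needs the flagship model. Model names are placeholders, not API IDs.
CHEAP_MODEL = "gpt-4o-mini"
FLAGSHIP_MODEL = "gpt-4o"

# Stub signal words suggesting a request needs complex reasoning.
HARD_SIGNALS = ("prove", "refactor", "legal", "diagnose", "multi-step")

def classify_complexity(prompt: str) -> str:
    """Stub classifier: flag prompts containing a 'hard' signal word.
    A real router would ask a mini/flash model to make this call."""
    text = prompt.lower()
    return "complex" if any(word in text for word in HARD_SIGNALS) else "simple"

def choose_model(prompt: str) -> str:
    """Route simple requests to the cheap model, complex ones to the flagship."""
    return FLAGSHIP_MODEL if classify_complexity(prompt) == "complex" else CHEAP_MODEL

choose_model("What are your opening hours?")       # → "gpt-4o-mini"
choose_model("Please refactor this legal brief")   # → "gpt-4o"
```

The savings come from the asymmetry: the classification call costs a fraction of a cent, while each avoided flagship call saves ten to a hundred times that.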
Pricing changes frequently. Always check the provider's official pricing page before making architecture decisions. The numbers above reflect pricing as of early 2025.