Counting tokens before you send a request to an LLM API is one of the most practical safeguards you can build into an application. It prevents context window errors, lets you estimate costs upfront, and helps you decide when to truncate or chunk your input. Here's how to do it in Python, JavaScript, and Go.
Python: Using tiktoken (Exact Count)
OpenAI's tiktoken library is the gold standard for counting tokens for GPT models. It uses the exact same tokenizer the API uses, so the count is precise.
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Count tokens for a given model."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Usage
prompt = "Explain quantum computing in simple terms."
tokens = count_tokens(prompt)
print(f"Token count: {tokens}")  # Token count: 7
# For chat messages, account for per-message overhead
def count_chat_tokens(messages, model="gpt-4o"):
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # every message has a fixed overhead
    total = 0
    for message in messages:
        total += tokens_per_message
        for key, value in message.items():
            total += len(encoding.encode(value))
    total += 3  # reply priming
    return total

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
]
print(count_chat_tokens(messages))  # ~21 tokens
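The count above ignores the optional name field a message can carry (used to distinguish multiple participants with the same role). OpenAI's token-counting cookbook charges one extra token per name for GPT-4-class models; the sketch below folds that in. The function name is mine, and the per-message constants follow the cookbook's current values, which may drift for future models:

# Variant that also charges for an optional "name" field
def count_chat_tokens_with_names(messages, model="gpt-4o"):
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # fixed overhead per message
    tokens_per_name = 1     # extra cost when "name" is present
    total = 0
    for message in messages:
        total += tokens_per_message
        for key, value in message.items():
            total += len(encoding.encode(value))
            if key == "name":
                total += tokens_per_name
    total += 3  # reply priming
    return total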
Install with pip install tiktoken. The library downloads the tokenizer vocabulary on first use and caches it locally.
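One failure mode worth planning for: encoding_for_model raises a KeyError when it doesn't recognize the model name, which happens when a model is newer than your installed tiktoken. A common workaround is to fall back to a known base encoding; picking o200k_base (the GPT-4o-family encoding) as the default here is my assumption:

def get_encoding_safe(model: str):
    """Return the encoding for a model, falling back to o200k_base."""
    try:
        return tiktoken.encoding_for_model(model)
    except KeyError:
        # Model name not in tiktoken's mapping yet; fall back
        return tiktoken.get_encoding("o200k_base")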
JavaScript: Using js-tiktoken or Approximation
For browser or Node.js environments, you have two options: an exact count with js-tiktoken or a fast approximation.
Exact Count with js-tiktoken
import { encodingForModel } from "js-tiktoken";
const enc = encodingForModel("gpt-4o");
const tokens = enc.encode("Hello, how are you?");
console.log(tokens.length); // 6
Fast Approximation (No Dependencies)
If you don't need exact counts — for example, for a UI estimate — this approximation is within 5–10% for English text:
function estimateTokens(text) {
  // English averages roughly 4 characters per token;
  // adding the word count adjusts for whitespace and punctuation
  const charCount = text.length;
  const wordCount = text.split(/\s+/).filter(Boolean).length;
  return Math.ceil((charCount + wordCount) / 5);
}

console.log(estimateTokens("Hello, how are you?")); // ~5
This won't be accurate for code, non-English text, or text with lots of special characters. Use the exact library for anything billing-related.
Go: Using tiktoken-go
For Go services, the tiktoken-go package provides the same exact tokenization:
package main

import (
    "fmt"

    "github.com/pkoukk/tiktoken-go"
)

func countTokens(text, model string) (int, error) {
    enc, err := tiktoken.EncodingForModel(model)
    if err != nil {
        return 0, err
    }
    tokens := enc.Encode(text, nil, nil)
    return len(tokens), nil
}

func main() {
    count, err := countTokens("Hello, world!", "gpt-4o")
    if err != nil {
        panic(err)
    }
    fmt.Printf("Tokens: %d\n", count) // Tokens: 4
}
When to Count Tokens
Count tokens at these key points in your application:
- Before sending a request: Verify the total (system prompt + user input + expected output) fits within the context window
- When building prompts dynamically: If you're injecting retrieved documents into a prompt, count as you add each chunk and stop before hitting the limit (a sketch of this follows the list)
- For cost estimation: Show users an estimated cost before they confirm an expensive operation
- In logging and monitoring: Track token usage per request to identify optimization opportunities
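Here's the dynamic prompt-building case as a sketch. It greedily packs retrieved chunks into a fixed token budget, reusing the count_tokens helper defined earlier; the separator string and the stop-at-first-overflow policy are illustrative choices, not the only ones:

SEPARATOR = "\n\n"

def pack_chunks(chunks, budget):
    """Greedily add chunks until the token budget would be exceeded."""
    sep_tokens = count_tokens(SEPARATOR)
    selected, used = [], 0
    for chunk in chunks:
        # Each chunk after the first also pays for the separator
        cost = count_tokens(chunk) + (sep_tokens if selected else 0)
        if used + cost > budget:
            break  # this chunk would blow the budget; stop here
        selected.append(chunk)
        used += cost
    return SEPARATOR.join(selected)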
Handling the Context Window Budget
A practical pattern is to reserve space for the response and system prompt, then fill the remaining budget with user content:
MAX_CONTEXT = 128_000  # GPT-4o context window
RESERVED_OUTPUT = 4_096  # leave room for the response
SYSTEM_TOKENS = count_tokens(system_prompt)

available = MAX_CONTEXT - RESERVED_OUTPUT - SYSTEM_TOKENS

user_tokens = count_tokens(user_input)
if user_tokens > available:
    # Truncate or chunk the input
    user_input = truncate_to_tokens(user_input, available)
This ensures you never exceed the context window and always leave room for the model to respond.
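The snippet assumes a truncate_to_tokens helper that the text leaves undefined. A minimal version, using the same tiktoken setup as above, encodes, slices, and decodes; note that decoding a sliced token list can cut mid-word, so treat the boundary as rough:

def truncate_to_tokens(text: str, max_tokens: int, model: str = "gpt-4o") -> str:
    """Cut text down to at most max_tokens tokens."""
    encoding = tiktoken.encoding_for_model(model)
    tokens = encoding.encode(text)
    if len(tokens) <= max_tokens:
        return text
    # Decoding a truncated token list may split a word at the cut point
    return encoding.decode(tokens[:max_tokens])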