freetokencounter.app
by freesuite.app

AI Token Counter — Count Tokens for GPT, Claude, Gemini, Llama and More

freetokencounter.app is a free, browser-based token counter that supports every major large language model: OpenAI's GPT-5.4, GPT-5.4 mini, GPT-5.3, GPT-5.2, GPT-5.1, GPT-5, GPT-5 Pro, GPT-4.1, o4-mini, o3, o3-pro and GPT-4o; Anthropic's Claude Opus 4.7, Sonnet 4.6, Sonnet 4.5 and Haiku 4.5; Google's Gemini 3.1 Pro Preview, 3 Flash Preview, 2.5 Pro and 2.5 Flash; Meta's Llama 4 Maverick, Scout and Behemoth; xAI's Grok 4, Grok 4 Heavy and Grok 4 Fast; Mistral's Medium 3, Magistral and Pixtral Large; DeepSeek V3.1 and R1; Cohere's Command A and Command R+; Alibaba's Qwen3-Max, Qwen3-Coder and QwQ-32B; Moonshot's Kimi K2 Thinking; Perplexity's Sonar Reasoning Pro and Deep Research; and MiniMax M1. Paste any text — a prompt, a code block, JSON, or non-Latin script — and freetokencounter.app shows the token count, characters, words, context-window usage, and cost estimate live, with a colored visualization of how the model splits your text. Nothing uploads to a server: the entire tool runs locally in your browser.

How does freetokencounter.app count tokens?

Every model uses a slightly different tokenizer, and most providers don't publish their exact tokenizer for in-browser use. freetokencounter.app combines the publicly documented GPT-2 / cl100k splitting regex with per-model calibration constants tuned against each provider's reference outputs. For English prose the result is accurate to within roughly five percent, which is close enough to plan prompts, fit context windows, and budget API spend. An "Estimate" pill on every count keeps the methodology transparent: freetokencounter.app never presents an approximation as an exact tokenizer count.
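As a sketch of the splitting stage, here is a simplified, ASCII-only approximation of the public GPT-2 pattern (the real regex uses Unicode categories and needs the third-party `regex` module); the chunks it produces are what the colored visualization displays:

```python
import re

# Simplified ASCII approximation of the public GPT-2 splitting pattern.
# The real pattern uses Unicode classes (\p{L}, \p{N}) via the third-party
# `regex` module; this stdlib version just illustrates the idea.
GPT2_SPLIT = re.compile(
    r"'s|'t|'re|'ve|'m|'ll|'d| ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+"
)

chunks = GPT2_SPLIT.findall("Counting tokens isn't hard.")
# → ['Counting', ' tokens', ' isn', "'t", ' hard', '.']
```

Note how contractions split off as their own chunks and leading spaces attach to the following word, which mirrors how BPE tokenizers treat English prose.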

Why count tokens before sending a prompt?

Tokens are the unit AI models bill in, the unit context windows are measured in, and the unit that determines whether a prompt will fit at all. Counting tokens before sending lets you avoid context-overflow errors, predict cost down to the cent, compare model efficiency on the same task, and right-size your prompts. freetokencounter.app shows token counts for every supported model side by side, so you can pick the cheapest model that handles your prompt without truncation.

Is freetokencounter.app free and private?

Yes — completely free, no sign-up, no rate limits, no API key required. Every keystroke is processed locally in your browser using JavaScript that ships with the page. Nothing about your prompt — not the text, not the count, not the model selection — is sent to any server. You can verify this in your browser's Network tab while typing: the only request is the initial page load. freetokencounter.app exists because we believe a token counter for sensitive prompts should never see your data.

How accurate is freetokencounter.app for each model?

For OpenAI models (GPT-5, GPT-4.1, GPT-4o, o-series, GPT-4, GPT-3.5) and Meta Llama models, the count is calibrated against the published reference tokenizers and is typically within ±3% on English text. For Anthropic Claude (Opus 4.7, Sonnet 4.6 and earlier), Google Gemini 2.5, xAI Grok, Cohere Command, Alibaba Qwen, DeepSeek, Mistral, Moonshot Kimi, and MiniMax — providers that don't release a fully open tokenizer — counts are within roughly ±5–8%. Code, JSON, and non-Latin scripts can drift more, generally toward higher token counts than the estimate. For high-stakes production budgeting, always validate against the provider's own count_tokens API; for prompt drafting and quick comparisons, freetokencounter.app is purpose-built.

AI Model Token Limits & Pricing Comparison

The table below compares context windows, max output tokens, and per-million-token pricing across major model families. Pricing reflects each provider's official rate cards.

Model | Provider | Context | Input $/1M | Output $/1M
GPT-5.4 | OpenAI | 1.1M | $2.50 | $15.00
GPT-5.4 mini | OpenAI | 1.1M | $0.75 | $4.50
GPT-5.4 nano | OpenAI | 1.1M | $0.20 | $1.25
GPT-5.4 Pro | OpenAI | 1.1M | $30.00 | $180.00
GPT-5.3 Codex | OpenAI | 400K | $1.75 | $14.00
GPT-5.2 | OpenAI | 400K | $0.875 | $7.00
GPT-5.2 Pro | OpenAI | 400K | $10.50 | $84.00
GPT-5.1 | OpenAI | 400K | $0.625 | $5.00
GPT-5 | OpenAI | 400K | $1.25 | $10.00
GPT-5 Pro | OpenAI | 400K | $15.00 | $120.00
GPT-5 mini | OpenAI | 400K | $0.25 | $2.00
GPT-5 nano | OpenAI | 400K | $0.05 | $0.40
GPT-4.1 | OpenAI | 1M | $2.00 | $8.00
o4-mini | OpenAI | 200K | $1.10 | $4.40
o3 | OpenAI | 200K | $2.00 | $8.00
o3-pro | OpenAI | 200K | $20.00 | $80.00
GPT-4o | OpenAI | 128K | $2.50 | $10.00
GPT-4o-mini | OpenAI | 128K | $0.15 | $0.60
o1 | OpenAI | 128K | $15.00 | $60.00
Claude Opus 4.7 | Anthropic | 1M | $15.00 | $75.00
Claude Sonnet 4.6 | Anthropic | 1M | $3.00 | $15.00
Claude Sonnet 4.5 | Anthropic | 1M | $3.00 | $15.00
Claude Haiku 4.5 | Anthropic | 200K | $1.00 | $5.00
Claude Opus 4.1 | Anthropic | 200K | $15.00 | $75.00
Claude Opus 4 | Anthropic | 200K | $15.00 | $75.00
Claude Sonnet 4 | Anthropic | 200K | $3.00 | $15.00
Claude 3.5 Sonnet | Anthropic | 200K | $3.00 | $15.00
Gemini 3.1 Pro Preview | Google | 1M | $2.00 | $12.00
Gemini 3.1 Flash-Lite | Google | 1M | $0.25 | $1.50
Gemini 3 Flash Preview | Google | 1M | $0.50 | $3.00
Gemini 2.5 Pro | Google | 1M | $1.25 | $10.00
Gemini 2.5 Flash | Google | 1M | $0.30 | $2.50
Gemini 2.0 Flash | Google | 1M | $0.10 | $0.40
Gemini 1.5 Pro | Google | 2M | $1.25 | $5.00
Llama 4 Maverick | Meta | 1M | n/a | n/a
Llama 4 Scout | Meta | 10M | n/a | n/a
Llama 4 Behemoth | Meta | 1M | n/a | n/a
Llama 3.3 70B | Meta | 128K | n/a | n/a
Llama 3.1 405B | Meta | 128K | n/a | n/a
Grok 4 | xAI | 256K | $3.00 | $15.00
Grok 4 Heavy | xAI | 256K | $30.00 | $90.00
Grok 4 Fast | xAI | 2M | $0.20 | $0.50
Grok Code Fast 1 | xAI | 256K | $0.20 | $1.50
Grok 3 | xAI | 1M | $2.00 | $10.00
Mistral Medium 3 | Mistral | 128K | $0.40 | $2.00
Mistral Large 2 | Mistral | 128K | $2.00 | $6.00
Magistral Medium | Mistral | 40K | $2.00 | $5.00
Codestral 25.01 | Mistral | 256K | $0.30 | $0.90
DeepSeek V3.1 | DeepSeek | 128K | $0.27 | $1.10
DeepSeek R1 | DeepSeek | 128K | $0.55 | $2.19
Command A | Cohere | 256K | $2.50 | $10.00
Command R+ | Cohere | 128K | $2.50 | $10.00
Command R7B | Cohere | 128K | $0.0375 | $0.15
Qwen3-Max | Alibaba | 256K | $1.60 | $6.40
Qwen3-Coder | Alibaba | 256K | n/a | n/a
QwQ-32B | Alibaba | 131K | n/a | n/a
Kimi K2 | Moonshot | 128K | $0.60 | $2.50
Kimi K2 Thinking | Moonshot | 256K | $0.60 | $2.50
Sonar Pro | Perplexity | 200K | $3.00 | $15.00
Sonar Reasoning Pro | Perplexity | 200K | $2.00 | $8.00
MiniMax-M1 | MiniMax | 1M | $0.40 | $2.20

Pricing reflects each provider's published rate card. Open-weight models (Llama 4, Qwen3-Coder, QwQ-32B) have no first-party API; their pricing varies by host (Together, Groq, Fireworks).

Why use freetokencounter.app?

Every major model
100+ models across 12 providers — OpenAI, Anthropic, Google, Meta, xAI, Mistral, DeepSeek, Cohere, Alibaba (Qwen), Moonshot, Perplexity, MiniMax. Switch instantly to compare.
Token visualization
See exactly how the model splits your text into tokens, with each token highlighted in a cycling pastel color. Hover to inspect.
Live cost estimate
Real-time input + output cost based on each provider's published rate card. Adjust expected output tokens to model your spend.
100% private
Everything runs in your browser. Prompts never reach a server. Safe for proprietary, regulated, or unreleased content.
Context-window meter
A live progress bar shows how much of the selected model's context window your prompt occupies, with warnings as you approach the limit.
No sign-up, no API key
Open the site, paste your prompt, get a count. No accounts, no rate limits, no upsells, no ads.

How token counting works

Split into chunks
freetokencounter.app applies the public GPT-2 splitting pattern to break your text on word boundaries, punctuation, and whitespace.
Apply model calibration
Per-model calibration constants adjust the chunk count to match each provider's tokenizer behavior on English prose.
Display + visualize
The final count, context-window usage, and cost estimate render live as you type. Each chunk is shown as a colored token in the visualization.
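The three steps above can be sketched in a few lines. The calibration constant here is a placeholder, not one of the tool's real tuned values, and the rates are passed in by the caller:

```python
import re

# Simplified ASCII stand-in for the public GPT-2 splitting pattern.
SPLIT = re.compile(r"'s|'t|'re|'ve|'m|'ll|'d| ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+")
CALIBRATION = 1.05  # placeholder; real constants are tuned per model

def report(text, context_window, rate_in, rate_out, expected_out=0):
    # Steps 1-2: split into chunks, then apply the calibration constant.
    tokens = round(len(SPLIT.findall(text)) * CALIBRATION)
    # Step 3: context-window usage and cost estimate.
    usage = (tokens + expected_out) / context_window
    cost = tokens / 1e6 * rate_in + expected_out / 1e6 * rate_out
    return tokens, usage, cost
```

Calling `report("Hello, world!", 400_000, 2.50, 10.00)` yields the token estimate, the fraction of a 400K context window used, and the input cost at $2.50 per million tokens.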

Frequently Asked Questions

What is a token in an AI model?

A token is the smallest unit of text an AI model processes. Tokens can be whole words, sub-words, or single characters depending on the tokenizer. As a rough rule, 1 token ≈ 4 characters or ¾ of an English word, but the exact count depends on the specific model. freetokencounter.app shows you the count for any major model in real time.
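The rule of thumb above can serve as a zero-dependency fallback estimator; this is a rough sketch, not the tool's actual method:

```python
def rough_tokens(text: str) -> int:
    # Rule of thumb: ~4 characters per token for English prose.
    return max(1, round(len(text) / 4))
```

Expect real counts to deviate, especially for code and non-Latin scripts.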

How accurate is freetokencounter.app?

Counts are estimates within roughly ±5% for English text on most models. The tool uses the public GPT-2 / cl100k splitting pattern combined with per-model calibration constants tuned against each provider's published tokenizer. Some providers (Anthropic, Google, MiniMax) do not publish a fully open tokenizer, so counts for those models are clearly labeled as approximations.

Which models does freetokencounter.app support?

freetokencounter.app supports OpenAI (GPT-5.4, 5.4 mini, 5.4 nano, 5.4 Pro, 5.3, 5.2, 5.2 Pro, 5.1, GPT-5, GPT-5 Pro, GPT-4.1, o4-mini, o3, o3-pro, GPT-4o, ChatGPT-4o), Anthropic (Claude Opus 4.7, Sonnet 4.6, Sonnet 4.5, Haiku 4.5, Opus 4.1), Google (Gemini 3.1 Pro Preview, 3.1 Flash-Lite, 3 Flash Preview, 2.5 Pro, 2.5 Flash), Meta (Llama 4 Maverick/Scout/Behemoth, Llama 3.x), xAI (Grok 4, 4 Heavy, 4 Fast, Code Fast 1, Grok 3), Mistral (Medium 3, Small 3.1, Magistral, Pixtral Large, Codestral 25.01), DeepSeek (V3.1, R1), Cohere (Command A, R+, R, R7B), Alibaba (Qwen3-Max, Qwen3-Coder, QwQ-32B), Moonshot (Kimi K2, K2 Thinking), Perplexity (Sonar, Sonar Pro, Sonar Reasoning Pro, Sonar Deep Research), MiniMax (M1, Text-01) — over 100 models across 12 providers.

Is my prompt uploaded anywhere?

No. freetokencounter.app runs entirely in your browser. Your prompt text never leaves your device — there are no servers, no analytics on input, no logging. You can verify this by checking the Network tab in your browser's developer tools while typing.

Why does the same text produce different token counts on different models?

Each model family uses a different tokenizer trained on different data with a different vocabulary. GPT-5 and GPT-4o use o200k_base (~200K vocab), Claude Opus 4.7 uses Anthropic's proprietary tokenizer, Gemini 2.5 Pro uses SentencePiece, Llama 4 uses its own BPE, and Grok 4 uses xAI's tokenizer. The same word may be one token in one model and three tokens in another. freetokencounter.app shows the difference side by side.

How is cost calculated?

The cost estimate multiplies your input token count by each model's published per-million input rate, plus an estimated output token count by the per-million output rate. Pricing data is sourced from each provider's official pricing page and updated periodically. Always check live pricing for production budgeting.
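As a worked example, using GPT-4o's listed rates ($2.50 input / $10.00 output per million tokens), with the output count being an assumed estimate:

```python
input_tokens, output_tokens = 1_200, 400  # output count is an assumed estimate

# input rate + output rate, each applied per million tokens
cost = (input_tokens / 1e6) * 2.50 + (output_tokens / 1e6) * 10.00
# ≈ $0.007, i.e. well under a cent for this prompt
```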

What is a context window?

The context window is the maximum number of tokens (input + output combined) a model can process in a single request. GPT-5 handles 400,000 tokens, Claude Opus 4.7 up to 1 million, Gemini 2.5 Pro 1 million, Llama 4 Scout up to 10 million. If your prompt plus expected output exceeds the context window, the model will reject the request or truncate. freetokencounter.app shows a live context-window meter for the selected model.
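The overflow check itself is simple arithmetic; a minimal sketch:

```python
def fits(input_tokens: int, expected_output: int, context_window: int) -> bool:
    # A request fits only if prompt plus reserved output stays in-window.
    return input_tokens + expected_output <= context_window

fits(390_000, 8_000, 400_000)   # True: fits a 400K window like GPT-5's
fits(390_000, 20_000, 400_000)  # False: would overflow
```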

Can I count tokens for code or non-English text?

Yes. freetokencounter.app handles code, JSON, markdown, and any Unicode text, including non-Latin scripts. Note that code and non-English text typically use 30–80% more tokens than equivalent English prose, because their characters fall outside the most common subword merges. The tool estimates counts for any text you paste, though these categories carry the widest error bars.