freetokencounter.app
by freesuite.app

AI Token Counter — Count Tokens for GPT, Claude, Gemini, Llama and More

freetokencounter.app is a free, browser-based token counter that supports every major large language model: OpenAI's GPT-5.4, GPT-5.4 mini, GPT-5.3, GPT-5.2, GPT-5.1, GPT-5, GPT-5 Pro, GPT-4.1, o4-mini, o3, o3-pro and GPT-4o; Anthropic's Claude Opus 4.7, Sonnet 4.6, Sonnet 4.5 and Haiku 4.5; Google's Gemini 3.1 Pro Preview, 3 Flash Preview, 2.5 Pro and 2.5 Flash; Meta's Llama 4 Maverick, Scout and Behemoth; xAI's Grok 4, Grok 4 Heavy and Grok 4 Fast; Mistral's Medium 3, Magistral and Pixtral Large; DeepSeek V3.1 and R1; Cohere's Command A and Command R+; Alibaba's Qwen3-Max, Qwen3-Coder and QwQ-32B; Moonshot's Kimi K2 Thinking; Perplexity's Sonar Reasoning Pro and Deep Research; and MiniMax M1. Paste any text — a prompt, a code block, JSON, or non-Latin script — and freetokencounter.app shows the token count, characters, words, context-window usage, and cost estimate live, with a colored visualization of how the model splits your text. Nothing uploads to a server: the entire tool runs locally in your browser.

How does freetokencounter.app count tokens?

Every model uses a slightly different tokenizer, and most providers don't publish their exact tokenizer for in-browser use. freetokencounter.app combines the publicly documented GPT-2 / cl100k splitting regex with per-model calibration constants tuned against each provider's reference outputs. For English prose the result is accurate to within roughly five percent, which is close enough to plan prompts, fit context windows, and budget API spend. An "Estimate" pill on every count keeps the methodology transparent: freetokencounter.app never presents an approximation as an exact tokenizer count.
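As a sketch of the splitting stage, here is a simplified, ASCII-only approximation of the public GPT-2 pattern (the real regex uses Unicode categories and needs the third-party `regex` module); the chunks it produces are what the colored visualization displays:

```python
import re

# Simplified ASCII approximation of the public GPT-2 splitting pattern.
# The real pattern uses Unicode classes (\p{L}, \p{N}) via the third-party
# `regex` module; this stdlib version just illustrates the idea.
GPT2_SPLIT = re.compile(
    r"'s|'t|'re|'ve|'m|'ll|'d| ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+"
)

chunks = GPT2_SPLIT.findall("Counting tokens isn't hard.")
# → ['Counting', ' tokens', ' isn', "'t", ' hard', '.']
```

Note how contractions split off as their own chunks and leading spaces attach to the following word, which mirrors how BPE tokenizers treat English prose.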

Why count tokens before sending a prompt?

Tokens are the unit AI models bill in, the unit context windows are measured in, and the unit that determines whether a prompt will fit at all. Counting tokens before sending lets you avoid context-overflow errors, predict cost down to the cent, compare model efficiency on the same task, and right-size your prompts. freetokencounter.app shows token counts for every supported model side by side, so you can pick the cheapest model that handles your prompt without truncation.

Is freetokencounter.app free and private?

Yes — completely free, no sign-up, no rate limits, no API key required. Every keystroke is processed locally in your browser using JavaScript that ships with the page. Nothing about your prompt — not the text, not the count, not the model selection — is sent to any server. You can verify this in your browser's Network tab while typing: the only request is the initial page load. freetokencounter.app exists because we believe a token counter for sensitive prompts should never see your data.

How accurate is freetokencounter.app for each model?

For OpenAI models (GPT-5, GPT-4.1, GPT-4o, o-series, GPT-4, GPT-3.5) and Meta Llama models, the count is calibrated against the published reference tokenizers and is typically within ±3% on English text. For Anthropic Claude (Opus 4.7, Sonnet 4.6 and earlier), Google Gemini 2.5, xAI Grok, Cohere Command, Alibaba Qwen, DeepSeek, Mistral, Moonshot Kimi, and MiniMax — providers that don't release a fully open tokenizer — counts are within roughly ±5–8%. Code, JSON, and non-Latin scripts can drift more, generally toward higher token counts than the estimate. For high-stakes production budgeting, always validate against the provider's own count_tokens API; for prompt drafting and quick comparisons, freetokencounter.app is purpose-built.

AI Model Token Limits & Pricing Comparison

The table below compares context windows, max output tokens, and per-million-token pricing across major model families. Pricing reflects each provider's official rate cards.

Model | Provider | Context | Input $/1M | Output $/1M
GPT-5.4 | OpenAI | 1.1M | $2.50 | $15.00
GPT-5.4 mini | OpenAI | 1.1M | $0.75 | $4.50
GPT-5.4 nano | OpenAI | 1.1M | $0.20 | $1.25
GPT-5.4 Pro | OpenAI | 1.1M | $30.00 | $180.00
GPT-5.3 Codex | OpenAI | 400K | $1.75 | $14.00
GPT-5.2 | OpenAI | 400K | $0.875 | $7.00
GPT-5.2 Pro | OpenAI | 400K | $10.50 | $84.00
GPT-5.1 | OpenAI | 400K | $0.625 | $5.00
GPT-5 | OpenAI | 400K | $1.25 | $10.00
GPT-5 Pro | OpenAI | 400K | $15.00 | $120.00
GPT-5 mini | OpenAI | 400K | $0.25 | $2.00
GPT-5 nano | OpenAI | 400K | $0.05 | $0.40
GPT-4.1 | OpenAI | 1M | $2.00 | $8.00
o4-mini | OpenAI | 200K | $1.10 | $4.40
o3 | OpenAI | 200K | $2.00 | $8.00
o3-pro | OpenAI | 200K | $20.00 | $80.00
GPT-4o | OpenAI | 128K | $2.50 | $10.00
GPT-4o-mini | OpenAI | 128K | $0.15 | $0.60
o1 | OpenAI | 128K | $15.00 | $60.00
Claude Opus 4.7 | Anthropic | 1M | $15.00 | $75.00
Claude Sonnet 4.6 | Anthropic | 1M | $3.00 | $15.00
Claude Sonnet 4.5 | Anthropic | 1M | $3.00 | $15.00
Claude Haiku 4.5 | Anthropic | 200K | $1.00 | $5.00
Claude Opus 4.1 | Anthropic | 200K | $15.00 | $75.00
Claude Opus 4 | Anthropic | 200K | $15.00 | $75.00
Claude Sonnet 4 | Anthropic | 200K | $3.00 | $15.00
Claude 3.5 Sonnet | Anthropic | 200K | $3.00 | $15.00
Gemini 3.1 Pro Preview | Google | 1M | $2.00 | $12.00
Gemini 3.1 Flash-Lite | Google | 1M | $0.25 | $1.50
Gemini 3 Flash Preview | Google | 1M | $0.50 | $3.00
Gemini 2.5 Pro | Google | 1M | $1.25 | $10.00
Gemini 2.5 Flash | Google | 1M | $0.30 | $2.50
Gemini 2.0 Flash | Google | 1M | $0.10 | $0.40
Gemini 1.5 Pro | Google | 2M | $1.25 | $5.00
Llama 4 Maverick | Meta | 1M | n/a | n/a
Llama 4 Scout | Meta | 10M | n/a | n/a
Llama 4 Behemoth | Meta | 1M | n/a | n/a
Llama 3.3 70B | Meta | 128K | n/a | n/a
Llama 3.1 405B | Meta | 128K | n/a | n/a
Grok 4 | xAI | 256K | $3.00 | $15.00
Grok 4 Heavy | xAI | 256K | $30.00 | $90.00
Grok 4 Fast | xAI | 2M | $0.20 | $0.50
Grok Code Fast 1 | xAI | 256K | $0.20 | $1.50
Grok 3 | xAI | 1M | $2.00 | $10.00
Mistral Medium 3 | Mistral | 128K | $0.40 | $2.00
Mistral Large 2 | Mistral | 128K | $2.00 | $6.00
Magistral Medium | Mistral | 40K | $2.00 | $5.00
Codestral 25.01 | Mistral | 256K | $0.30 | $0.90
DeepSeek V3.1 | DeepSeek | 128K | $0.27 | $1.10
DeepSeek R1 | DeepSeek | 128K | $0.55 | $2.19
Command A | Cohere | 256K | $2.50 | $10.00
Command R+ | Cohere | 128K | $2.50 | $10.00
Command R7B | Cohere | 128K | $0.0375 | $0.15
Qwen3-Max | Alibaba | 256K | $1.60 | $6.40
Qwen3-Coder | Alibaba | 256K | n/a | n/a
QwQ-32B | Alibaba | 131K | n/a | n/a
Kimi K2 | Moonshot | 128K | $0.60 | $2.50
Kimi K2 Thinking | Moonshot | 256K | $0.60 | $2.50
Sonar Pro | Perplexity | 200K | $3.00 | $15.00
Sonar Reasoning Pro | Perplexity | 200K | $2.00 | $8.00
MiniMax-M1 | MiniMax | 1M | $0.40 | $2.20

Pricing reflects each provider's published rate card. Open-weight models (Llama 4, Qwen3-Coder, QwQ-32B) have no first-party API; their pricing varies by host (Together, Groq, Fireworks).

Why use freetokencounter.app?

Every major model
100+ models across 12 providers — OpenAI, Anthropic, Google, Meta, xAI, Mistral, DeepSeek, Cohere, Alibaba (Qwen), Moonshot, Perplexity, MiniMax. Switch instantly to compare.
Token visualization
See exactly how the model splits your text into tokens, with each token highlighted in a cycling pastel color. Hover to inspect.
Live cost estimate
Real-time input + output cost based on each provider's published rate card. Adjust expected output tokens to model your spend.
100% private
Everything runs in your browser. Prompts never reach a server. Safe for proprietary, regulated, or unreleased content.
Context-window meter
A live progress bar shows how much of the selected model's context window your prompt occupies, with warnings as you approach the limit.
No sign-up, no API key
Open the site, paste your prompt, get a count. No accounts, no rate limits, no upsells, no ads.

How token counting works

Split into chunks
freetokencounter.app applies the public GPT-2 splitting pattern to break your text on word boundaries, punctuation, and whitespace.
Apply model calibration
Per-model calibration constants adjust the chunk count to match each provider's tokenizer behavior on English prose.
Display + visualize
The final count, context-window usage, and cost estimate render live as you type. Each chunk is shown as a colored token in the visualization.
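The three steps above can be sketched in a few lines. The calibration constant here is a placeholder, not one of the tool's real tuned values, and the rates are passed in by the caller:

```python
import re

# Simplified ASCII stand-in for the public GPT-2 splitting pattern.
SPLIT = re.compile(r"'s|'t|'re|'ve|'m|'ll|'d| ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+")
CALIBRATION = 1.05  # placeholder; real constants are tuned per model

def report(text, context_window, rate_in, rate_out, expected_out=0):
    # Steps 1-2: split into chunks, then apply the calibration constant.
    tokens = round(len(SPLIT.findall(text)) * CALIBRATION)
    # Step 3: context-window usage and cost estimate.
    usage = (tokens + expected_out) / context_window
    cost = tokens / 1e6 * rate_in + expected_out / 1e6 * rate_out
    return tokens, usage, cost
```

Calling `report("Hello, world!", 400_000, 2.50, 10.00)` yields the token estimate, the fraction of a 400K context window used, and the input cost at $2.50 per million tokens.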

Frequently Asked Questions

What is a token in an AI model?

A token is the smallest unit of text an AI model processes. Tokens can be whole words, sub-words, or single characters depending on the tokenizer. As a rough rule, 1 token ≈ 4 characters or ¾ of an English word, but the exact count depends on the specific model. freetokencounter.app shows you the count for any major model in real time.
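The rule of thumb above can serve as a zero-dependency fallback estimator; this is a rough sketch, not the tool's actual method:

```python
def rough_tokens(text: str) -> int:
    # Rule of thumb: ~4 characters per token for English prose.
    return max(1, round(len(text) / 4))
```

Expect real counts to deviate, especially for code and non-Latin scripts.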

How accurate is freetokencounter.app?

Counts are estimates within roughly ±5% for English text on most models. The tool uses the public GPT-2 / cl100k splitting pattern combined with per-model calibration constants tuned against each provider's published tokenizer. Some providers (Anthropic, Google, MiniMax) do not publish a fully open tokenizer, so counts for those models are clearly labeled as approximations.

Which models does freetokencounter.app support?

freetokencounter.app supports OpenAI (GPT-5.4, 5.4 mini, 5.4 nano, 5.4 Pro, 5.3, 5.2, 5.2 Pro, 5.1, GPT-5, GPT-5 Pro, GPT-4.1, o4-mini, o3, o3-pro, GPT-4o, ChatGPT-4o), Anthropic (Claude Opus 4.7, Sonnet 4.6, Sonnet 4.5, Haiku 4.5, Opus 4.1), Google (Gemini 3.1 Pro Preview, 3.1 Flash-Lite, 3 Flash Preview, 2.5 Pro, 2.5 Flash), Meta (Llama 4 Maverick/Scout/Behemoth, Llama 3.x), xAI (Grok 4, 4 Heavy, 4 Fast, Code Fast 1, Grok 3), Mistral (Medium 3, Small 3.1, Magistral, Pixtral Large, Codestral 25.01), DeepSeek (V3.1, R1), Cohere (Command A, R+, R, R7B), Alibaba (Qwen3-Max, Qwen3-Coder, QwQ-32B), Moonshot (Kimi K2, K2 Thinking), Perplexity (Sonar, Sonar Pro, Sonar Reasoning Pro, Sonar Deep Research), MiniMax (M1, Text-01) — over 100 models across 12 providers.

Is my prompt uploaded anywhere?

No. freetokencounter.app runs entirely in your browser. Your prompt text never leaves your device — there are no servers, no analytics on input, no logging. You can verify this by checking the Network tab in your browser's developer tools while typing.

Why does the same text produce different token counts on different models?

Each model family uses a different tokenizer trained on different data with a different vocabulary. GPT-5 and GPT-4o use o200k_base (~200K vocab), Claude Opus 4.7 uses Anthropic's proprietary tokenizer, Gemini 2.5 Pro uses SentencePiece, Llama 4 uses its own BPE, and Grok 4 uses xAI's tokenizer. The same word may be one token in one model and three tokens in another. freetokencounter.app shows the difference side by side.

How is cost calculated?

The cost estimate multiplies your input token count by each model's published per-million input rate, plus an estimated output token count by the per-million output rate. Pricing data is sourced from each provider's official pricing page and updated periodically. Always check live pricing for production budgeting.
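As a worked example, using GPT-4o's listed rates ($2.50 input / $10.00 output per million tokens), with the output count being an assumed estimate:

```python
input_tokens, output_tokens = 1_200, 400  # output count is an assumed estimate

# input rate + output rate, each applied per million tokens
cost = (input_tokens / 1e6) * 2.50 + (output_tokens / 1e6) * 10.00
# ≈ $0.007, i.e. well under a cent for this prompt
```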

What is a context window?

The context window is the maximum number of tokens (input + output combined) a model can process in a single request. GPT-5 handles 400,000 tokens, Claude Opus 4.7 up to 1 million, Gemini 2.5 Pro 1 million, Llama 4 Scout up to 10 million. If your prompt plus expected output exceeds the context window, the model will reject the request or truncate. freetokencounter.app shows a live context-window meter for the selected model.
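The overflow check itself is simple arithmetic; a minimal sketch:

```python
def fits(input_tokens: int, expected_output: int, context_window: int) -> bool:
    # A request fits only if prompt plus reserved output stays in-window.
    return input_tokens + expected_output <= context_window

fits(390_000, 8_000, 400_000)   # True: fits a 400K window like GPT-5's
fits(390_000, 20_000, 400_000)  # False: would overflow
```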

Can I count tokens for code or non-English text?

Yes. freetokencounter.app handles code, JSON, markdown, and any Unicode text, including non-Latin scripts. Note that code and non-English text typically use 30–80% more tokens than equivalent English prose, because their characters fall outside the most common subword merges. The tool estimates counts for any text you paste, though these categories carry the widest error bars.