The other day I got a silly scare: I opened the AI API bill for a little project and it was well above what I expected. I dug in and the cause was about as dumb as it gets: I had been picking the model out of habit. I used the "flagship" for everything, including simple tasks that a model 100× cheaper would handle just as well. I'd never stopped to compare.
When I looked at the numbers, the gap was absurd. The same prompt, same task, cost about US$0.0007 on an open model and US$0.08 on the flagship, a gap of over 100×. Multiply that by thousands of calls a month and "wrong model" becomes real money. That's when I built PromptTools.
The problem: we pick a model by vibe, not cost
Anyone working with LLM APIs knows the scene. The pitfalls are always the same:
- Model by habit. You use what you're used to (or the "smartest") for everything — even to classify a short text, where a cheap model gives the same result.
- Tokens are invisible. You don't "see" the size of the prompt. A fat system prompt, a whole context pasted in, and the per-call cost inflates without you noticing.
- Input ≠ output. The price of what goes in differs from what comes out, and output is usually the pricier one. Ignoring that gets the math badly wrong.
- The surprise only shows at scale. US$0.08 per call seems like nothing. At 50k calls/month it comes to about US$4,000, a bill that hurts, and by then it's too late.
The core problem: the decision of which model to use almost never goes through a side-by-side cost comparison. The ruler is missing.
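To make the "input ≠ output" and "scale" points concrete, here is a minimal sketch of the per-call and monthly math. The function names and the per-million-token prices are placeholders I made up for illustration; they are not PromptTools internals or real provider prices.

```python
def call_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """USD cost of one call, pricing input and output tokens separately.

    Prices are expressed in USD per 1 million tokens.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def monthly_cost(cost_per_call, calls_per_month):
    """Project a per-call cost to a monthly bill."""
    return cost_per_call * calls_per_month


# Hypothetical prices: US$10 per 1M input tokens, US$30 per 1M output tokens.
per_call = call_cost(2_000, 500, input_price_per_m=10.0, output_price_per_m=30.0)
print(f"per call: US${per_call:.4f}")

# The figure from the text: US$0.08 per call at 50k calls/month.
print(f"monthly:  US${monthly_cost(0.08, 50_000):,.2f}")
```

Note that in this made-up example the 500 output tokens cost almost as much as the 2,000 input tokens, which is why lumping the two together skews the estimate.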
The solution: PromptTools, with a side-by-side comparison
PromptTools is the ruler. You paste the prompt and it shows you, instantly: how many tokens it has, what it costs (input + output + multimodal) and — the trick — the cost of the same prompt across every model, side by side, cheapest to priciest. One glance and you see you can swap the "flagship" for a model 20× cheaper without losing quality on that task.
It's free, no login, and runs 100% in the browser — your prompt (often your secret) never leaves the machine.
What it gives you
- Model comparison. The same prompt across GPT-5, Claude Opus 4.8, Gemini 3, DeepSeek, Llama & co., sorted cheapest to priciest, with a "how many times pricier" multiplier for each.
- Scale projection. Enter how many requests per month and see the estimated monthly cost — where the wrong-model mistake actually shows.
- Context: chat vs API. A bar shows whether your prompt + answer fits within the chat's safe limit or only within the API's limit (and warns when it overflows, or when Gemini hits the doubled-price tier).
- Density, templates and PDF. A prompt "density" gauge, templates saved in the browser, and a cost-report export to send a client.
Prices are reference estimates (USD) and easy to update — always check the provider's official table before finalizing a quote.
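The comparison and the scale projection can be sketched in a few lines. This is only an illustration of the idea, not the tool's actual code: the `ModelPrice` type, the `compare` function, the model names and every price below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


def compare(models, input_tokens, output_tokens, calls_per_month=0):
    """Return one row per model, sorted cheapest to priciest."""
    rows = []
    for m in models:
        per_call = (input_tokens * m.input_per_m
                    + output_tokens * m.output_per_m) / 1_000_000
        rows.append({
            "model": m.name,
            "per_call": per_call,
            "monthly": per_call * calls_per_month,
        })
    if not rows:
        return rows
    rows.sort(key=lambda r: r["per_call"])
    cheapest = rows[0]["per_call"]
    for r in rows:
        # "How many times pricier" relative to the cheapest model.
        r["times_pricier"] = r["per_call"] / cheapest if cheapest > 0 else float("inf")
    return rows


# Placeholder models and prices, for illustration only.
MODELS = [
    ModelPrice("flagship-model", 15.0, 60.0),
    ModelPrice("small-open-model", 0.2, 0.4),
    ModelPrice("mid-tier-model", 1.0, 4.0),
]

for row in compare(MODELS, input_tokens=1_000, output_tokens=500, calls_per_month=50_000):
    print(f"{row['model']:<18} US${row['per_call']:.4f}/call  "
          f"US${row['monthly']:>9,.2f}/month  {row['times_pricier']:.1f}x")
```

Keeping the prices in one small table like `MODELS` is what makes them easy to update when a provider changes its rates.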
FAQ
How does it estimate tokens? With a rule of thumb: tokens ≈ characters ÷ 3.5 (good for budgeting; the exact figure depends on each model's tokenizer). Image and audio inputs are counted when you enter them.
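As a rough sketch, the rule of thumb looks like this. The function name is hypothetical, and rounding up is my assumption (it keeps the budget on the pessimistic side); the tool may round differently.

```python
import math

CHARS_PER_TOKEN = 3.5  # rule of thumb from the text, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count as characters / 3.5, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "Classify the sentiment of this review as positive or negative."
print(estimate_tokens(prompt))
```

For an exact count you would need the tokenizer of the specific model, which is why this is only suitable for budgeting.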
Are the prices always right? They're reference estimates (USD) in an easy-to-update file — check the provider's official table, prices change.
Is my prompt sent to a server? No — it runs 100% in the browser; templates stay in your localStorage.
Who is it for? Devs and builders using LLM APIs who want to pick the right model by cost, project monthly spend and audit context.