Unusual $/MTok numbers
The price of an LLM is usually quoted in dollars per million tokens, i.e. $/MTok. I maintain a list of some prominent LLMs and their costs here.
Usually, LLM spend dominates your costs, but if you maintain LLM infra as your business, or as you move to cheaper LLMs for your product, it’s worth asking:
- “How do other non-LLM costs scale with MTok processed?”
- “What % of my total spend should I expect to be on non-LLM costs? Are they still negligible?”
In this document, I’ll attempt to convert various other costs (storage, vector DBs, proxying) into $/MTok, to make comparisons easy.
Storage & Egress
An MTok is 1 million tokens, each of which is ~5 bytes. Hence storing an MTok takes about 5 megabytes. Thus, 200 MTok = 1 GB.
These numbers can get a bit small to deal with, so we’re often going to use GTok. 1 GTok = 1000 MTok, so 1 GTok = 5 GB.
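To make the conversions below easy to check, here’s the unit math as a quick Python sketch (the ~5 bytes/token figure is the approximation above; the helper name is mine):

```python
BYTES_PER_TOKEN = 5              # rough average, from above
TOKENS_PER_GTOK = 1_000_000_000  # 1 GTok = 1000 MTok
GB_PER_GTOK = TOKENS_PER_GTOK * BYTES_PER_TOKEN / 1e9  # = 5.0

def per_gb_to_per_gtok(usd_per_gb: float) -> float:
    """Convert $/GB into $/GTok (works identically for $/GB-mo -> $/GTok-mo)."""
    return usd_per_gb * GB_PER_GTOK

print(per_gb_to_per_gtok(0.09))   # AWS egress, used below: $0.45/GTok
print(per_gb_to_per_gtok(0.023))  # S3 storage, used below: ~$0.12/GTok-mo
```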
If you’re passing all your LLM queries through an AWS EC2 instance, and incurring the egress fee, you’re paying:
- $0.09/GB → $0.45/GTok
If you store your LLM queries in a bucket, you’re going to pay:
- $0.023/GB-mo on AWS S3 → $0.12/GTok-mo
- $0.015/GB-mo on Cloudflare R2 → $0.075/GTok-mo
- $0.005/GB-mo on Backblaze B2 → $0.025/GTok-mo
If you store them in a fast DB for querying, you might run into document size limits, but you’d pay:
- $0.125/GB-mo on Supabase → $0.63/GTok-mo
- $0.15/GB-mo on Firestore → $0.75/GTok-mo
- $0.1/GB-mo on Aurora → $0.5/GTok-mo
Search storage & vectors
Cases where you’re making all LLM inputs and outputs searchable are a bit contrived - more often you’re indexing a corpus as an LLM input. Regardless, it’s helpful to bring these prices into the same units.
Elasticsearch recommends 1 GB of RAM for every ~30-45 GB of disk, and based on their pricing calculator, at scale, you see ~$1.24/TB-hr of disk with some replication. That’s (1.24 * 730)/1000 = $0.9/GB-mo of disk.
Hence, storing all LLM inputs + outputs on an Elastic cluster costs around:
- $0.9/GB-mo → $4.5/GTok-mo
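As a sanity check on that chain of conversions (the $1.24/TB-hr and 730 hr/mo figures are from the paragraph above):

```python
DISK_USD_PER_TB_HR = 1.24  # Elastic pricing calculator, at scale, with replication
HOURS_PER_MONTH = 730
GB_PER_GTOK = 5.0          # from the storage section above

usd_per_gb_mo = DISK_USD_PER_TB_HR * HOURS_PER_MONTH / 1000  # ~$0.91/GB-mo of disk
print(usd_per_gb_mo * GB_PER_GTOK)                           # ~$4.5/GTok-mo
```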
For vectorization, embedding prices are often natively in $/MTok:
- OpenAI text-embedding-3-small: $0.02/MTok
- OpenAI text-embedding-3-large: $0.13/MTok
- bge-large-en-v1.5 via Deepinfra: $0.01/MTok
These models take 512 to 8191 tokens as input, and output 512 to 3072 dimensions.
Next, let’s look at vector storage. You can assume you’re on average converting N input tokens to N dimensions, hence 1 MDim = 1 MTok. Dimensions are usually 4 bytes, hence 1 MDim = 4 MB, and 1 GDim = 4 GB.
If you store your LLM queries in Turbopuffer, you pay:
- For storage: $0.33/GB-mo → $1.3/GDim-mo → $1.3/GTok-mo
- For insertion: $2/GB → $8/GDim → $8/GTok
If you use Weaviate serverless, you pay:
- For storage: $0.1/MDim-mo → $100/GDim-mo → $100/GTok-mo
- (But reads & writes may be free?)
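A sketch of the dimension math behind both sets of numbers, assuming float32 (4-byte) dimensions and the 1 MTok = 1 MDim mapping above:

```python
BYTES_PER_DIM = 4                          # float32
GB_PER_GDIM = 1e9 * BYTES_PER_DIM / 1e9    # 1 GDim = 1e9 dims * 4 bytes = 4 GB

# Turbopuffer, priced per GB:
print(0.33 * GB_PER_GDIM)  # storage: ~$1.3/GDim-mo = ~$1.3/GTok-mo
print(2.00 * GB_PER_GDIM)  # insertion: $8/GDim = $8/GTok (one-off)

# Weaviate serverless, priced per MDim directly:
print(0.1 * 1000)          # storage: $100/GDim-mo = $100/GTok-mo
```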
Serverless proxies
You often incur the costs of a serverless runner while you’re proxying your LLM call through it. It’s a little tricky to measure as aggregate tokens/s can vary based on request characteristics. I’m going to go with a baseline of 20 tok/s, running on a 512 MB runner.
Input MTok should be treated as ~free, but waiting on an MTok of output would then take about 50,000 seconds, for a total of 25,000 GB-seconds on a 512 MB runner. Thus a GTok is 25 million GB-seconds (the arithmetic is sketched after this list), and you’d pay:
- Vercel functions: $0.18/GB-hr → ~$1.25/MTok
- Vercel edge functions: ~0 - you’re not billed for non-compute time
- Cloudflare workers: ~0 - you’re not billed for non-compute time
- AWS Lambda: $8.3/Msecond on a 512 MB runner → $0.415/MTok
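Here’s that arithmetic as a sketch; 20 tok/s and 512 MB are the baseline assumptions above, and the rates are the ones quoted in the list:

```python
TOK_PER_S = 20    # baseline aggregate output speed
RUNNER_GB = 0.5   # 512 MB runner

seconds_per_mtok = 1_000_000 / TOK_PER_S      # 50,000 s per output MTok
gb_s_per_mtok = seconds_per_mtok * RUNNER_GB  # 25,000 GB-s per output MTok

print(gb_s_per_mtok / 3600 * 0.18)          # Vercel ($0.18/GB-hr): ~$1.25/MTok
print(seconds_per_mtok * 8.3 / 1_000_000)   # Lambda ($8.3/Msec at 512 MB): ~$0.415/MTok
```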
Summary
Here’s a table of rounded figures:
| Service | Costs (approx) |
| --- | --- |
| Blob storage | $0.1/GTok-mo |
| DB storage | $0.5/GTok-mo |
| Egress fees | $0.5/GTok |
| Turbopuffer vector storage | $1/GTok-mo |
| Elastic search storage | $5/GTok-mo |
| Embedding Generation | $20/GTok |
| Cheapest LLMs (gemini-flash-8b, llama-8b) | $50/GTok in, $200/GTok out |
| Weaviate vector storage | $100/GTok-mo |
| Cheap LLMs (Deepseek-v3, gpt-4o-mini) | $200/GTok in, $500/GTok out |
| Proxying billed by wall-time | $1000+/GTok out |
| Mainstay LLMs (claude-3.5-sonnet, gpt-4o) | $5000/GTok in, $15000/GTok out |
Some findings:
- You really don’t want to be paying for wall-clock time while waiting for an LLM response; make sure you pick a proxy that idles well.
- Egress fees are probably 0.1-0.3% of your costs if you’re using a cheap LLM, rising to ~1% only with the cheapest LLMs.
- Storing all request data for a year will cost 1-5% of your spend with most fast DBs and cheap LLMs, and 0.2-1% with blob storage (both estimates are sketched after this list).
- Turbopuffer is almost 2 OOMs cheaper than Weaviate serverless (!!)
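A rough check on those percentage claims, using the rounded figures from the summary table and comparing against input prices alone (storage annualized at 12 months):

```python
egress = 0.5               # $/GTok
db_storage_yr = 0.5 * 12   # $/GTok, one year of DB storage
blob_storage_yr = 0.1 * 12 # $/GTok, one year of blob storage

cheap_in, cheapest_in = 200, 50  # $/GTok input prices from the table

print(egress / cheap_in)           # 0.0025 -> ~0.25% of spend with a cheap LLM
print(egress / cheapest_in)        # 0.01   -> ~1% with the cheapest LLMs
print(db_storage_yr / cheap_in)    # 0.03   -> ~3% for a year in a fast DB
print(blob_storage_yr / cheap_in)  # 0.006  -> ~0.6% for a year in blob storage
```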
If you avoid cheap LLMs altogether, most of this math becomes insignificant.
Appendix: GB over GTok
An alternate framing is: MTok is a weird unit, so why don’t we think in GB instead? I.e., LLM billing can be approximated as dollars per GB of input / generated output, and many other costs are natively in $/GB or $/GB-mo. This feels like a more natural unit for guesstimating which component of your spend will dominate.
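Converting a native $/MTok price into $/GB is a single multiply (a sketch, reusing the ~5 bytes/token assumption, so 1 GB ≈ 200 MTok):

```python
MTOK_PER_GB = 200  # 1e9 bytes / (5 bytes/token * 1e6 tokens/MTok)

def per_mtok_to_per_gb(usd_per_mtok: float) -> float:
    """Convert an LLM's $/MTok price into $/GB of text processed."""
    return usd_per_mtok * MTOK_PER_GB

print(per_mtok_to_per_gb(15.0))  # e.g. a $15/MTok output price -> $3000/GB
```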
Thus, an alternate version of the table, in GB, is as follows:

| Service | Costs (approx) |
| --- | --- |
| Blob storage | $0.02/GB-mo |
| DB storage | $0.1/GB-mo |
| Egress fees | $0.1/GB |
| Turbopuffer vector storage | $0.2/GB-mo |
| Elastic search storage | $1/GB-mo |
| Embedding Generation | $5/GB |
| Cheapest LLMs (gemini-flash-8b, llama-8b) | $10/GB in, $50/GB out |
| Weaviate vector storage | $20/GB-mo |
| Cheap LLMs (Deepseek-v3, gpt-4o-mini) | $50/GB in, $100/GB out |
| Proxying billed by wall-time | $200+/GB out |
| Mainstay LLMs (claude-3.5-sonnet, gpt-4o) | $1000/GB in, $3000/GB out |
In short: LLMs cost $10-$1000 per GB of data they process, and even more for every GB they generate.