Menu Pricing of Large Language Models

This paper develops a theoretical framework demonstrating that optimal pricing for Large Language Models reduces to one-dimensional screening via committed-spend token budgets priced at marginal cost, a mechanism that aligns with observed industry practices and adapts to competitive dynamics between proprietary and open-source models.

Dirk Bergemann, Alessandro Bonatti, Alex Smolin

Published 2026-03-10

Imagine you are running a massive, high-tech bakery. But instead of selling loaves of bread, you sell "Brain Power" (Large Language Models, or LLMs).

Your customers are businesses and individuals who want to use your brain power for all sorts of things: writing emails, coding software, analyzing legal documents, or generating art. The problem is, every customer is different. Some need a tiny spark of intelligence for a quick task; others need a roaring fire of computation for a massive project. And you don't know exactly what they are going to do with your power until they start using it.

This paper by Bergemann, Bonatti, and Smolin is a guide on how to price your "Brain Power" so you make the most money without scaring customers away.

Here is the breakdown using simple analogies:

1. The Problem: A Messy Kitchen

Usually, pricing is hard because customers are complicated.

  • The Mystery: You don't know if a customer is a "light user" (just checking the weather) or a "heavy user" (training a robot army).
  • The Hidden Action: Even if you sell them a "token bucket" (a bucket of brain power), you can't see how they pour it. Do they use it on easy tasks or hard ones?
  • The Math Nightmare: In theory, this is a math problem with infinite dimensions. It's like trying to price a car when you don't know if the buyer wants to drive it to the grocery store or race it in the Indy 500, and you can't see which gears they shift into.

2. The Big Discovery: The "Magic Summary"

The authors found a magic trick. Because the way LLMs work is consistent (mathematically "homogeneous"), you don't need to know the details of every single task a customer does.

The Analogy: Imagine every customer has a "Brain Power Score."

  • One customer might do 1,000 easy tasks.
  • Another might do 10 very hard tasks.
  • If their total value is the same, the two customers look identical to you as a seller.

You can ignore the messy details and just look at this single Score. This turns a terrifying, infinite-dimensional math problem into a simple, one-dimensional problem: "How much does this customer's Score tell us they are willing to pay?"
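The "one score is enough" idea can be sketched as a toy calculation. Everything below (the cent values, the `score` and `buys` helpers) is our own illustration of the reduction, not the paper's notation:

```python
# Toy illustration: a customer's purchase decision depends only on a
# one-dimensional score, not on the mix of tasks behind it.
# All values are in cents and are invented for this example.

def score(customer):
    """One-dimensional summary: total value of all planned tasks."""
    return sum(count * value for count, value in customer)

def buys(customer, flat_fee):
    """Under homogeneity, the decision hinges on the score alone."""
    return score(customer) >= flat_fee

light = [(1000, 1)]            # 1,000 easy tasks worth 1 cent each
heavy = [(10, 100)]            # 10 hard tasks worth $1.00 each
mixed = [(500, 1), (5, 100)]   # a different mix with the same total value

# All three customers have score 1000, so at any flat fee they make
# identical choices, even though their task portfolios differ.
```

The seller never needs to see the portfolio; pricing the budget against the score alone is what collapses the screening problem to one dimension.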

3. The Solution: The "Spending Pass" (Token Budgets)

So, how do you sell this? You don't sell "100 words of text." You sell a Spending Pass.

  • The Mechanism: You give the customer a budget of "credits" (tokens).
  • The Price: You charge them a flat fee upfront for the budget.
  • The Twist: Inside that budget, they can spend the credits however they want.
    • Using a "smart" model (like a Ferrari) costs more credits per mile.
    • Using a "basic" model (like a sedan) costs fewer credits per mile.
    • They can mix and match.

Why this works: It's like giving a customer a prepaid card for a theme park. You charge them $100 for the card. They can ride the gentle carousel 50 times or the scary rollercoaster 5 times. You don't care what they ride; you just know that the person with the "high score" (the thrill-seeker) will buy the big card, and the "low score" person will buy the small one.
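The mechanics of the Spending Pass can be sketched in a few lines. The credit costs below are invented for illustration; the point is only that one prepaid budget covers any mix of cheap and expensive requests:

```python
# A minimal sketch of spending a prepaid credit budget across two models.
# The per-request credit costs are assumptions, not real prices.
CREDIT_COST = {"smart": 10, "basic": 1}   # credits per request on each model

def spend(budget, requests):
    """Serve requests in order until the committed budget runs out."""
    served = []
    for model in requests:
        cost = CREDIT_COST[model]
        if cost > budget:
            break                # can't afford the next request: stop
        budget -= cost
        served.append(model)
    return served, budget

served, remaining = spend(25, ["basic", "smart", "basic", "smart", "smart"])
```

With a budget of 25 credits, the customer above completes four requests and is stopped on the fifth; a different customer could spend the same 25 credits on 25 basic requests instead. The seller only set the budget and the credit prices.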

4. The Three Real-World Strategies

The paper shows that the big AI companies (like OpenAI and Anthropic) are already doing this, but in slightly different ways:

A. The "Anthropic" Style (Quantity Only)

  • The Setup: Everyone gets access to the same models (Ferraris and Sedans).
  • The Difference: The "Pro" plan gives you a bigger bucket of tokens. The "Max" plan gives you a giant bucket.
  • The Logic: You are screening based on how much they want to use, not which model they use.

B. The "OpenAI" Style (Quality + Quantity)

  • The Setup: The "Free" plan gets the basic sedan. The "Plus" plan gets the sedan + a few rides on the Ferrari. The "Pro" plan gets unlimited access to the Ferrari.
  • The Logic: You are screening based on both how much they use and how smart the model needs to be. High-value customers get the best tools and more of them.
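A stylized version of this two-dimensional menu makes the screening visible. The tier names echo the text, but every fee, allowance, and per-query value below is an assumption for illustration, not an actual OpenAI price (amounts in cents):

```python
# Stylized "quality + quantity" menu. Fees and allowances are invented.
MENU = {
    "Free": {"fee": 0,     "basic": 50,  "smart": 0},
    "Plus": {"fee": 2000,  "basic": 500, "smart": 50},
    "Pro":  {"fee": 20000, "basic": 500, "smart": float("inf")},
}

def best_tier(want_basic, want_smart, v_basic=5, v_smart=50):
    """Each customer picks the tier maximizing surplus: the value of the
    usage the tier allows, minus the flat fee."""
    def surplus(tier):
        plan = MENU[tier]
        value = (v_basic * min(want_basic, plan["basic"])
                 + v_smart * min(want_smart, plan["smart"]))
        return value - plan["fee"]
    return max(MENU, key=surplus)
```

Customers self-select: a light user stays on Free, a moderate user who wants some smart-model access picks Plus, and a heavy smart-model user pays for Pro, without the seller ever observing their tasks.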

C. The "GitHub/Quora" Style (The Aggregator)

  • The Setup: These companies don't make their own models; they resell others'. They sell a "Credit Pack."
  • The Logic:
    • Poe (Quora): You buy 12 million points. If you run out, you stop. (Maximum Spend).
    • GitHub Copilot: You buy a plan with a limit, but if you go over, you can keep going at a higher price. (Minimum Spend).
    • Both are just different ways of managing that "Spending Pass."
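The two aggregator variants differ only in what happens at the budget boundary. A sketch, with all numbers made up:

```python
# Two budget variants from the text, with invented numbers.

def maximum_spend_bill(usage, cap, fee):
    """Poe-style: flat fee, usage hard-capped at the credit bucket."""
    return fee, min(usage, cap)

def minimum_spend_bill(usage, cap, fee, overage_rate):
    """Copilot-style: flat fee covers the cap; extra usage is billed
    per unit at a higher rate."""
    overage = max(0, usage - cap)
    return fee + overage * overage_rate, usage
```

Under the cap, the two plans are identical; over it, one cuts the customer off while the other keeps serving at a marked-up per-unit price.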

5. The "API" Exception: The Grocery Store

There is one place where this fancy pricing doesn't happen: Developer APIs (where programmers build apps).

  • The Price: It's strictly linear. $0.0001 per token. No bundles, no discounts.
  • Why? The market is too competitive. If OpenAI tried to sell a complex "Spending Pass" to a developer, that developer would simply switch to Google or Anthropic, which offer a simple "pay-as-you-go" price.
  • The Paper's Take: This is actually the "efficient" price. It's like a grocery store selling flour by the pound. No tricks, just the cost of the flour plus a tiny bit of profit.
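Linear pricing has a simple signature: the bill is rate times usage, full stop. The rate below is the article's illustrative figure, not any provider's actual price:

```python
# Linear (pay-as-you-go) API pricing: no bundles, no volume discounts.
RATE_PER_TOKEN = 0.0001  # dollars per token (illustrative rate)

def api_bill(tokens):
    """The bill scales exactly with usage."""
    return tokens * RATE_PER_TOKEN

# Doubling usage exactly doubles the bill -- the hallmark of a linear
# price, and the opposite of a flat-fee Spending Pass.
```

Contrast this with the consumer plans above: there, a flat fee buys a budget, so the marginal token inside the budget is free. Here, every token costs the same, which is what "efficient, flour-by-the-pound" pricing means.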

6. The Competition Twist

What happens if a "Big Guy" (Proprietary Leader) competes with a "Small Guy" (Open Source)?

  • The Small Guy: Sells tokens at the bare minimum cost (marginal cost).
  • The Big Guy: Has to be clever.
    • Low-end customers: They just buy from the Small Guy. The Big Guy ignores them.
    • Middle customers: The Big Guy sells them just enough tokens so they don't feel the need to buy more from the Small Guy. It's a "deterrence" zone.
    • High-end customers: The Big Guy sells them a premium package, ignoring the Small Guy because these customers want the Big Guy's special features anyway.

The Bottom Line

The paper argues that the confusing pricing we see today (subscriptions, token limits, credit systems) isn't random. It is, in fact, the profit-maximizing design that the theory predicts for selling AI.

By selling budgets instead of specific tasks, companies can:

  1. Hide the complexity of the technology from the user.
  2. Automatically sort customers by how much they are willing to pay.
  3. Make the most profit while keeping the system efficient.

It turns out, the best way to sell "Brain Power" is to sell a wallet of credits and let the customer decide how to spend it.