Imagine you are trying to measure something very abstract, like "value." Usually, we think of value as how much money something costs, how good it feels, or how moral it is. This paper says: Stop. Let's strip all that away.

The authors propose a new way to measure value that is as hard and mathematical as physics. They argue that value is simply how fast a goal-directed agent (like a robot, a person, or an AI) turns a limited resource (like energy, time, or money) into progress toward a specific goal.

Here is the paper's theory, broken down into simple concepts using everyday analogies.

1. The "Value" Formula: The Logarithmic Law

The Concept:
If you have a bag of resources (say, 100 dollars) and you want to achieve a goal, how should you spend it? The paper argues that value doesn't grow in a straight line. The first dollar you spend is worth a lot; the hundredth dollar is worth much less. This is called "diminishing returns."

The Analogy:
Think of filling a leaky bucket with a hose. The first few gallons raise the water level quickly. As the bucket gets fuller, the leak drains faster, so the level rises more and more slowly.
The authors prove mathematically that the right way to measure this "progress" is with a logarithm (a curve that keeps rising but flattens as it goes).

Why it matters: It means that spreading your resources too thin across many different goals is inefficient. Focusing your resources on the most important goals creates the most "value."

2. The "Speed Limit": Value is Bounded by Information

The Concept:
You cannot create value faster than you can understand the world. If you are driving a car blindfolded, no amount of gas (resource) will get you to your destination faster than if you had your eyes open.

The Analogy:
Imagine you are a gambler at a horse race.

The World: The horses running.
Your Goal: Picking the winner.
Your Resource: Your money.
Your Perception: Your ability to see which horse is winning.

The paper proves a "Coding Theorem of Value": The most your profit growth (value) can improve is capped by how much information you have about the race, and the best betting strategy reaches that cap exactly.
If you know nothing, you can't make money. If you have a perfect view of the race, you can make the maximum possible money. You cannot "create" value out of thin air; you can only convert information into value.

3. The "Second Law of Value": Misalignment is Waste

The Concept:
In physics, the Second Law of Thermodynamics implies that whenever you turn energy into work, some of it is inevitably lost as heat (waste). The authors say the same thing happens with value.
If your model of the world is wrong, you waste resources.

The Analogy:
Imagine you are trying to hit a target with a bow and arrow.

Potential Value: The distance you could shoot if you were perfect.
Dissipation (Waste): The energy lost because you aimed at the wrong spot.
The Result: If you are confident but wrong, you don't just fail; you actively destroy value. The paper shows that "over-confidence" in an AI is a measurable form of waste, just like friction in a machine.

4. The "Fleet" Problem: How Groups Work Together

The Concept:
What happens when you have a whole team of agents (a "fleet") working together?
The paper argues that you can't just add up their values like adding numbers. Each agent has its own "frame of reference" (its own goals and view of the world).

The Analogy:
Think of a group of investors.

Value is Relative: One investor might value gold; another might value oil. You can't say who has "more" value without a common currency.
Price is the Bridge: The "price" of a resource (like a dollar) is the one thing everyone agrees on. It acts as a translator between their different goals.
The Fleet Ceiling: If the whole team pools their money and shares their eyes (information), they act like one giant super-agent. Their total success is limited by the total information the group has. If everyone sees the exact same thing, adding more people doesn't help. But if they see different things (diversity), the group can achieve more than any single person could alone.

5. The "Is" vs. "Ought" Gap

The Concept:
There is a fundamental difference between what is (facts about the world) and what ought to be (goals).

Beliefs (The "Is"): These can be learned. If you see the world is blue, you update your belief to "the world is blue."
Goals (The "Ought"): The world doesn't tell you what to want. You have to be told what to want by a designer.

The Analogy:
Imagine a GPS.

The GPS can easily learn the facts of the road (traffic, construction).
But the GPS cannot decide where you want to go. That is a goal.
The paper shows that if you try to fix a misaligned agent (one going the wrong way) just by "overseeing" it (forcing it to stop), it's inefficient. The better way is to change the incentives (the price) so that the path the agent wants to take is the same path you want it to take.

6. The Real-World Test

The authors didn't just write math; they tested it on real AI models (large language models).

The Test: They asked AIs to solve math problems, write code, and answer questions.
The Finding: They measured how much "information" the AI gathered from the question and how much "value" (correct answers) it produced.
The Result: The math held up closely. The AI's ability to produce value tracked how much information it could extract from the question.
The Surprise: Bigger models (with more "brain power") didn't always do better. A smaller, more efficient model that understood the specific task better often created more "value" per unit of energy than a massive, confused model.

Summary

This paper treats Value not as a feeling or a price tag, but as a physical quantity, like energy.

Value is the rate of turning resources into goal-progress.
Information is the fuel that powers value. You can't get more value than your information allows.
Mistakes are waste. Being confidently wrong destroys value.
Groups work best when they share diverse information and trade resources at a fair price.
Alignment isn't about forcing agents to obey; it's about designing the "price" so that what the agent wants to do is exactly what you want it to do.

The authors conclude that this framework provides a solid, mathematical way to understand, measure, and govern populations of AI agents, moving beyond vague ideas of "good" or "bad" to precise calculations of efficiency and waste.

Technical Summary: A Mathematical Theory of Value

Problem Statement

The paper addresses the lack of a rigorous, structural definition of "value" for goal-directed agents. Current notions of value are entangled with semantics (morality, market price, psychology), making them difficult to measure or govern in populations of artificial agents. The authors propose stripping away these semantic layers to treat value as a lawful, structural quantity analogous to information in Shannon's theory. The core problem is to define value as a measurable rate of resource conversion to goal-progress, derive its mathematical limits, and establish a control theory for populations of agents operating under resource constraints.

Methodology

The authors employ a multi-pronged approach combining axiomatic derivation, information-theoretic proofs, dynamical systems analysis, and empirical validation on large language models (LLMs).

Axiomatic Derivation (Statics): The paper derives the functional form of value using three axioms: diminishing returns, additivity across independent channels, and scale invariance. Scale invariance is treated as a Cauchy functional equation, forcing a logarithmic measure.
Dynamic Derivation (Ergodicity): Independently, the paper derives the same logarithmic form using the ergodicity argument of Peters (2019) and Kelly (1956), showing that the long-run growth rate of a multiplicatively reinvested resource necessitates a log-measure.
Information-Theoretic Bounds: The authors map value creation to information theory, treating the agent's perception channel as a communication channel. They derive a "Coding Theorem of Value" bounding value growth by mutual information.
Dynamical Systems & Control: The paper models the evolution of beliefs, prices, and goals. It identifies an "is/ought asymmetry" where beliefs flow toward a world-supplied target (reality), while goals lack such a target, making them subject to control or selection.
Empirical Validation: The theory is tested on live language models (0.5B to 8B parameters) across five families and multiple task domains (classification, reasoning, sequential decision, code). The experiments are pre-registered, focusing on out-of-sample performance, mutual information tracking, and fleet-level pricing strategies.
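
The ergodicity route listed above can be illustrated with a small calculation. The following is a minimal sketch, not the paper's derivation: it assumes a repeated even-money bet with win probability `p = 0.6`, where a fraction `f` of current wealth is staked each round. The function names and all numbers are illustrative assumptions. It shows that a stake can look attractive on average per round while shrinking wealth in the long run, which is why the long-run criterion is the expected logarithm.

```python
import math

def log_growth(f, p):
    """Expected log-wealth growth per round when staking fraction f
    on an even-money bet that wins with probability p (assumed setup)."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

def expected_multiplier(f, p):
    """Expected one-round wealth multiplier (the ensemble average)."""
    return 1 + f * (2 * p - 1)

p = 0.6  # illustrative win probability

# Search a grid of stake fractions for the best long-run growth rate.
grid = [i / 1000 for i in range(0, 1000)]  # 0.000 .. 0.999
f_best = max(grid, key=lambda f: log_growth(f, p))

# Kelly's closed form for an even-money bet.
f_kelly = 2 * p - 1

# An aggressive stake: positive on average, negative in the long run.
aggressive = 0.9
avg_gain = expected_multiplier(aggressive, p)
long_run = log_growth(aggressive, p)

print(f"grid optimum f = {f_best:.3f}, Kelly f = {f_kelly:.3f}")
print(f"f = {aggressive}: mean multiplier {avg_gain:.3f}, log growth {long_run:.3f}")
```

Under these assumptions the grid optimum coincides with the Kelly fraction, and the aggressive stake has an expected multiplier above 1 but a negative log growth rate.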

Key Contributions

1. The Logarithmic Law of Value

The paper establishes that value $V$ is a logarithmic measure of resource allocation $e$ relative to goal weights $k$:
$V(e) = \sum_{i} k_i \ln e_i$
This form is derived from two independent routes: static axioms (scale invariance) and dynamic compounding (ergodicity). The optimal allocation is proportional to goal weights, and the realized value is penalized by the entropy of the goal distribution.
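
The two claims in the paragraph above (proportional allocation is optimal, and the optimum carries an entropy penalty) can be checked numerically. This is a minimal sketch under stated assumptions: the goal weights sum to 1, the resource is a fixed budget $E$ with $\sum_i e_i = E$, and the weights and budget below are invented for illustration. Under those assumptions the optimum is $e_i^* = k_i E$ and the value there is $\ln E - H(k)$, where $H(k)$ is the Shannon entropy of the weights.

```python
import math

def value(e, k):
    """Logarithmic value V(e) = sum_i k_i * ln(e_i)."""
    return sum(ki * math.log(ei) for ki, ei in zip(k, e) if ki > 0)

def proportional_allocation(k, budget):
    """Split the budget in proportion to the goal weights: e_i = k_i * budget."""
    return [ki * budget for ki in k]

def entropy(k):
    """Shannon entropy of the goal weights, in nats."""
    return -sum(ki * math.log(ki) for ki in k if ki > 0)

k = [0.5, 0.3, 0.2]   # illustrative goal weights (assumed to sum to 1)
budget = 100.0        # illustrative resource budget E

e_star = proportional_allocation(k, budget)
v_star = value(e_star, k)

# Compare against an even split of the same budget.
v_uniform = value([budget / len(k)] * len(k), k)

# At the proportional optimum, V = ln(E) - H(k); this gap should be ~0.
identity_gap = v_star - (math.log(budget) - entropy(k))

print(f"V(proportional) = {v_star:.4f}, V(uniform) = {v_uniform:.4f}")
print(f"V* - (ln E - H(k)) = {identity_gap:.2e}")
```

The even split scores lower than the proportional one whenever the weights are not themselves uniform, which follows from Gibbs' inequality.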

2. The Coding Theorem of Value

The paper proves that the rate at which an agent can create value through a perception channel $Y$ about the world $X$ is bounded by the mutual information $I(X; Y)$:
$\Delta G \le I(X; Y)$
This bound is achieved by Bayes-proportional allocation. Consequently, an agent cannot create value faster than it can perceive the world; perception capacity is the hard ceiling on value-generation rate.
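
The bound and its achievability can be illustrated with the classic Kelly horse-race setting. The sketch below is an illustration under assumptions, not the paper's proof: two horses, a noisy binary tip $Y$, fair odds equal to the inverse of the prior, and a made-up joint distribution. It compares betting on the prior (no perception) with betting on the Bayes posterior, and with an under-confident bettor who uses the tip but not in Bayes proportion.

```python
import math

def growth_rate(joint, bets, odds):
    """Expected log-wealth growth per race:
    sum over (x, y) of p(x, y) * ln(bets[y][x] * odds[x])."""
    return sum(
        pxy * math.log(bets[y][x] * odds[x])
        for x, row in enumerate(joint)
        for y, pxy in enumerate(row)
        if pxy > 0
    )

def mutual_information(joint):
    """I(X; Y) in nats for a joint table indexed as joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log(pxy / (px[x] * py[y]))
        for x, row in enumerate(joint)
        for y, pxy in enumerate(row)
        if pxy > 0
    )

# Illustrative joint distribution p(x, y): the tip is right 80% of the time.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]
odds = [1 / p for p in px]  # fair odds relative to the prior (assumption)

# bets[y][x] is the fraction of wealth placed on horse x after seeing tip y.
blind_bets = [px for _ in py]                                   # ignore the tip
bayes_bets = [[joint[x][y] / py[y] for x in range(len(px))]     # posterior p(x|y)
              for y in range(len(py))]
timid_bets = [[0.6, 0.4], [0.4, 0.6]]                           # under-uses the tip

baseline = growth_rate(joint, blind_bets, odds)
delta_g_bayes = growth_rate(joint, bayes_bets, odds) - baseline
delta_g_timid = growth_rate(joint, timid_bets, odds) - baseline
info = mutual_information(joint)

print(f"I(X;Y) = {info:.4f} nats")
print(f"Delta G (Bayes) = {delta_g_bayes:.4f}, Delta G (timid) = {delta_g_timid:.4f}")
```

In this setting the Bayes-proportional bettor gains exactly $I(X; Y)$ per race over the blind bettor, and any other use of the tip gains strictly less.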

3. The Second Law of Value

Realized value is decomposed into available potential minus dissipation caused by misalignment (model error):
$G = D(q \parallel r) - D(q \parallel p)$
Here $q$ is reality, $p$ is the agent's model, and $r$ is the baseline. Misalignment ($D(q \parallel p)$) acts as non-negative dissipation. Confident error (high certainty in a wrong model) drives realized growth negative, actively destroying value.
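
A short numerical check makes the decomposition concrete. This is a minimal sketch assuming discrete distributions over three outcomes and a uniform baseline $r$. The three example models (`calibrated`, `hedged_wrong`, `confident_wrong`) are invented for illustration and are not taken from the paper's experiments.

```python
import math

def kl(a, b):
    """Kullback-Leibler divergence D(a || b) in nats."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def realized_growth(q, p, r):
    """G = D(q || r) - D(q || p): available potential minus dissipation."""
    return kl(q, r) - kl(q, p)

q = [0.7, 0.2, 0.1]        # reality (illustrative)
r = [1 / 3, 1 / 3, 1 / 3]  # baseline (assumed uniform)

calibrated = q                        # model matches reality: no dissipation
hedged_wrong = [0.4, 0.3, 0.3]        # wrong, but not confident
confident_wrong = [0.02, 0.96, 0.02]  # wrong and very sure of it

g_calibrated = realized_growth(q, calibrated, r)
g_hedged = realized_growth(q, hedged_wrong, r)
g_confident = realized_growth(q, confident_wrong, r)

for name, g in [("calibrated", g_calibrated),
                ("hedged_wrong", g_hedged),
                ("confident_wrong", g_confident)]:
    print(f"{name:>16}: G = {g:+.4f} nats")
```

The calibrated model realizes the full potential $D(q \parallel r)$, the hedged model keeps part of it, and the confidently wrong model ends up with negative growth because its dissipation exceeds the available potential.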

4. Multi-Agent Dynamics: Price and Fleet Capacity

Frame-Relative Value, Frame-Independent Price: Individual values are not cardinally comparable across agents with different goals. However, agents coordinate via a shadow price $\lambda$ (marginal value of resource), which is frame-independent.
Fleet Capacity: A fleet of agents acting as a single decision-maker (pooling resources and fusing perception) has a value growth ceiling bounded by the mutual information of the fused channel: $G_{\text{fleet}} \le I(X; Y_{1:m})$. Redundant agents add zero to this ceiling; diversity lifts it toward the world entropy $H(X)$.
Operating Point: The fleet operates as a Kelly portfolio over agents, with resource weights selected by the emergent price.
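
The redundancy-versus-diversity claim about the fleet ceiling can be checked on a toy channel. The sketch below rests on assumptions not stated in the source: a uniform binary world $X$, and agents that each see $X$ through a binary symmetric channel with flip probability `eps`. "Redundant" means every agent receives the same noisy observation. "Diverse" is modeled here as noise that is independent across agents given $X$.

```python
import math
from itertools import product

def fused_information(m, eps, redundant):
    """I(X; Y_1..Y_m) in nats for a uniform binary X observed by m agents,
    each through a binary symmetric channel with flip probability eps."""
    joint = {}
    for x in (0, 1):
        if redundant:
            # All agents share one noisy observation, copied m times.
            for y in (0, 1):
                joint[(x, (y,) * m)] = 0.5 * ((1 - eps) if y == x else eps)
        else:
            # Each agent's noise is independent given x.
            for ys in product((0, 1), repeat=m):
                p = 0.5
                for y in ys:
                    p *= (1 - eps) if y == x else eps
                joint[(x, ys)] = p

    # Marginal over the fused observation.
    p_obs = {}
    for (x, ys), p in joint.items():
        p_obs[ys] = p_obs.get(ys, 0.0) + p

    return sum(
        p * math.log(p / (0.5 * p_obs[ys]))
        for (x, ys), p in joint.items()
        if p > 0
    )

eps = 0.2  # illustrative noise level
single = fused_information(1, eps, redundant=False)
redundant_fleet = fused_information(5, eps, redundant=True)
diverse_fleet = fused_information(5, eps, redundant=False)
world_entropy = math.log(2)  # H(X) for a uniform binary world

print(f"single agent:   {single:.4f} nats")
print(f"5 redundant:    {redundant_fleet:.4f} nats")
print(f"5 diverse:      {diverse_fleet:.4f} nats (H(X) = {world_entropy:.4f})")
```

In this toy setting five redundant agents have the same ceiling as one, while five diverse agents raise the ceiling toward, but not past, $H(X)$.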

5. Alignment as a Stability Condition

The paper frames alignment as a control-stability problem. Because goals have no world-supplied target (the "is/ought" asymmetry), they are governed by selection (what pays) and control (oversight). The residual misalignment is given by:
$\|\bar{k}^* - k^*\| = \gamma^{-1} \|V g\|$
Here $V$ denotes goal dispersion (distinct from the value function $V(e)$ defined earlier) and $g$ is the reward gradient. The authors argue that incentive design (aligning $g$ with $k^*$) is the "cheap half" of alignment, superior to brute-force oversight (increasing $\gamma$).
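
The scaling in the residual formula can be seen by evaluating it directly. This sketch only computes $\gamma^{-1} \|V g\|$ and does not model the dynamics behind it. Treating the dispersion term as a small matrix and $g$ as a vector is an assumption made for illustration, as are all the numbers.

```python
import math

def mat_vec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, vector)) for row in matrix]

def norm(vector):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in vector))

def residual_misalignment(dispersion, gradient, gamma):
    """Evaluate gamma^{-1} * ||V g|| from the formula above."""
    return norm(mat_vec(dispersion, gradient)) / gamma

dispersion = [[0.04, 0.00],   # illustrative goal dispersion (assumed diagonal)
              [0.00, 0.01]]
gradient = [1.0, -2.0]        # illustrative reward gradient
gamma = 0.5                   # illustrative oversight strength

base = residual_misalignment(dispersion, gradient, gamma)

# Oversight lever: doubling gamma halves the residual.
more_oversight = residual_misalignment(dispersion, gradient, 2 * gamma)

# Incentive lever: a reward gradient with no misaligned pull removes it.
no_pull = residual_misalignment(dispersion, [0.0, 0.0], gamma)

print(f"base residual:        {base:.4f}")
print(f"doubled oversight:    {more_oversight:.4f}")
print(f"zero misaligned pull: {no_pull:.4f}")
```

Read this way, oversight only divides the residual by a larger number, while changing the incentive acts on the numerator itself.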

Empirical Results

The authors conducted pre-registered tests on live language models:

Value-Throughput vs. Capability: Mutual information $I(X; Y)$ tracks realized capability (tool accuracy) with high precision (Spearman $\rho = 0.977$) across 30 model-domain points, outperforming parameter count as a predictor.
Out-of-Sample Tracking: Realized value growth $\Delta G$ tracks mutual information $I(X; Y)$ with a slope of $\approx 0.95$, consistent with the coding theorem.
Dissipation: Over-confidence in weak models results in measurable dissipation, driving realized growth negative, consistent with the Second Law.
Fleet Pricing: A value-based pricing mechanism (routing based on $I/\text{cost}$) beats cost-blind routers and matches cost-aware hand-tuned baselines under compute constraints. It does not outperform the single best agent in homogeneous, positively-correlated fleets, consistent with the theory's scoping rules.
Generalization: The relationship $\Delta G \sim I(X; Y)$ holds across four distinct task shapes (classification, reasoning, sequential, code), which moves the "bridge" between information and value from a single demonstration toward a candidate law.
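
The idea of routing on $I/\text{cost}$ mentioned in the fleet-pricing result can be sketched as follows. This is a hypothetical toy router, not the mechanism evaluated in the paper. The `Agent` class, the `route` function, the per-query budget rule, and every number are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Agent:
    name: str
    info_nats: float  # estimated I(X; Y) for this agent on the task (hypothetical)
    cost: float       # compute cost per query, in arbitrary units (hypothetical)

def route(agents: List[Agent], budget: float) -> Optional[Agent]:
    """Pick the affordable agent with the most information per unit cost."""
    affordable = [a for a in agents if a.cost <= budget]
    if not affordable:
        return None
    return max(affordable, key=lambda a: a.info_nats / a.cost)

def route_cost_blind(agents: List[Agent]) -> Agent:
    """Baseline for comparison: pick the most informative agent, ignoring cost."""
    return max(agents, key=lambda a: a.info_nats)

fleet = [
    Agent("small", info_nats=0.30, cost=1.0),
    Agent("medium", info_nats=0.45, cost=3.0),
    Agent("large", info_nats=0.60, cost=8.0),
]

priced_choice = route(fleet, budget=10.0)
blind_choice = route_cost_blind(fleet)
starved_choice = route(fleet, budget=0.5)

print("priced router picks:    ", priced_choice.name if priced_choice else None)
print("cost-blind router picks:", blind_choice.name)
```

With these made-up numbers the priced router prefers the small agent because it delivers the most information per unit of compute, while the cost-blind baseline picks the large one.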

Significance and Claims

The paper claims to unify disparate fields—Kelly betting, thermodynamics of computation, general equilibrium theory, and control theory—under a single substrate-grounded quantity: value.

Unification: It demonstrates that Shannon's quantities (entropy, divergence, mutual information, Fisher metric) naturally reappear in the theory of value, suggesting the abstraction is structural rather than decorative.
Governance: It provides a control theory for agent populations, distinguishing between value (frame-relative) and price (frame-independent). It argues that governing fleets requires managing perception diversity and pricing resources, rather than summing utilities.
Modesty: The authors explicitly state that the underlying mechanisms (Kelly, Arrow-Debreu, control theory) are not new. The contribution lies in their unification and the derivation of specific governance implications (e.g., incentive design over oversight).
Status: The paper characterizes itself as a "synthesis" rather than a "discovery." It acknowledges that the most distinctively unified predictions (e.g., coupled fleet capacity regions, dynamic goal evolution) remain untestable with current instruments. The work is presented as a falsifiable program gated on future controlled multi-agent experiments.

The paper concludes that value is a frame-relative structural quantity with universal laws relating frames, now testable on the artificial agents the theory was designed to govern.

A Mathematical Theory of Value: a synthesis on goal-directed agency under resource constraints