Robust Testing Of the Allais Paradox By Paired Choices vs. Paired Valuations

This paper refutes the claim that valuation tests are a superior alternative to paired choices for testing the Allais paradox. The authors show that valuation tests are inherently biased under standard stochastic choice models, whereas a "strong" paired-choice test remains robust and confirms the continued prevalence of the common ratio effect.

Federico Echenique, Gerelt Tserenjigmid

Published 2026-04-08

Imagine you are a detective trying to solve a mystery about human behavior. The mystery is the Allais Paradox, a famous puzzle where people often make choices that seem to break the rules of "rational" math.

Specifically, there's a pattern called the Common Ratio Effect. It's like this:

  • Scenario A: You can take $100 for sure, or gamble on, say, an 80% chance at $150. Most people pick the sure $100.
  • Scenario B: Now both probabilities are cut in half (the "common ratio"): a 50% chance at $100, or a 40% chance at $150. Suddenly, many people switch and pick the $150 gamble.

Mathematically, under expected-utility theory, if you prefer the safer option in Scenario A, you should prefer the corresponding safer option in Scenario B, because both probabilities were scaled by the same ratio. But people flip-flop. This suggests our brains don't work like simple calculators.
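The flip-flop can be made concrete with a quick expected-utility check (the probabilities and the square-root utility here are illustrative examples, not taken from the paper): scaling both lotteries' probabilities by the same factor just scales both expected utilities, so it can never change which lottery ranks higher.

```python
import math

# Illustrative common-ratio pair (example numbers, not from the paper):
#   Scenario A: $100 for sure        vs.  80% chance of $150
#   Scenario B: 50% chance of $100   vs.  40% chance of $150
# B is A with both probabilities multiplied by the same "common ratio" 0.5.

def expected_utility(prob: float, prize: float, u) -> float:
    """Expected utility of a lottery paying `prize` with probability `prob`, else $0."""
    return prob * u(prize)

u = math.sqrt  # one example of a concave (risk-averse) utility function

safe_A, risky_A = expected_utility(1.0, 100, u), expected_utility(0.8, 150, u)
safe_B, risky_B = expected_utility(0.5, 100, u), expected_utility(0.4, 150, u)

# Halving both probabilities halves both expected utilities, so the
# ranking cannot flip -- and this holds for ANY utility function u:
assert (safe_A > risky_A) == (safe_B > risky_B)
print(f"A: {safe_A:.2f} vs {risky_A:.2f};  B: {safe_B:.2f} vs {risky_B:.2f}")
```

So a subject who picks the safe option in Scenario A but the gamble in Scenario B violates expected utility no matter how risk-averse they are; that systematic switch is the common ratio effect.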

For decades, economists have used Choice Tests (asking people to pick Option A or B) to prove this happens. But recently, a new group of researchers (MNOSS) argued: "Wait a minute! Maybe people aren't actually flipping their preferences. Maybe they just make random mistakes when they choose. If we account for those mistakes, the 'paradox' disappears."

To prove their point, MNOSS stopped asking people to choose and started asking them to value things (e.g., "What is the lowest amount of cash you would accept instead of this gamble?"). Using these "Valuation Tests," they claimed the paradox was an illusion and that people are actually rational after all.

Enter Echenique and Tserenjigmid (the authors of this paper). They are the detectives who say: "Hold on. You swapped out the measuring tool, but your new tool may be even more broken than the old one."

Here is the breakdown of their argument using simple analogies:

1. The "Noisy Scale" Problem (Why the old tests were criticized)

Imagine you are weighing two bags of apples.

  • The Weak Test: You look at the scale. If Bag A is heavier than Bag B, you pick A.
  • The Criticism: MNOSS argued that if the scale is "noisy" (shaky), sometimes it might show Bag A is heavier, and sometimes Bag B, even if they are the same weight. They claimed that if you just look at the frequency of choices, the noise makes it look like people are flipping their preferences (the paradox), even if they aren't.

2. The "Valuation" Trap (Why the new tests are flawed)

MNOSS said, "Let's stop weighing the bags and just ask people, 'How much money is this bag worth to you?'"

  • The Authors' Counter: This is like asking someone to guess the weight of a bag of apples while they are drunk.
    • The "Mean" Problem: If you ask for an average value, the answer depends heavily on how "risk-averse" (scared of losing) the person is. It's like asking a tightrope walker and a trapeze artist how much a rope is worth; their answers will be wildly different based on their personality, not the rope's actual weight. The math shows you can get any answer you want just by tweaking the person's risk personality.
    • The "Sign" Problem: If you just ask, "Is it worth more than $10?", it only works if the person's "drunkenness" (random errors) is perfectly symmetrical. If their mistakes lean one way, the test is broken.

The Analogy: MNOSS tried to fix a shaky camera (the choice test) by switching to a blurry mirror (the valuation test). The authors argue the mirror is actually more distorted than the camera.
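The "Mean" problem above can be sketched in a few lines. This is my own illustration, not the paper's construction: it assumes a power ("CRRA-style") utility u(x) = x**rho with made-up parameter values, and shows that the certainty equivalent of the same gamble swings wildly with the risk-aversion parameter alone.

```python
# Sketch of the "Mean" problem (illustrative assumptions, not the paper's math):
# the cash value a subject reports for the SAME gamble depends heavily on
# their risk attitude, so mean valuations confound personality with preference.

def certainty_equivalent(prob: float, prize: float, rho: float) -> float:
    """Certainty equivalent of a lottery (prize w.p. prob, else $0)
    under the power utility u(x) = x**rho (rho < 1 means risk-averse)."""
    eu = prob * prize ** rho      # expected utility of the gamble
    return eu ** (1 / rho)        # invert u to get the equivalent sure cash

gamble = (0.5, 150)  # a 50% chance at $150

for rho in (0.2, 0.5, 1.0):
    ce = certainty_equivalent(*gamble, rho)
    print(f"rho={rho}: certainty equivalent = ${ce:.2f}")
```

The same gamble is "worth" $75 to a risk-neutral subject (rho = 1), exactly $37.50 at rho = 0.5, and under $5 at rho = 0.2. That spread is why the authors argue mean valuations can be made to say almost anything by tweaking the assumed risk personality.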

3. The "Strong" Solution (The Robust Test)

The authors propose a better way to look at the "noisy" choice data. Instead of asking, "Did they pick A more often than B?" (which is sensitive to noise), they suggest a Strong Test:

  • The Rule: If a person picks Option A more than 50% of the time, we assume they prefer A. If they pick it less than 50%, they prefer B.
  • Why it works: This is like looking at a crowd of people voting. If 60% vote for Candidate A, we know A is the winner. It doesn't matter if 10 people made a mistake and voted for B. The "majority rule" cuts through the noise.
  • The Result: The authors prove mathematically that this "Strong Test" is immune to the "shaky scale" problem. It works whether the noise is random, correlated, or weirdly distributed.
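The majority-rule idea can be checked with a tiny simulation (my own sketch, not the paper's proof). Assume a subject who truly prefers A but "trembles": each choice flips to the unpreferred option with some noise probability. As long as that noise stays below 50%, repeated choices let the true preference shine through.

```python
import random

# Illustrative simulation of the "strong" (majority-rule) test.
# The subject truly prefers A; each choice flips to B with probability `noise`.

def majority_says_A(n_trials: int, noise: float, rng: random.Random) -> bool:
    """True if the subject picks A on more than half of n_trials noisy choices."""
    picks_A = sum(1 for _ in range(n_trials) if rng.random() > noise)
    return picks_A > n_trials / 2

rng = random.Random(0)
for noise in (0.1, 0.3, 0.45):
    recovered = sum(majority_says_A(101, noise, rng) for _ in range(1000))
    print(f"noise={noise}: majority rule recovers the true preference "
          f"in {recovered / 10:.1f}% of 1000 simulated subjects")
```

Even at quite high noise levels the majority verdict almost always matches the underlying preference, which is the intuition behind the authors' robustness result.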

4. The Verdict: The Paradox is Real!

When the authors applied this "Strong Test" to the data from the previous studies (including the new data from MNOSS), the results were shocking:

  • MNOSS's Conclusion: "We found no evidence of the paradox."
  • Authors' Conclusion: "We found the paradox in 41% of the studies (and 10% in MNOSS's own data)."

The "Arbitrary Parameters" Twist:
The authors also found a subtle catch in how the experiments were designed. The "Common Ratio Effect" only shows up if you pick very specific numbers for the money and probabilities (like $100 vs. $50). If you pick random numbers (like $12 vs. $30), the effect often disappears.

  • Analogy: It's like trying to find a specific type of fish. If you only cast your net in the exact spot where that fish lives, you'll find it. If you cast your net randomly in the whole ocean, you won't. MNOSS cast their net randomly and said, "No fish here!" The authors say, "You just didn't look in the right spots."

Summary for the Everyday Reader

  • The Conflict: Some researchers said the famous "Allais Paradox" (where people act irrationally) was just a statistical illusion caused by random mistakes. They used "Valuation" (asking for prices) to prove it.
  • The Defense: The authors say the Valuation method is actually more broken and biased than the old Choice method.
  • The Fix: They introduced a "Strong Test" (Majority Rule) that ignores the noise and looks at the clear preference.
  • The Outcome: When you use the Strong Test, the "irrational" behavior is still there. People really do flip-flop their choices in predictable ways. The Allais Paradox is real, and the "Valuation" method was a red herring.

In short: The authors saved the day by showing that the "new" method used to debunk the paradox was actually the one that was broken, and the "old" paradox is still very much alive and kicking.
