Sampling Logit Equilibrium and Endogenous Payoff Distortion

This paper introduces the sampling logit equilibrium (SLE), a stationary concept for population games in which agents use finite samples of opponents' play to inform logit-based decisions. It shows that large-sample behavior approximates a logit equilibrium of a virtual game whose payoffs are distorted by sampling noise.

Minoru Osawa


Imagine you are trying to decide what to eat for dinner in a city of 10,000 people. You want to pick the most popular restaurant, but you can't ask everyone. Instead, you ask just five of your friends what they ate today.

This simple scenario captures the core of Minoru Osawa's paper, which introduces a new way to understand how people make decisions in groups when they don't have perfect information and aren't perfectly logical.

Here is the breakdown of the paper using everyday analogies:

1. The Two Problems: "Bad Data" and "Bad Brains"

In traditional economics, we often assume people are like supercomputers: they know everything about what everyone else is doing and always pick the absolute best option. But in real life, two things go wrong:

  • The "Bad Data" Problem (Finite Sampling): You don't know what the whole city is doing. You only know what your 5 friends ate. If your friends happened to all go to a trendy but terrible new sushi place, you might think sushi is the best choice, even if it's not. This is sampling noise.
  • The "Bad Brain" Problem (Stochastic Choice): Even if you knew the true best option, you might still make a mistake. Maybe you're tired, distracted, or just feel like trying something new. You might pick the second-best option by accident. This is random noise.

Most previous theories studied these problems separately. Osawa's paper asks: What happens when you have both bad data and a noisy brain?

2. The New Concept: The "Sampling Logit Equilibrium" (SLE)

The author creates a model called the Sampling Logit Equilibrium (SLE). Think of it as a simulation of a crowd where:

  1. Everyone grabs a handful of random people to ask for advice (the sample).
  2. Everyone calculates which option looks best based only on that small handful.
  3. Everyone then flips a weighted coin to make their final choice (the logit rule). The coin is weighted so the "best" option is most likely, but it's not guaranteed. (A minimal code sketch of these three steps follows below.)
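
To make the three steps concrete, here is a minimal Python sketch of a single agent's decision. It is not code from the paper: the function name, the payoff-matrix layout, and the parameters `k` (sample size) and `eta` (noise level) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sampling_logit_choice(opponent_actions, payoff_matrix, k=5, eta=1.0):
    """One hypothetical sampling-logit decision.

    opponent_actions : array of the other agents' current action indices
    payoff_matrix    : payoff_matrix[a, b] = payoff of playing a against b
    k                : sample size (how many opponents you ask)
    eta              : noise level of the logit "weighted coin"
    """
    # 1. Grab a small random sample of opponents (the handful of friends).
    sample = rng.choice(opponent_actions, size=k, replace=False)

    # 2. Judge each option using only that small sample.
    n_actions = payoff_matrix.shape[0]
    estimated = np.array([payoff_matrix[a, sample].mean() for a in range(n_actions)])

    # 3. Flip the weighted coin: the best-looking option is most likely, not certain.
    weights = np.exp((estimated - estimated.max()) / eta)  # subtract max for numerical stability
    return rng.choice(n_actions, p=weights / weights.sum())
```

Roughly speaking, the SLE is the population state that reproduces itself when everyone chooses this way: a smaller `k` means noisier estimates ("bad data"), a larger `eta` means a noisier coin ("bad brain").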

The paper finds that when you mix these two things together, something surprising happens: the crowd doesn't just make random mistakes; it starts systematically preferring the wrong things.

3. The "Virtual Game": A Hall of Mirrors

The most brilliant part of the paper is the discovery that we can describe this messy behavior as if the players were playing a different game entirely.

Imagine the players are walking through a funhouse with distorted mirrors. They think they are looking at the real world, but the mirrors (the sampling noise) are stretching and shrinking the reality.

  • The Variance Premium (The "Excitement" Bias): If an option has a "bumpy" payoff (sometimes great, sometimes terrible), the sampling noise makes it look more attractive than it really is. Why? Because when you take a small sample, you are more likely to catch the "great" moments by luck. It's like a gambler who thinks a slot machine is on a "hot streak" just because they got lucky on the first three pulls. The crowd overvalues risky, volatile options.
  • The Curvature Premium (The "Shape" Bias): If the payoff curve is curved (like a hill), the noise makes the top of the hill look higher than it is. This is a mathematical quirk called "Jensen's Inequality." Essentially, the crowd behaves as if the world is more exciting and rewarding than it actually is. (A back-of-the-envelope sketch of both premiums follows this list.)
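
For the mathematically curious, here is a back-of-the-envelope sketch of where the two premiums come from. It is illustrative only, not the paper's exact expressions; assume x̂ is the sampled share of opponents playing some option, x the true share, k the sample size, Û an estimated payoff, and η the logit noise level.

```latex
% Curvature premium: a second-order (Jensen-type) expansion of a curved payoff F,
% evaluated at the noisy sampled share \hat{x} instead of the true share x.
\mathbb{E}\bigl[F(\hat{x})\bigr]
  \;\approx\; F(x) + \tfrac{1}{2}\,F''(x)\,\operatorname{Var}(\hat{x}),
  \qquad \operatorname{Var}(\hat{x}) = \frac{x(1-x)}{k}.

% Variance premium: the logit rule exponentiates payoff estimates, and the
% exponential is convex, so noisier estimates get an extra boost (Jensen again):
\mathbb{E}\bigl[e^{\hat{U}/\eta}\bigr] \;\ge\; e^{\mathbb{E}[\hat{U}]/\eta}.
```

Both distortions scale with the sampling variance, which shrinks like 1/k, so the funhouse mirrors flatten out as the sample grows.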

The Takeaway: The players aren't just confused; they are playing a "Virtual Game" where the rewards have been secretly altered by the noise of their own limited observations.

4. Why This Matters: Choosing the Winner

In many games (like choosing between two technologies, or two political parties), there are multiple possible stable outcomes. Traditional models often struggle to predict which one the crowd will actually pick.

Osawa's paper shows that finite sampling acts as a tie-breaker (a toy simulation after the list below illustrates this).

  • If you only ask one person (a tiny sample), the crowd is very likely to converge on the "Risk-Dominant" option (the safe, boring choice that still does reasonably well even if the rest of the crowd doesn't go along with you).
  • As you ask more people (larger sample), the crowd starts to behave more like the "perfectly informed" models, and the tie-breaker effect disappears.
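
To see the tie-breaking effect in action, here is a toy simulation. It is my own illustration, not the paper's model: the stag-hunt payoff matrix, the noise level `eta`, the population size, and the number of revisions are all made-up parameters.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative stag-hunt payoffs: U[a, b] = payoff of playing a against b.
# Action 0 ("risky") pays 3 only if the opponent also plays it;
# action 1 ("safe") always pays 2 and is the risk-dominant choice.
U = np.array([[3.0, 0.0],
              [2.0, 2.0]])

def share_playing_safe(k, eta=0.25, n_agents=200, revisions=40_000):
    # Start with 90% of the crowd at the risky action.
    actions = np.zeros(n_agents, dtype=int)
    actions[: n_agents // 10] = 1
    for _ in range(revisions):
        i = rng.integers(n_agents)                                # one agent revises at a time
        sample = rng.choice(np.delete(actions, i), size=k, replace=False)
        estimated = np.array([U[a, sample].mean() for a in range(2)])
        weights = np.exp((estimated - estimated.max()) / eta)     # logit rule
        actions[i] = rng.choice(2, p=weights / weights.sum())
    return actions.mean()                                         # share now playing "safe"

for k in (1, 5, 25):
    print(f"sample size k={k:2d}: final share playing safe = {share_playing_safe(k):.2f}")
```

Run as written, the small-sample crowds drift over to the safe, risk-dominant action even though they start with 90% of agents at the risky one, while the k=25 crowd stays put and behaves much like a perfectly informed logit population; the exact numbers depend on the made-up parameters.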

The Metaphor: Imagine a group of people trying to find the exit in a dark maze.

  • Perfect Rationality: They all see the whole map and walk straight to the exit.
  • Pure Randomness: They wander aimlessly.
  • Sampling Logit (This Paper): They only look at the floor right in front of them (sampling) and stumble a bit (noise). Surprisingly, this combination makes them more likely to find the "safe" exit quickly, rather than getting stuck in a loop of trying to find the "perfect" exit.

Summary

This paper tells us that limited information doesn't just add "fuzziness" to decision-making; it changes the rules of the game.

When people rely on small samples of information, they systematically overvalue options that are volatile or have curved payoff structures; in effect, they play a "Virtual Game" with distorted rewards. This helps explain why crowds sometimes make predictable, systematic errors, and why populations relying on small samples may settle on specific outcomes (such as a particular technology or social norm) that better-informed populations would not.

In short: When you only look at a few friends for advice, you don't just get a bad opinion; you get a different reality.