Better Measurement or Larger Samples? Data Collection for Policy Learning with Unobserved Heterogeneity

Imagine you are a mayor trying to decide who should receive a special grant to help start a small business. You have a limited budget, and you want to give the money to the people who will use it best to make a profit.

This paper is about a tricky dilemma the mayor faces: Should you spend your money gathering more detailed information about a few people, or should you spend it on a larger list of people with less detailed information?

Here is the story of the paper, broken down into simple concepts.

1. The Hidden "Superpower" (Unobserved Heterogeneity)

In the real world, people are different. Some have a "hidden superpower" (like natural business talent, high motivation, or grit) that you can't see just by looking at their age or education.

The Old Way: Policymakers usually look only at what they can see (age, education, income). They guess who will succeed based on these visible traits.
The New Idea: What if we could measure that "hidden superpower"? For example, asking neighbors to rank each other's business skills. If we use this ranking to decide who gets the money, we might do a better job.

2. The Problem: The "Blurry Photo" vs. The "Big Group"

The author, Giacomo Opocher, points out two big problems with using these hidden traits:

The Blurry Photo (Measurement Error): You can't measure "business talent" perfectly. Asking neighbors for a ranking is like taking a photo with a slightly blurry lens. It helps, but it's not 100% accurate. If the photo is too blurry, you might give money to the wrong person.
The Big Group (Sample Size): You have a fixed budget. If you spend a lot of money getting very clear photos (asking 5 neighbors to rank everyone), you have less money left to actually talk to a large number of people. If you talk to fewer people, your data is shaky, and you might make a bad decision just because you didn't have enough examples.

The Trade-off: Do you buy a super-clear photo of 100 people, or a slightly blurry photo of 1,000 people?

3. The Solution: Finding the "Sweet Spot"

The author created a mathematical formula (a "regret bound") to figure out the answer. Think of this formula as a GPS for budgeting.

When to focus on clarity: If the "hidden superpower" (business skill) is huge and makes a massive difference in who succeeds, it's worth spending money to get a clearer measurement, even if you have to talk to fewer people.
When to focus on quantity: If the "hidden superpower" doesn't matter that much, or if getting a clear measurement is incredibly expensive, it's better to ignore the hidden trait and just talk to as many people as possible using standard info (like age and education).

The paper proves that there is a specific "tipping point." If the hidden trait is important enough, the blurry photo is still better than no photo at all, but you have to balance how many photos you take.

4. The Real-World Test: The Indian Market Experiment

To prove this works, the author looked at a real experiment in rural India where micro-entrepreneurs were given cash grants.

The Setup: Researchers asked entrepreneurs to rank their peers on business skills. This ranking was the "proxy" for the hidden talent.
The Findings:
- Using the rankings (the hidden trait) increased average welfare by about 5% compared with handing out grants at random.
- It cut the chance of making a "bad decision" (giving money to someone who fails) in half.
- The Budget Twist: The researchers simulated different budgets. They found that if the budget is tight, you shouldn't try to get the perfect ranking (asking 5 neighbors). Instead, you should ask fewer neighbors (maybe 2) and use the saved money to include more entrepreneurs in the study.
- The Result: Even with a small budget, it was always better to use some ranking information than to ignore it completely. But the "perfect" amount of information changes depending on how much money you have.

The Big Takeaway

This paper tells policymakers: Don't just guess, and don't just collect data blindly.

If you want to help people effectively, you need to measure the things that really matter (like motivation or skill), even if your measurement isn't perfect. However, you must be smart about your budget. Sometimes, a "good enough" measurement of many people is better than a "perfect" measurement of a few. The author gives you the math to find that perfect balance so you can maximize the good you do with every dollar spent.

1. Problem Statement

The paper addresses a fundamental trade-off in policy learning (the design of individualized treatment rules) when unobserved heterogeneity exists.

Context: Policymakers often wish to target interventions (e.g., cash transfers, job training) based on latent traits like innate ability, motivation, or business skills, which are not directly observable.
The Dilemma: To utilize these latent traits, policymakers must rely on proxies (e.g., survey scores, peer rankings, satellite data) which contain measurement error.
The Trade-off: Under a fixed budget, resources can be allocated to:
1. Improving Proxy Precision: Reducing measurement error (e.g., by collecting more repeated measurements or higher-quality data).
2. Increasing Sample Size: Collecting more observations to learn the optimal policy rule.
The Core Question: When is it optimal to invest in better measurement of the latent trait versus collecting a larger sample, and how does the measurement error propagate into the welfare loss of the resulting policy?

2. Methodology and Theoretical Framework

A. Formal Setting

Data: The author considers a random sample of units with observed covariates $X_i$, a binary treatment $D_i$, an outcome $Y_i$, and a latent trait $A_i$.
Proxy: The latent trait $A_i$ is unobserved, but a noisy proxy $\hat{A}_i = A_i + \epsilon_i$ is available.
Policy Classes:
- Covariate-Based (CB): Rules $G(X)$ using only observed covariates.
- $\hat{a}$-Augmented (a-CB): Rules $G(X, \hat{A})$ using the noisy proxy.
Regret Definition: The paper introduces a novel definition of regret. Instead of comparing the estimated rule to the best rule within the same class (standard in the literature), it compares the estimated rule to an Oracle that observes the true latent factor $A_i$ and knows the true causal structure.
- $R(\hat{G}) = E_P [W(G^*_{FB}(X, A)) - W(\hat{G}(Z))]$
- This allows for a fair comparison between CB and a-CB rules against a common, ideal benchmark.
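The idea of a common oracle benchmark can be illustrated with a small simulation. This is a minimal sketch, not the paper's procedure: the treatment-effect model `tau = 0.5 * x + a`, the Gaussian distributions, and the function name `simulate_regret` are all assumptions made for illustration. It also uses the best rule in each class rather than an estimated rule, so it isolates the loss from not observing $A$ and leaves out sampling error.

```python
import random

def simulate_regret(n=20000, noise_sd=1.0, seed=0):
    """Monte Carlo regret of two targeting rules against an oracle that sees A.

    Hypothetical model: X, A ~ N(0, 1) independent, A_hat = A + noise,
    individual treatment effect tau = 0.5 * X + A.
    """
    rng = random.Random(seed)
    # For unit-variance Gaussian A and independent Gaussian noise,
    # E[A | A_hat] = shrink * A_hat.
    shrink = 1.0 / (1.0 + noise_sd ** 2)
    gain_oracle = gain_cb = gain_aug = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)               # observed covariate
        a = rng.gauss(0.0, 1.0)               # latent trait
        a_hat = a + rng.gauss(0.0, noise_sd)  # noisy proxy
        tau = 0.5 * x + a
        # Oracle treats exactly when the true effect is positive.
        gain_oracle += max(tau, 0.0)
        # Covariate-based rule treats when E[tau | X] > 0.
        gain_cb += tau if 0.5 * x > 0 else 0.0
        # Augmented rule treats when E[tau | X, A_hat] > 0.
        gain_aug += tau if 0.5 * x + shrink * a_hat > 0 else 0.0
    # Regret = oracle welfare minus rule welfare, averaged over units.
    return {"cb": (gain_oracle - gain_cb) / n,
            "aug": (gain_oracle - gain_aug) / n}

if __name__ == "__main__":
    for sd in (0.1, 1.0, 3.0):
        print(sd, simulate_regret(noise_sd=sd))
```

Because both rules are scored against the same oracle, their regrets are directly comparable, and in this toy model the augmented rule's regret grows as `noise_sd` increases.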

B. Key Assumptions

Bounded Outcomes: Potential outcomes are uniformly bounded.
Stratified Random Assignment: Treatment is independent of potential outcomes conditional on covariates.
Measurement Error Structure: The proxy error is additive and independent of the true latent trait conditional on covariates ($\epsilon_i \perp A_i \mid X_i$).
Policy Class Complexity: Measured by VC-dimension ($v$). The class must be flexible enough to approximate the true Conditional Average Treatment Effect (CATE).
Margin Condition: The probability of the score function being close to zero is bounded, preventing degenerate distributions.

C. Theoretical Results: Regret Bounds

The author derives rate-sharp minimax regret bounds for both policy classes:

For Covariate-Based (CB) Rules:
The regret is bounded by the sum of:
- Statistical Error: Scales as $O(\sqrt{v/n})$.
- Approximation Error: Proportional to the residual variation in treatment effects unexplained by $X$ (denoted $\bar{\sigma}_{\tau|x}$). This term represents the welfare loss from ignoring the latent trait.
For $\hat{a}$-Augmented (a-CB) Rules:
The regret is bounded by:
- Statistical Error: Scales as $O(\sqrt{v_{aug}/n})$, where $v_{aug}$ is the complexity of the augmented class (typically higher than $v$).
- Estimation Error: Proportional to the Root Mean Squared Error (rMSE) of the proxy, denoted $\rho$.
- Key Insight: Even with infinite data ($n \to \infty$), if the proxy is noisy ($\rho > 0$), there is a non-vanishing welfare loss.
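
Read schematically, the two bounds have the following shape. The constants $C_1, \dots, C_4$ are placeholders introduced here for illustration only; this summary does not report their values or exact form.

$R_{CB} \leq C_1 \sqrt{v/n} + C_2 \, \bar{\sigma}_{\tau|x}$
$R_{a\text{-}CB} \leq C_3 \sqrt{v_{aug}/n} + C_4 \, \rho$

As $n \to \infty$ the first term of each bound vanishes, so comparing the two bounds reduces to comparing $C_4 \rho$ with $C_2 \bar{\sigma}_{\tau|x}$: in the limit, the augmented bound is tighter when the proxy's rMSE is small relative to the residual heterogeneity it is meant to capture.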

D. The Data Collection Optimization Problem

The paper frames the design of data collection as a minimax optimization problem under a budget constraint $B_0$:
$\min_{t, n} \left( \text{Regret}(t, n) \right) \quad \text{s.t.} \quad c_t(t) + c_n(n) \leq B_0$
Where:

$t$: Information level (determines proxy precision/rMSE).
$n$: Sample size.
$c_t, c_n$: Cost functions for measurement and sampling.

Proposition 1 (Optimal Design):
The solution exhibits a corner-versus-interior structure:

Corner Solution ($t^*=0$): If the latent heterogeneity is low or the cost of improving the proxy is too high, it is optimal to ignore the proxy entirely and allocate the entire budget to increasing the sample size ($n^* = B_0/c_n$).
Interior Solution ($t^*>0$): If the latent heterogeneity is significant and returns to precision are high, the budget is split. The optimal ratio of sample size to information investment ($q = n/t$) depends on:
- The complexity of the augmented policy space.
- The scale of the proxy's noise ($m_0$).
- The relative costs of measurement vs. sampling.
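
The corner-versus-interior structure can be seen in a small grid search. The sketch below is an illustration under strong assumptions, not the paper's Proposition 1: it takes linear costs $c_t(t) = k_t t$ and $c_n(n) = k_n n$, sets all bound constants to 1, and assumes the proxy rMSE falls as $m_0/\sqrt{t}$. The function name `best_design` and every default value are hypothetical.

```python
import math

def best_design(budget, sigma_bar, m0, v=3, v_aug=5, k_t=50.0, k_n=1.0, t_max=5):
    """Grid-search the information level t that minimizes a stylized regret bound.

    Assumed (hypothetical) bound:
      t = 0: sqrt(v / n) + sigma_bar            (proxy ignored)
      t > 0: sqrt(v_aug / n) + m0 / sqrt(t)     (proxy with rMSE m0 / sqrt(t))
    Assumed budget constraint: k_t * t + k_n * n <= budget.
    Returns (t, n, bound) for the best feasible design.
    """
    best = None
    for t in range(0, t_max + 1):
        # Spend what is left after measurement on sample size.
        n = int((budget - k_t * t) // k_n)
        if n < 1:
            continue  # infeasible: no budget left for sampling
        if t == 0:
            bound = math.sqrt(v / n) + sigma_bar
        else:
            bound = math.sqrt(v_aug / n) + m0 / math.sqrt(t)
        if best is None or bound < best[2]:
            best = (t, n, bound)
    return best

if __name__ == "__main__":
    print(best_design(1000, sigma_bar=0.01, m0=0.5))  # little latent heterogeneity
    print(best_design(300, sigma_bar=1.0, m0=0.5))    # tight budget
    print(best_design(1000, sigma_bar=1.0, m0=0.5))   # larger budget
```

With these made-up numbers, the search picks $t = 0$ when residual heterogeneity is negligible, and a positive $t$ that grows with the budget when heterogeneity is large. The specific values carry no empirical meaning; only the qualitative pattern matters.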

3. Empirical Application

The author applies the framework to data from Hussam et al. (2022), a randomized controlled trial (RCT) in rural India targeting micro-entrepreneurs with cash grants.

Proxy: "Community Rankings" (peers ranking each other's business skills).
Unique Feature: The proxy was constructed from the average of 5 independent rankers, allowing the author to simulate varying levels of precision ($t \in \{1, \dots, 5\}$).
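
To see why the number of rankers acts as a precision dial, consider a toy simulation. It assumes each ranker reports the true trait plus independent Gaussian noise of equal variance, which is a simplification: the actual data are rankings, not Gaussian scores. The function name `proxy_rmse` and its parameters are hypothetical. Under these assumptions the rMSE of the averaged proxy falls roughly as $1/\sqrt{t}$.

```python
import math
import random

def proxy_rmse(t, n=5000, noise_sd=1.0, seed=0):
    """Simulated rMSE of a proxy built by averaging t independent noisy reports."""
    rng = random.Random(seed)
    sq_err = 0.0
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)  # true latent trait
        # Each of the t rankers reports the trait plus independent noise.
        reports = [a + rng.gauss(0.0, noise_sd) for _ in range(t)]
        a_hat = sum(reports) / t  # averaged proxy
        sq_err += (a_hat - a) ** 2
    return math.sqrt(sq_err / n)

if __name__ == "__main__":
    for t in range(1, 6):
        print(t, round(proxy_rmse(t), 3), round(1.0 / math.sqrt(t), 3))
```

Dropping rankers from 5 to 1 in this sketch roughly doubles the proxy's error, which is the kind of precision loss the subsampling exercise is designed to trace out.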

Key Empirical Findings:

Welfare Gains: Incorporating community rankings into targeting rules increases average welfare by 5% compared to random assignment and 4% compared to rules using only standard covariates (age, education). It also halves the probability of generating welfare losses (harm rate).
Precision Decay: The author confirms the theoretical prediction that welfare gains from augmented rules decrease as the number of rankers (precision) decreases.
Optimal Budget Allocation:
- Never Ignore Heterogeneity: Even with tight budgets, it is never optimal to set $t=0$ (ignore the proxy).
- Budget-Dependent Strategy:
  - Low Budget: Optimal to use fewer rankers (e.g., 2) to preserve a larger sample size.
  - High Budget: Optimal to increase the number of rankers (up to 4 or 5) as the constraint on sample size relaxes.
- Saturation: Beyond a certain budget, gains plateau as the sample size hits the data cap.

4. Key Contributions

Theoretical Innovation:
- Introduces a common regret benchmark (the Oracle with true latent traits) allowing for meaningful comparison between policy classes with and without latent variables.
- Derives rate-sharp regret bounds that explicitly account for the propagation of measurement error into policy welfare.
Data Collection Design:
- Formalizes the trade-off between measurement precision and sample size in policy learning.
- Provides minimax optimal allocation rules (Proposition 1) and practical sample-splitting procedures (Algorithms 1 & 2) for researchers to estimate these trade-offs empirically without assuming a minimax perspective.
Practical Guidance:
- Demonstrates that ignoring unobserved heterogeneity is suboptimal even under tight constraints, but the method of measuring it (precision vs. sample size) must be calibrated to the budget.

5. Significance

This paper bridges the gap between causal inference (handling unobserved heterogeneity) and experimental design (resource allocation). It moves beyond the standard question of "how to estimate treatment effects" to "how to design the data collection process itself" to maximize policy welfare. The results suggest that, in settings like the one studied, investing in better proxies for latent traits (like business skills) can yield higher returns than simply scaling up sample sizes, provided the budget allows for a balanced allocation.