Conditional Rank-Rank Regression via Deep Conditional Transformation Models

Here is an explanation of the paper, translated into everyday language with some creative analogies.

The Big Picture: Measuring the "Ladder of Success"

Imagine society is a giant ladder. Intergenerational mobility is the study of how likely a child is to climb to a different rung on that ladder compared to where their parents started.

High Mobility: A child born at the bottom can easily reach the top. The ladder is slippery; you don't stay stuck where you started.
Low Mobility (High Persistence): A child born at the bottom stays at the bottom, and a child born at the top stays at the top. The ladder is sticky; your starting position determines your ending position.

For decades, economists have used a standard tool called Rank-Rank Regression (RRR) to measure this stickiness. It's like lining up all the parents by income and all the children by income, then seeing how well the two lines match up.

The Problem: The "One-Size-Fits-All" Trap

The old method has a flaw. It treats everyone as if they are in the same race. But in real life, people run different races.

The Flaw: If you compare a child from a wealthy, educated family to a child from a poor, rural family, the old method might say, "Look how much the rich kid stayed rich!" But that's unfair. The rich kid had a head start (better schools, connections).
The Solution (CRRR): We need to compare apples to apples. We need to see how much a child moves within their own group (e.g., within the group of "rural families" or "families with college degrees"). This is called Conditional Rank-Rank Regression (CRRR).

To do this, we have to calculate a "conditional rank." Instead of asking, "Where does this child stand in the whole country?" we ask, "Where does this child stand among people with similar parents and backgrounds?"

The Old Way vs. The New Way

To calculate these "within-group" ranks, you need to understand the distribution of outcomes for every specific group.

The Old Method (Distribution Regression - DR):
Imagine you are trying to map the terrain of a forest. The old method tries to map it by taking a photo of every single tree individually and then trying to stitch the photos together.

The Issue: If the forest is complex (lots of hills, weird shapes, dense bushes), stitching thousands of photos together is messy. The edges might not match, the picture might get blurry, and if the forest has weird features (like heavy rain or unique soil), the old method might get the shape completely wrong. It's rigid and prone to errors when the data is complicated.

The New Method (Deep Conditional Transformation Models - DCTM):
The authors propose a new tool: DCTM.

The Analogy: Instead of taking photos of individual trees, imagine you have a smart, 3D-printing robot that can scan the whole forest at once. It learns the shape of the terrain directly. It doesn't just guess; it builds a flexible, continuous model that bends and twists to fit the actual shape of the data, whether it's a smooth hill or a jagged cliff.
Why it's better:
1. Flexibility: It handles complex, non-linear relationships (like how education and income interact in weird ways) that the old method misses.
2. No "Glitches": The old method sometimes produced impossible results (like a running total of probabilities that goes down instead of up as you move along the income scale). The new robot is built with "guardrails" that ensure the map always makes mathematical sense.
3. Handling Ties: In real life, many people have the exact same income or education level (ties). The old method struggled with this. The new method has a special "tie-handling" knob (called $\omega$ ) that lets researchers test how different ways of breaking ties change the results, making the findings more honest.

The "Cross-Fitting" Trick

To make sure their smart robot isn't just memorizing the answers (a problem called "overfitting"), the authors use a technique called Cross-Fitting.

The Analogy: Imagine you are training a student for a math test. If you let them study the exact test questions they will take, they will memorize the answers and get a perfect score, but they won't actually understand math.
The Fix: You split the class into groups. Group A studies, then Group B takes the test. Then Group B studies, and Group A takes the test. You repeat this until everyone has been tested on material they didn't study.
The Result: This ensures the model is actually learning the patterns of the data, not just memorizing specific people's outcomes. This makes the results much more reliable.

What Did They Find? (The Real-World Tests)

The authors tested their new method on two big datasets:

US Income (PSID Data):
- They looked at how much a father's income predicts a child's income.
- The Finding: When they adjusted for background factors (like education and family size), the "stickiness" of income dropped. This means some of the persistence we see is just because rich families stay rich as a group, not because every rich kid is destined to be rich.
- Gender Gap: They found that a father's income predicts a daughter's future income much more strongly than a son's. Sons seem to have more room to move up or down the ladder on their own, while daughters' economic paths are more tightly bound to their family's starting point.
Indian Education (IHDS Data):
- They looked at how a father's education level predicts a child's education level. Education is a "discrete" variable (you have a high school diploma, or you don't; you can't have 1.5 diplomas).
- The Finding: The new method showed that how you handle "ties" (people with the same education level) changes the conclusion.
- Gender Gap: In India, they found massive gender differences. For sons, education mobility is lower in Muslim households and urban areas. For daughters, the pattern is different. The new method revealed these subtle, complex patterns that the old, rigid method would have smoothed over or missed entirely.

The Takeaway

This paper is like upgrading from a paper map to a GPS with real-time traffic.

The old way (paper map) was okay for simple, straight roads.
The new way (DCTM + Cross-Fitting) is essential for navigating the complex, winding, and bumpy roads of modern economic data. It gives us a clearer, more accurate picture of how much opportunity really exists for children to change their fate, and it highlights that the "rules of the game" are often different for sons and daughters, and for different social groups.

Here is a detailed technical summary of the paper "Conditional Rank-Rank Regression via Deep Conditional Transformation Models" by Wang, Feng, and Wang.

1. Problem Statement

Intergenerational mobility measures the transmission of socio-economic status (e.g., income, education) from parents to children. The standard empirical tool is Rank-Rank Regression (RRR), which regresses the child's rank on the parent's rank. The slope coefficient ( $\rho$ ) represents intergenerational persistence (where higher $\rho$ implies lower mobility).
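As a minimal sketch of the rank-rank slope described above (not the authors' code), the example below ranks simulated parent and child outcomes by their empirical CDF and regresses one rank on the other. The simulated data, the helper names `ranks` and `ols_slope`, and the 0.4 transmission coefficient are illustrative assumptions.

```python
import random
from bisect import bisect_right

def ranks(values):
    """Empirical CDF rank of each value: share of observations <= that value."""
    ordered = sorted(values)
    n = len(values)
    return [bisect_right(ordered, v) / n for v in values]

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: child outcome depends partly on parent outcome.
rng = random.Random(0)
parent = [rng.gauss(0, 1) for _ in range(2000)]
child = [0.4 * p + rng.gauss(0, 1) for p in parent]

# Rank-rank slope: regress the child's rank on the parent's rank.
rho_rrr = ols_slope(ranks(parent), ranks(child))
print(round(rho_rrr, 3))
```

A slope near 0 would indicate high mobility in this toy setting, and a slope near 1 would indicate strong persistence.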

However, standard RRR has limitations:

Lack of Covariate Control: It measures aggregate mobility without accounting for observed covariates ( $X$ , e.g., race, region, education).
Interpretability of RRRX: Simply adding covariates to RRR (RRRX) yields coefficients that are difficult to interpret; they often fall outside the natural $[-1, 1]$ range and no longer correspond to rank correlations.
Conditional Rank-Rank Regression (CRRR) Challenges: To solve this, Chernozhukov et al. (2024) proposed CRRR, which uses conditional ranks (ranks computed within covariate-defined groups) instead of marginal ranks.
- Implementation Bottleneck: Existing CRRR relies on Distribution Regression (DR), which estimates conditional distributions by fitting many separate binary regressions (logit/probit) across a grid of thresholds. This approach is computationally expensive, prone to misspecification under nonlinearity/high-order interactions, and struggles to enforce global monotonicity (a requirement for valid CDFs).
- Discrete Outcomes: Most existing theory assumes continuous outcomes. Many real-world variables (education levels, occupational classes) are discrete and ordered, creating ties that standard CRRR does not address.
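To make the Distribution Regression bottleneck concrete, the sketch below fits one logit per threshold and then checks whether the stacked estimates form a valid CDF. It is an illustration under stated assumptions: a single covariate, a hand-written Newton solver with a small ridge penalty for stability, simulated heteroskedastic data, and sorting as the post-hoc monotone rearrangement. None of these details come from the paper.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logit(x, d, ridge=1e-3, iters=30):
    """Newton's method for a logit of d on (1, x) with a small ridge penalty."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0, g1 = -ridge * a, -ridge * b          # penalized score
        h00, h01, h11 = ridge, 0.0, ridge        # penalized information matrix
        for xi, di in zip(x, d):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += di - p
            g1 += (di - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical heteroskedastic data.
rng = random.Random(1)
n = 300
x = [rng.uniform(-1, 1) for _ in range(n)]
y = [xi + (0.3 + abs(xi)) * rng.gauss(0, 1) for xi in x]

# One separate binary regression per threshold (interior deciles of y).
sorted_y = sorted(y)
thresholds = [sorted_y[j * n // 10] for j in range(1, 10)]
fits = [fit_logit(x, [1.0 if yi <= t else 0.0 for yi in y]) for t in thresholds]

def dr_cdf(x0):
    """Pointwise DR estimates of F(t | x0) on the threshold grid."""
    return [sigmoid(a + b * x0) for a, b in fits]

def rearrange(cdf_values):
    """Monotone rearrangement: sorting restores a non-decreasing CDF on the grid."""
    return sorted(cdf_values)

raw = dr_cdf(3.0)          # far outside the data, where crossings are most likely
fixed = rearrange(raw)
is_monotone = all(u <= v for u, v in zip(raw, raw[1:]))
print(is_monotone, [round(v, 3) for v in fixed])
```

Because each threshold is fitted independently, nothing ties the nine fits together, so monotonicity has to be checked and repaired afterwards rather than guaranteed by the model.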

2. Methodology

The authors propose a new framework replacing Distribution Regression with Deep Conditional Transformation Models (DCTM) combined with cross-fitting.

A. Deep Conditional Transformation Models (DCTM)

Instead of fitting pointwise binary regressions, DCTM learns the conditional Cumulative Distribution Function (CDF), $F_{Y|X}(y|x)$ , end-to-end using a neural network.

Mechanism: It assumes a transformation function $h(y; x)$ maps the outcome $Y$ to a latent variable $Z$ with a known baseline distribution (e.g., Standard Normal for continuous, Logistic for discrete).
- $F_{Y|X}(y|x) = F_0(h(y; x))$ .
Architecture:
- Continuous Outcomes: Uses Bernstein basis functions to parameterize the transformation. The network outputs coefficients constrained to be non-decreasing, ensuring the resulting CDF is valid (monotone) by construction.
- Discrete Outcomes (dDCTM): Uses a cumulative construction where the network outputs increments for category probabilities, ensuring monotonicity across ordered categories without post-hoc rearrangement.
Advantages: Handles high-dimensional $X$ , strong nonlinearity, and interactions automatically; enforces probability axioms (monotonicity) structurally; replaces DR's many separate threshold-by-threshold fits with a single model of the whole conditional distribution.
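The continuous-outcome construction can be sketched in a few lines. In the example below, the vector `raw` stands in for what a neural network would output for one covariate value $x$; it is a hypothetical placeholder, as are the softplus increments used to make the Bernstein coefficients non-decreasing and the rescaling of $y$ to $[0, 1]$. The point is only to show why $F_0(h(y; x))$ is monotone in $y$ by construction.

```python
import math

def softplus(v):
    """Smooth positive function, written to avoid overflow for large |v|."""
    return math.log1p(math.exp(-abs(v))) + max(v, 0.0)

def monotone_coefs(raw):
    """Map unconstrained outputs to non-decreasing Bernstein coefficients."""
    coefs = [raw[0]]
    for r in raw[1:]:
        coefs.append(coefs[-1] + softplus(r))   # each increment is positive
    return coefs

def bernstein_h(u, coefs):
    """Transformation h(u) = sum_k coefs[k] * B_{k,M}(u) for u in [0, 1]."""
    m = len(coefs) - 1
    return sum(
        c * math.comb(m, k) * u ** k * (1.0 - u) ** (m - k)
        for k, c in enumerate(coefs)
    )

def std_normal_cdf(z):
    """Baseline distribution F_0 for the continuous case."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def conditional_cdf(y, raw, y_min, y_max):
    """F(y | x) = F_0(h(y; x)), with y rescaled to [0, 1]."""
    u = min(max((y - y_min) / (y_max - y_min), 0.0), 1.0)
    return std_normal_cdf(bernstein_h(u, monotone_coefs(raw)))

# Hypothetical network output for one covariate value x.
raw = [-2.0, 0.5, -1.0, 2.0, 0.0]
grid = [i / 20 * 10.0 for i in range(21)]            # y values from 0 to 10
cdf_values = [conditional_cdf(y, raw, 0.0, 10.0) for y in grid]
print([round(v, 3) for v in cdf_values])
```

A Bernstein polynomial with non-decreasing coefficients is itself non-decreasing, so no rearrangement step is needed after fitting.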

B. Cross-Fitting Strategy

To mitigate overfitting bias inherent in using machine learning for nuisance parameter estimation (the conditional CDFs), the authors employ cross-fitting:

1. Split the data into $K$ folds.
2. Train DCTM on $K-1$ folds and use it to predict conditional ranks for the held-out fold.
3. Aggregate the out-of-fold (OOF) ranks to compute the final CRRR slope estimator.

This ensures that each observation's rank comes from a model that never saw that observation during training.
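The cross-fitting loop can be sketched as follows. To keep the example self-contained, the DCTM is replaced by a much simpler stand-in, the empirical CDF within each covariate group. That substitution, the simulated data, and all function names are assumptions for illustration only.

```python
import random
from bisect import bisect_right
from collections import defaultdict

def fit_group_cdf(values, groups):
    """Stand-in nuisance model (NOT a DCTM): empirical CDF within each group."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    return {g: sorted(vs) for g, vs in by_group.items()}

def predict_rank(model, value, group):
    """Conditional rank: share of training outcomes in the group <= value."""
    vs = model[group]
    return bisect_right(vs, value) / len(vs)

def cross_fit_ranks(values, groups, k=5, seed=0):
    """Out-of-fold conditional ranks: each rank uses a model trained without it."""
    n = len(values)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]
    oof = [None] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_group_cdf([values[i] for i in train], [groups[i] for i in train])
        for i in held_out:
            oof[i] = predict_rank(model, values[i], groups[i])
    return oof

def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

# Hypothetical data: a group effect shifts both generations.
rng = random.Random(42)
n = 2000
group = [rng.randint(0, 1) for _ in range(n)]
parent = [2.0 * g + rng.gauss(0, 1) for g in group]
child = [2.0 * g + 0.3 * (p - 2.0 * g) + rng.gauss(0, 1) for g, p in zip(group, parent)]

# CRRR slope: regress the child's conditional rank on the parent's conditional rank.
rho_conditional = ols_slope(cross_fit_ranks(parent, group), cross_fit_ranks(child, group))

# For comparison, ignoring the groups gives marginal (unconditional) ranks.
no_groups = [0] * n
rho_marginal = ols_slope(cross_fit_ranks(parent, no_groups), cross_fit_ranks(child, no_groups))
print(round(rho_conditional, 3), round(rho_marginal, 3))
```

In this toy setting the conditional slope is smaller than the marginal one, because part of the marginal persistence comes from the shared group effect.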

C. Discrete Outcome Extension ( $\omega$ -indexed Ranks)

For discrete outcomes, ranks are not unique due to ties. The authors introduce a parameterized rank definition:
$R_{Y|X=x}(y) = \omega F_{Y|X}(y|x) + (1-\omega) F^{-}_{Y|X}(y|x)$
where $F^{-}_{Y|X}(y|x) = P(Y < y \mid X = x)$ is the conditional probability of a strictly smaller outcome, and $\omega \in [0, 1]$ determines how ties are handled ( $\omega=0.5$ is the mid-rank, $\omega=0$ is the smallest rank, $\omega=1$ is the largest). They demonstrate that the CRRR slope is highly sensitive to $\omega$ in discrete settings, necessitating explicit reporting of the tie-handling rule.
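A short worked example of the $\omega$-indexed rank may help. The function below applies the formula above to a vector of conditional category probabilities. The probabilities and the name `omega_rank` are made up for illustration, and $F^{-}$ is taken to be the probability of a strictly lower category.

```python
def omega_rank(category, probs, omega=0.5):
    """Omega-indexed conditional rank for ordered categories 0..J-1.

    probs[j] is an assumed conditional probability P(Y = j | X = x).
    """
    f_strict = sum(probs[:category])        # F^-(y | x) = P(Y < y | x)
    f_weak = f_strict + probs[category]     # F(y | x)   = P(Y <= y | x)
    return omega * f_weak + (1.0 - omega) * f_strict

# Hypothetical distribution over three education levels for one covariate value.
probs = [0.2, 0.5, 0.3]

for omega in (0.0, 0.5, 1.0):
    print(omega, [round(omega_rank(j, probs, omega), 3) for j in range(3)])
```

For the middle category, the rank moves from 0.2 at $\omega=0$ to 0.45 at $\omega=0.5$ and 0.7 at $\omega=1$ (up to floating-point rounding), which shows how much room the tie-handling rule leaves when a category holds half of the probability mass.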

D. Inference

The paper establishes asymptotic normality for the estimators in the continuous case under a fixed-complexity regime. For inference, they utilize an exchangeable bootstrap (reweighting the likelihood during training) to compute standard errors and confidence intervals, avoiding the complexity of deriving analytical variance formulas for the DCTM nuisance parameters.
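The idea of an exchangeable (weighted) bootstrap can be sketched as below. This is a deliberately simplified illustration: it reweights only the final rank-rank regression with i.i.d. exponential weights and holds the ranks fixed, whereas the summary above describes reweighting the likelihood during DCTM training. The weight distribution, the simulated ranks, and the function names are assumptions.

```python
import random
import statistics

def weighted_slope(x, y, w):
    """Weighted least-squares slope of y on x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

def exchangeable_bootstrap_se(x, y, draws=200, seed=0):
    """Standard error from re-estimating the slope under random positive weights."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(draws):
        w = [rng.expovariate(1.0) for _ in x]    # mean-one exponential weights
        slopes.append(weighted_slope(x, y, w))
    return statistics.stdev(slopes)

# Hypothetical conditional ranks, standing in for out-of-fold ranks.
rng = random.Random(7)
parent_rank = [rng.random() for _ in range(500)]
child_rank = [min(max(0.3 * p + 0.35 + rng.gauss(0, 0.2), 0.0), 1.0) for p in parent_rank]

point = weighted_slope(parent_rank, child_rank, [1.0] * len(parent_rank))
se = exchangeable_bootstrap_se(parent_rank, child_rank)
print(round(point, 3), round(se, 3))
```

Because the ranks are not re-estimated in each draw, this sketch would understate the uncertainty relative to a version that also retrains the nuisance model under each set of weights.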

3. Key Contributions

Methodological Innovation: Replaces the rigid, threshold-based Distribution Regression with flexible, end-to-end DCTM for conditional rank estimation. This allows CRRR to handle complex data structures (nonlinearity, interactions, heteroskedasticity) that traditional methods miss.
Discrete Extension: Provides the first systematic treatment of CRRR for discrete ordered outcomes. They define a parametric family of conditional ranks and quantify the sensitivity of mobility estimates to tie-handling rules ( $\omega$ ).
Theoretical Guarantees: Proves consistency and asymptotic normality of the proposed estimators under fixed model complexity and validates the use of exchangeable bootstrap for inference.
Empirical Application: Demonstrates the method's utility on two major datasets:
- PSID (USA): Analyzing income mobility.
- IHDS (India): Analyzing educational mobility.

4. Results

Simulation Studies

Simple Continuous Settings: DCTM performs comparably to DR, confirming validity in standard cases.
Complex Continuous Settings: In scenarios with high-order interactions and nonlinearity, DR fails significantly (severe bias, RMSE $\approx$ 0.43), while DCTM remains accurate (RMSE $\approx$ 0.005).
Discrete Settings: DCTM (dDCTM) outperforms DR in fitting conditional CDFs for discrete outcomes. The simulations confirm that the estimated slope $\hat{\rho}_C$ varies substantially with the choice of $\omega$ , validating the need for sensitivity analysis.

Empirical Findings

US Income Mobility (PSID):
- Decomposition: The total persistence ( $\rho_{RRR} \approx 0.18$ ) is decomposed into within-group persistence ( $\rho_{CRRR} \approx 0.12$ ) and between-group persistence.
- Gender Gap: Daughters exhibit significantly higher intergenerational persistence ( $\rho \approx 0.18$ ) than sons ( $\rho \approx 0.06$ ) even after controlling for covariates, suggesting daughters' economic outcomes are more tightly bound to family background.
Indian Educational Mobility (IHDS):
- Discrete Sensitivity: The study highlights that conclusions about gender differences in mobility reverse depending on the tie-handling parameter $\omega$ .
- Heterogeneity: Mobility patterns vary significantly by social group (for example, Muslim households show different persistence patterns) and by urbanization.
- Persistence: Strong top-end persistence is observed; children of highly educated fathers are significantly more likely to remain in the highest education tier.

5. Significance

This paper advances the field of intergenerational mobility research by:

Robustness: Providing a method that is robust to the complex, non-linear realities of socio-economic data where traditional parametric or semi-parametric methods fail.
Interpretability: Restoring the clear economic interpretation of the rank-rank slope (as an average within-group correlation) even when controlling for high-dimensional covariates.
Practicality: Offering a unified workflow for both continuous and discrete outcomes, addressing a critical gap in the literature regarding ordinal variables like education and occupation.
Policy Relevance: By decomposing mobility into within- and between-group components, the method helps policymakers distinguish inequality caused by individual background (within-group) from inequality caused by structural group differences (between-group).

The authors conclude that while Distribution Regression works in simple settings, DCTM-based CRRR is superior for complex, real-world data, offering more stable and accurate estimates of social mobility. They suggest that future work extend the asymptotic theory to non-parametric regimes using double/debiased machine learning techniques.