Here is an explanation of the paper "Fairness May Backfire: When Leveling-Down Occurs in Fair Machine Learning," broken down into simple concepts with everyday analogies.
The Big Picture: The "Leveling Down" Problem
Imagine a school principal trying to fix a problem where one group of students (let's call them Group A) is getting into the "Advanced Class" much more often than another group (Group B). The goal of fairness is to make sure Group B gets a fair shot.
Usually, we think fixing this means Leveling Up: helping Group B get more spots without hurting Group A.
However, this paper argues that sometimes, trying to be fair actually leads to Leveling Down. This happens when the principal tries to fix the imbalance by making everyone worse off, or by helping Group B in a way that accidentally hurts them in the long run (like letting unqualified students into the class, causing them to fail).
The authors ask: When does fairness help, and when does it backfire?
To answer this, they look at two different ways a decision-maker (like a bank, a hiring manager, or a school) can make decisions.
Scenario 1: The "Open Book" Exam (Attribute-Aware)
The Setup: The decision-maker can see everyone's ID card. They know exactly who is in Group A and who is in Group B.
The Analogy: Imagine a teacher grading two different classes. The teacher knows exactly which student belongs to which class.
What Happens When They Try to Be Fair:
If the teacher sees that Class A is getting too many "A" grades, they can simply lower the bar for Class B and raise the bar for Class A.
- For the Disadvantaged Group (Class B): The teacher lowers the threshold. More students get in. Their "success rate" goes up.
- For the Advantaged Group (Class A): The teacher raises the threshold. Fewer students get in. Their "success rate" goes down.
The Result:
- Good News: The disadvantaged group always gets better outcomes (more people get in).
- Bad News: The advantaged group always gets worse outcomes (fewer people get in).
- The Catch: While more people from Group B get in, the bar for them is lower, so the average qualification of Group B's admitted students may drop slightly. But overall, the system is "fair" in a predictable way: it shifts opportunities from the advantaged group to the disadvantaged one.
Verdict: In this scenario, fairness is like a see-saw. If one side goes up, the other goes down. It's predictable and rarely hurts the disadvantaged group.
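The see-saw can be sketched in a few lines. This is a toy simulation, not the paper's model: the score distributions, group sizes, and threshold values below are made-up assumptions chosen only to show the direction of the effect.

```python
import random

random.seed(0)

# Hypothetical score distributions (made-up numbers): Group A's scores
# run higher on average than Group B's.
group_a = [random.gauss(0.6, 0.15) for _ in range(10_000)]
group_b = [random.gauss(0.4, 0.15) for _ in range(10_000)]

def selection_rate(scores, threshold):
    """Fraction of candidates at or above the admission threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

# One shared bar at 0.5: Group A gets in far more often.
before_a = selection_rate(group_a, 0.50)
before_b = selection_rate(group_b, 0.50)

# Attribute-aware "fix": raise the bar for A, lower it for B.
after_a = selection_rate(group_a, 0.55)
after_b = selection_rate(group_b, 0.45)

print(f"Group A: {before_a:.2f} -> {after_a:.2f}")  # goes down
print(f"Group B: {before_b:.2f} -> {after_b:.2f}")  # goes up
```

Because each group has its own dial, the disadvantaged group's rate can only go up and the advantaged group's can only go down: predictable, never leveling down for Group B.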
Scenario 2: The "Blind Audition" (Attribute-Blind)
The Setup: The decision-maker is not allowed to see the ID cards. They have to make decisions based only on the resume or the test score. This is common in real life due to laws against discrimination (like not being allowed to ask about race or gender).
The Analogy: Imagine an orchestra holding a "blind audition." Musicians play behind a curtain. The judges can hear the music (the skills), but they cannot see who is playing.
The Problem:
Even though the judges can't see the group labels, the music may still differ statistically between the two groups, because the groups' distributions of skills and repertoire differ.
- Maybe Group A tends to play slightly more complex pieces.
- Maybe Group B tends to play slightly simpler pieces.
What Happens When They Try to Be Fair:
The judges try to fix the imbalance. But because they can't see the groups, they have to adjust the rules for everyone based on the "vibe" of the music.
- They might decide: "The music coming from the left side of the stage (which happens to have more Group A players) is too easy, so we'll raise the bar for everyone playing from the left."
- Or: "The music from the right side is too hard, so we'll lower the bar for everyone playing from the right."
The Result: The "Leveling Down" Trap
Because the judges are blind, they can't target the groups directly. They end up targeting features (like the type of music or the instrument).
- The "Masked" Candidates: This is the key concept. Some people from the "Disadvantaged" group might look like they belong to the "Advantaged" group based on their resume (e.g., they went to a fancy school). These are Masked Candidates.
- The Backfire:
- If the judges try to help the disadvantaged group by lowering the bar for "simple music," they might accidentally let in unqualified people from the Advantaged group who happen to play simple music.
- Conversely, if they raise the bar to stop the Advantaged group, they might accidentally kick out qualified people from the Disadvantaged group who happen to play complex music.
The Outcome:
In this "Blind" scenario, fairness can go three ways:
- Leveling Up: Both groups get better outcomes (rare).
- Leveling Down (The Danger Zone): Both groups get worse outcomes. For example, to rein in the "Advantaged" group, the judges might raise the bar so high that they also exclude many qualified "Disadvantaged" candidates, and the success rate drops for everyone.
- Mixed Bag: One group gets better, the other gets worse, but not in the predictable way we saw in the "Open Book" scenario.
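The "Danger Zone" above can also be sketched as a toy simulation. Everything here is a made-up assumption (the score distributions, the 20% masking rate, the threshold values), chosen purely to illustrate how a group-blind threshold adjustment can shrink the gap between groups while leaving both worse off.

```python
import random

random.seed(1)

def make_candidates(n=10_000):
    """Each candidate is (group, score). Some Group B candidates are
    "masked": their observable scores look like Group A's."""
    people = []
    for _ in range(n):
        group = "A" if random.random() < 0.5 else "B"
        if group == "A":
            score = random.gauss(0.6, 0.1)
        elif random.random() < 0.2:
            # Masked Group B candidate: A-like score distribution.
            score = random.gauss(0.6, 0.1)
        else:
            score = random.gauss(0.4, 0.1)
        people.append((group, score))
    return people

def rate(people, group, threshold):
    """Selection rate for one group under a shared, group-blind bar."""
    members = [s for g, s in people if g == group]
    return sum(s >= threshold for s in members) / len(members)

people = make_candidates()

# Before: one shared bar at 0.5, with a large gap between the groups.
before_a, before_b = rate(people, "A", 0.50), rate(people, "B", 0.50)

# Blind "fix": the decision-maker cannot see groups, so to shrink the gap
# it raises the bar on A-like (high) scores for everyone. Masked Group B
# candidates are hit right along with Group A.
after_a, after_b = rate(people, "A", 0.65), rate(people, "B", 0.65)

print(f"Group A: {before_a:.2f} -> {after_a:.2f}")
print(f"Group B: {before_b:.2f} -> {after_b:.2f}")
```

In this sketch the gap between the groups narrows, but both selection rates fall: the blind adjustment "succeeds" at fairness on paper while leveling everyone down.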
The Core Takeaway: Why "Blind" Fairness is Tricky
The paper uses a metaphor of Masked Candidates to explain why this happens.
- In the "Open Book" (Attribute-Aware) world: You know who is who. If you want to help Group B, you help Group B. You don't accidentally hurt Group A's qualified members, and you don't accidentally help Group A's unqualified members.
- In the "Blind" (Attribute-Blind) world: You are guessing. You see a resume that looks "Advantaged-like," so you treat it as such. But that person might actually be from the "Disadvantaged" group.
- If you try to be fair by punishing "Advantaged-like" resumes, you might punish a Disadvantaged person who looks like an Advantaged person.
- If you try to be fair by helping "Disadvantaged-like" resumes, you might help an Advantaged person who looks like a Disadvantaged person.
Summary for Decision Makers
- If you can see the sensitive data (legally allowed): Enforcing fairness is safe. It will help the disadvantaged group and slightly hurt the advantaged group, but it won't accidentally make the disadvantaged group worse off.
- If you cannot see the sensitive data (blind): Enforcing fairness is risky. It depends entirely on the data distribution.
- It might help everyone.
- It might hurt everyone (Leveling Down).
- It might help the wrong people because of "Masked Candidates."
The Lesson: Just because you are trying to be fair doesn't mean the outcome will be fair. In "blind" systems, the path to fairness is a minefield where good intentions can accidentally lead to bad results for the very people you are trying to help. You need to understand the specific data landscape before you try to "fix" the system.