k-hop Fairness: Addressing Disparities in Graph Link Prediction Beyond First-Order Neighborhoods

Imagine you are the mayor of a bustling digital city called Graphville. In this city, people (nodes) are connected by friendships and professional ties (edges). Your job is to act as a Matchmaker (Link Prediction), suggesting new connections to help people find friends or job opportunities.

However, Graphville has a problem: it's naturally segregated. People tend to hang out with others who look like them, speak the same language, or live in the same neighborhood. This is called homophily.

The Old Way: The "Two-Group" Matchmaker

For a long time, fairness experts tried to fix this with a simple rule: "Make sure Group A and Group B mix equally."

They looked at every single new friendship suggestion and asked: "Is this a connection between two different groups?" If the answer was yes, they counted it as "fair." If no, they counted it as "unfair."

The Flaw: This approach is like a teacher who only checks if a student is sitting next to someone from a different group, without looking at where the student is sitting in the classroom.

The Problem: If a student is already sitting in the middle of a diverse, popular group, adding one more friend from a different group doesn't help them much. But if a student is sitting alone in a corner (isolated), adding that same friend is a huge lifeline.
The Result: The old matchmaker keeps suggesting friends for the popular kids (because it's easy to find a "different" friend for them) and ignores the lonely kids in the corners. The "fairness" metric looks good on paper, but the lonely kids are still lonely.

The New Idea: "k-hop Fairness" (The Distance Detective)

The authors of this paper say: "Stop looking just at the immediate neighbor. Look at the whole neighborhood!"

They introduce a new concept called k-hop Fairness.

1-hop: Your immediate best friends.
2-hop: Your friends' friends.
3-hop: Your friends' friends' friends.

Instead of just checking if two people are different, they ask: "Does everyone in the city have equal access to diverse information at different distances?"

Imagine you are checking the 2-hop connections. You want to make sure that even if you are in a segregated corner of the city, your friends' friends include people from other groups. This ensures that information and opportunities can flow to the isolated parts of the city, not just the popular hubs.

How They Fixed It (The Toolkit)

The paper proposes two ways to fix the Matchmaker's suggestions:

Pre-Processing (Redrawing the Map): Before the Matchmaker even starts working, they slightly tweak the city map. They add a few new "bridges" (edges) between isolated neighborhoods and diverse ones. This changes the underlying structure so the Matchmaker naturally suggests better connections.
- Analogy: It's like building a new bridge between two islands before the ferry schedule is made.
Post-Processing (The Final Edit): The Matchmaker does their job first, generating a list of suggestions. Then, a "Fairness Editor" looks at the list. If the editor sees that the suggestions are ignoring the isolated corners at a specific distance (say, 3 hops away), they tweak the scores to boost those connections.
- Analogy: The Matchmaker writes a list of dates. The Editor reads it and says, "Hey, you suggested 10 dates for the popular kids, but zero for the quiet kid in the back. Let's swap a few to make it balanced."

What They Found (The Results)

The researchers tested this on real-world data (like political blogs, Facebook, and academic networks) and found three big things:

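Bias depends on distance: Where the city map itself is most lopsided at a given distance, the Matchmaker's suggestions at that distance are the least fair. The worst distance differs from city to city: in one network it was 4 hops out, in another it was the immediate friends. You can't just check the immediate friends; you have to look further out.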
The Ripple Effect: If you try to fix the bias at one specific distance (like 2 hops), it changes the bias at other distances (like 3 hops). It's like pushing a domino; it affects the whole chain. You have to be careful about which domino you push.
The Winner: Their "Post-Processing" method (the Final Editor) came out ahead. It made the suggestions fairer at whichever distance it targeted, usually with little or no cost to the Matchmaker's accuracy. The one exception was 2 hops (friends of friends), where accuracy dipped because many genuine new connections form at that distance.

The Big Takeaway

Traditional fairness in graphs is like checking if a party has a mix of people from different backgrounds. k-hop Fairness is like checking if everyone at the party, even the shy ones in the corner, has a clear path to meet people from those other backgrounds.

By looking at the distance between people, not just their immediate neighbors, we can build recommendation systems that don't just look fair on paper, but actually help everyone in the network get a fair shot at new opportunities.

1. Problem Statement

Link prediction (LP) in graph-based applications (e.g., social recommendations) often inadvertently amplifies structural biases present in real-world data, particularly homophily (the tendency of similar nodes to connect). While existing "Fair Link Prediction" methods address this via dyadic fairness, they suffer from critical limitations:

Blindness to Topology: Dyadic fairness treats all inter-group edges as equivalent, regardless of the underlying graph structure. It fails to distinguish between connecting two well-connected nodes versus connecting isolated, segregated nodes.
Aggregation Bias: Standard metrics (like Demographic Parity) aggregate disparities across all graph distances into a single scalar. This obscures specific biases occurring at particular "hops" (distances) and prevents targeted mitigation.
Intra-group Disparities: By focusing solely on edge types (intra- vs. inter-group), dyadic fairness can inadvertently reinforce disparities within sensitive groups, improving the status of already privileged nodes while leaving isolated nodes behind.

The paper argues that fairness must be assessed not just by the type of link, but by the structural position of the nodes involved, specifically within their $k$-hop neighborhoods.

2. Methodology

The authors propose a framework centered on $k$-hop fairness, which evaluates disparities conditioned on the shortest-path distance between nodes.
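
Conditioning on shortest-path distance means first finding, for each node, which nodes sit at exactly $k$ hops. A minimal Python sketch using breadth-first search on a toy graph is shown here; the function names and the adjacency-list representation are illustrative assumptions, not taken from the paper.

```python
from collections import deque

def hop_distances(adj, source):
    """Return {node: shortest-path distance} for every node reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:          # first visit gives the shortest distance
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def nodes_at_hop(adj, source, k):
    """Nodes whose shortest-path distance from source is exactly k."""
    return {v for v, d in hop_distances(adj, source).items() if d == k}

# Toy graph (hypothetical): path 0 - 1 - 2 - 3 with an extra branch 1 - 4
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}

print(sorted(nodes_at_hop(adj, 0, 1)))  # [1]
print(sorted(nodes_at_hop(adj, 0, 2)))  # [2, 4]
print(sorted(nodes_at_hop(adj, 0, 3)))  # [3]
```

Note that "exactly $k$ hops" excludes closer nodes: node 1 is reachable from node 0 by a 3-step walk (0, 1, 2, 1), but its distance is 1, so it does not belong to the 3-hop set.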

A. Definitions and Metrics

$k$-hop Node Attribute Exposure ($f^{(k)}_s(v)$): A local measure quantifying the probability that a node $v$ connects to a node with sensitive attribute $s$ at distance exactly $k$.
$k$-hop Group Exposure ($\phi^{(k)}_{s \to s'}$): The average exposure of nodes in sensitive group $s$ to group $s'$ at distance $k$. This aggregates local exposures uniformly across nodes, avoiding bias toward structurally central nodes.
$k$-hop Neighborhood Fairness Gap ($NF^{(k)}$): The primary fairness metric. For a predictor $h$, it takes each target group $s$, finds the largest difference between how strongly two different source groups $s_1$ and $s_2$ are exposed to $s$ at hop $k$, and reports the worst case over $s$.
$NF^{(k)}(h) = \max_{s} \max_{s_1 \neq s_2} |\phi^{(k)}_{s_1 \to s}(h) - \phi^{(k)}_{s_2 \to s}(h)|$
$k$-hop Structural Bias ($NB^{(k)}$): A structural metric measuring the inherent bias in the graph's topology at distance $k$, independent of any predictor. It serves as a proxy for the potential fairness of predictions at that hop.
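
The definitions above can be made concrete with a small NumPy sketch. The summary does not give the exact formula for $f^{(k)}_s(v)$, so the code assumes one plausible reading: the share of a node's predicted score mass at exactly $k$ hops that falls on group $s$. The helper names, the toy graph, and the uniform score matrix are all hypothetical.

```python
import numpy as np

def exact_hop_mask(A, k):
    """Boolean n x n mask: True where the shortest-path distance is exactly k."""
    n = A.shape[0]
    adj = (A > 0).astype(int)
    reached = np.eye(n, dtype=bool)      # pairs at distance 0
    frontier = np.eye(n, dtype=bool)
    for _ in range(k):
        step = (frontier.astype(int) @ adj) > 0   # one BFS step from every source
        frontier = step & ~reached                # keep only newly reached pairs
        reached |= frontier
    return frontier

def group_exposure(P, mask, groups):
    """phi[s, t]: mean over nodes in group s of their exposure to group t.
    ASSUMPTION: exposure = share of a node's k-hop score mass landing on t.
    Nodes with no k-hop score mass are skipped."""
    labels = np.unique(groups)
    M = P * mask
    mass = np.stack([M[:, groups == t].sum(axis=1) for t in labels], axis=1)
    total = mass.sum(axis=1)
    valid = total > 0
    f = np.zeros_like(mass, dtype=float)
    f[valid] = mass[valid] / total[valid, None]
    return np.stack([f[valid & (groups == s)].mean(axis=0) for s in labels])

def nf_gap(phi):
    """max over target t of the largest gap between two source groups."""
    return float((phi.max(axis=0) - phi.min(axis=0)).max())

# Toy path graph 0-1-2-3-4-5, first three nodes in group 0, the rest in group 1
n = 6
A = np.zeros((n, n))
for u in range(n - 1):
    A[u, u + 1] = A[u + 1, u] = 1.0
groups = np.array([0, 0, 0, 1, 1, 1])
P = np.full((n, n), 0.5)                 # uniform predicted scores

nf1 = nf_gap(group_exposure(P, exact_hop_mask(A, 1), groups))
nf2 = nf_gap(group_exposure(P, exact_hop_mask(A, 2), groups))
print(round(nf1, 4))  # 0.6667
print(round(nf2, 4))  # 0.0
```

On this toy graph the gap is large at 1 hop and vanishes at 2 hops, which illustrates why a per-hop metric can show structure that a single aggregate number would blur.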

B. Mitigation Strategies

The paper proposes model-agnostic strategies to control bias at specific hops:

Pre-processing (Graph Rewiring):
- Modifies the graph structure (adjacency matrix $A$) to minimize $NB^{(k)}$.
- Uses an optimization loss: $\mathcal{L}^{(k)}_{pre}(A') = NB^{(k)}(A') + \alpha \|A' - A\|_F$, where $A'$ is the rewired adjacency matrix and $\alpha$ weights the penalty for departing from the original graph.
- Challenge: Computing shortest paths is non-differentiable. The authors use powers of the adjacency matrix ($A^k$) to approximate $k$-hop connectivity in a differentiable manner for gradient-based optimization.
Post-processing (Prediction Adjustment):
- Adjusts the output probability matrix $P$ of a trained link predictor without retraining.
- Restricts modifications to node pairs exactly $k$ hops apart using a mask matrix $\tilde{A}^{(k)}$.
- Optimization loss: $\min_U \mathcal{L}^{(k)}_{post} = NF^{(k)}(P') + \alpha \|U \odot \tilde{A}^{(k)}\|_F$, where $P' = \Pi_{[0,1]}(P + U \odot \tilde{A}^{(k)})$, $U$ is the learned adjustment, and $\Pi_{[0,1]}$ clips each entry to $[0,1]$.
- Because the mask leaves the scores of pairs at every other distance unchanged, optimizing fairness at hop $k$ does not negatively impact fairness at other hops ($k' \neq k$).
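
The masked update in the post-processing step can be sketched in a few lines of NumPy. This is only an illustration of the formula for $P'$ and of the penalty term: the function names are hypothetical, the adjustment $U$ is random rather than optimized, and the fairness term is passed in as a user-supplied callable because its exact form is not reproduced here.

```python
import numpy as np

def apply_adjustment(P, U, mask):
    """P' = clip(P + U * mask, 0, 1): only masked (k-hop) pairs can change."""
    return np.clip(P + U * mask, 0.0, 1.0)

def post_loss(P, U, mask, alpha, fairness_gap):
    """fairness_gap(P') + alpha * ||U * mask||_F.
    fairness_gap is a hypothetical callable standing in for NF^(k)."""
    P_new = apply_adjustment(P, U, mask)
    return fairness_gap(P_new) + alpha * np.linalg.norm(U * mask)

rng = np.random.default_rng(0)
n = 4
P = rng.uniform(0.0, 1.0, size=(n, n))   # scores from some trained predictor
U = rng.normal(0.0, 0.5, size=(n, n))    # stand-in for the learned adjustment

# 2-hop pairs of the path graph 0 - 1 - 2 - 3
mask = np.zeros((n, n), dtype=bool)
for u, v in [(0, 2), (1, 3)]:
    mask[u, v] = mask[v, u] = True

P_new = apply_adjustment(P, U, mask)

# Scores outside the mask are untouched; all scores remain valid probabilities
print(np.array_equal(P_new[~mask], P[~mask]))   # True
print(bool(P_new.min() >= 0.0 and P_new.max() <= 1.0))  # True
```

Restricting the update to masked entries is the design choice that makes the method modular: any metric computed only from pairs at another distance sees exactly the same scores before and after the adjustment.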

C. Computational Complexity

Computing $NF^{(k)}$ and $NB^{(k)}$ has a complexity of $O(n^2)$ under standard sparsity assumptions using Breadth-First Search (BFS) or matrix operations.
Each iteration of the pre-processing optimization costs $O(kn^2)$.
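
The matrix-power idea behind the pre-processing step, and the reason its cost grows with $k$, can be illustrated with a short NumPy sketch. It uses hard thresholds on $A^j$ to recover pairs at distance exactly $k$; a differentiable variant would presumably replace the thresholds with smooth functions, which is not attempted here. The function name and the toy rewiring are assumptions made for illustration.

```python
import numpy as np

def exact_hop_from_powers(A, k):
    """Pairs at shortest-path distance exactly k, found via matrix powers:
    a k-step walk exists, no shorter walk does, and the endpoints differ."""
    n = A.shape[0]
    shorter = np.eye(n, dtype=bool)      # distance 0
    power = np.eye(n)
    for _ in range(k - 1):               # one matrix product per hop
        power = power @ A
        shorter |= power > 0
    power = power @ A                    # now equal to A^k
    return (power > 0) & ~shorter

# Path graph 0 - 1 - 2 - 3
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

# Candidate rewiring: add one undirected edge between the two endpoints
A_new = A.copy()
A_new[0, 3] = A_new[3, 0] = 1.0

penalty = float(np.linalg.norm(A_new - A))   # Frobenius norm of the change
before = exact_hop_from_powers(A, 3)
after = exact_hop_from_powers(A_new, 3)

print(np.argwhere(before).tolist())  # [[0, 3], [3, 0]]
print(np.argwhere(after).tolist())   # []
print(round(penalty, 4))             # 1.4142
```

The loop performs $k$ matrix products, which is where the factor of $k$ in the per-iteration cost comes from. The example also shows a single added edge removing every 3-hop pair, a small instance of how rewiring aimed at one distance changes the neighborhoods at others.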

3. Key Contributions

Conceptual Framework: Introduction of $k$-hop fairness, shifting the paradigm from edge-centric (dyadic) to node-centric, distance-aware fairness. This reveals that biases are not uniform across the graph but vary significantly by hop distance.
Algorithmic Tools: Development of model-agnostic pre- and post-processing methods to quantify and mitigate $k$-hop inequities. These methods are applicable to weighted and directed graphs.
Theoretical Insight: Demonstration that standard dyadic metrics ($\Delta DP$) are a weighted sum of $k$-hop local terms, masking the specific distances where bias originates.
Empirical Validation: Extensive experiments showing that structural bias and predictive fairness are highly correlated at specific hops, and that targeted mitigation is possible without degrading performance at other hops.
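
The weighted-sum claim under Theoretical Insight can be illustrated with the law of total expectation. The notation here is assumed for this sketch and may differ from the paper's: $\mathcal{P}_{\text{intra}}$ and $\mathcal{P}_{\text{inter}}$ are the sets of same-group and cross-group node pairs, $\mathcal{P}^{(k)}_g$ is the subset of $\mathcal{P}_g$ at shortest-path distance $k$, and $\Delta DP$ is taken to be the absolute difference between the mean predicted scores of the two sets.

Splitting each set by distance gives, for $g \in \{\text{intra}, \text{inter}\}$:

$\mu_g = \frac{1}{|\mathcal{P}_g|} \sum_{(u,v) \in \mathcal{P}_g} h(u,v) = \sum_k w^{(k)}_g \, \mu^{(k)}_g, \quad w^{(k)}_g = \frac{|\mathcal{P}^{(k)}_g|}{|\mathcal{P}_g|}, \quad \mu^{(k)}_g = \frac{1}{|\mathcal{P}^{(k)}_g|} \sum_{(u,v) \in \mathcal{P}^{(k)}_g} h(u,v)$

so that

$\Delta DP = |\mu_{\text{intra}} - \mu_{\text{inter}}| = \left| \sum_k \left( w^{(k)}_{\text{intra}} \, \mu^{(k)}_{\text{intra}} - w^{(k)}_{\text{inter}} \, \mu^{(k)}_{\text{inter}} \right) \right|$

Under these assumptions, a large gap at one hop can be cancelled by an opposite gap at another, or diluted by a small weight $w^{(k)}_g$, which is how a single scalar can hide where the bias sits. The sum runs over every distance that occurs, with disconnected pairs treated as their own class.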

4. Experimental Results

The authors evaluated their approach on real-world datasets (Polblogs, Facebook, Pokec, Citeseer) and a synthetic graph using standard LP models (Node2Vec, GCN, GraphSAGE) and fair baselines (FairWalk, CrossWalk, etc.).

RQ1: Relationship between Bias and Fairness:
- There is a strong correlation between structural bias ($NB^{(k)}$) and predictive fairness ($NF^{(k)}$): the hops with the highest structural bias are the hops where predictions show the largest fairness gaps.
- Bias profiles vary significantly by dataset; for example, Polblogs shows high bias at $k=4$, while Facebook is biased primarily at $k=1$. This indicates that a "one-size-fits-all" aggregation of fairness is ineffective.
RQ2: Interdependence of Hops (Pre-processing):
- Reducing bias at one hop (e.g., $k=2$) via edge addition often propagates to other hops. The effects can be reinforcing or compensatory (e.g., reducing $NB^{(2)}$ in Polblogs increased $NB^{(3)}$). This suggests pre-processing requires careful, data-specific strategies.
RQ3: Post-processing Effectiveness:
- The proposed post-processing method significantly reduces the targeted $NF^{(k)}$.
- Performance Trade-off:
  - At $k=1$, fairness improves with no loss in predictive performance (AUC) because $k=1$ edges are typically training edges and do not overlap with test edges.
  - At intermediate hops ($k=2$), there is a performance drop because real-world graphs have high clustering, meaning $k=2$ pairs often overlap with test edges.
  - At larger hops ($k>2$), performance degradation is minimal or non-existent, as test edges rarely fall at these specific distances.
- Modularity: Unlike pre-processing, post-processing allows independent optimization of fairness at different hops without cross-contamination.

5. Significance and Conclusion

This paper fundamentally challenges the prevailing dyadic fairness paradigm in graph machine learning. By introducing $k$-hop fairness, the authors demonstrate that:

Fairness is a multi-scale phenomenon; a graph can be fair at one distance and highly biased at another.
Existing methods that aggregate fairness across the whole graph fail to address the specific structural mechanisms causing disparity (e.g., segregation at long distances).
Targeted mitigation is feasible and effective. The proposed post-processing method offers a practical, model-agnostic way to improve fairness for specific user groups at specific network distances, with little or no loss of predictive performance at most hops. The reported exception is $k=2$ (e.g., "2-hop recommendations"), where the adjusted pairs often overlap with test edges and performance drops.

The work provides a crucial step toward more nuanced, structurally aware AI systems that can address systemic inequalities in social networks, moving beyond simple demographic parity to ensure equitable access to information and opportunities across the entire network topology.
