An Experimental Study on Fairness-aware Machine Learning for Credit Scoring Problems

This paper presents a comprehensive experimental study demonstrating that fairness-aware machine learning models achieve a superior balance between predictive accuracy and fairness compared to traditional classification models in the context of credit scoring.

Huyen Giang Thi Thu, Thang Viet Doan, Ha-Bang Ban, Tai Le Quy

Published 2026-03-06

Imagine you are a bank manager. In the old days, if someone wanted a loan, you'd sit them down, look at their paperwork, maybe ask a few questions, and use your gut feeling to decide if they were a "good" risk or a "bad" risk. It was slow, it took a lot of coffee breaks, and it relied heavily on human judgment.

Today, banks use Machine Learning (ML) to do this instantly. Think of ML as a super-fast, super-smart robot assistant that reads thousands of financial records in a split second to predict who will pay back their loan and who won't. This is called Credit Scoring.

But here's the problem: Robots can be biased.

If you train a robot on historical data where, say, women were unfairly denied loans in the past, the robot might learn that "being a woman" means "don't give a loan." It's not being evil; it's just copying the mistakes of the past. This is the "unfairness" the paper talks about.

The Mission: Fixing the Robot's Brain

The authors of this paper asked a big question: "We have all these fancy new tools to make these robots fair, but do they actually work in the real world of banking?"

They decided to run a massive experiment. They didn't just talk about theory; they put these tools to the test.

The Ingredients of the Experiment

1. The Test Subjects (The Datasets)
They gathered five different "playgrounds" of real-world financial data. Imagine these as five different neighborhoods with different types of people. Some neighborhoods had data on credit card users, others on German applicants, and others on clients from Central Asia. They checked these neighborhoods to see if the data itself was already biased (like finding that the neighborhood records already treated men and women differently). They found that, yes, the data was often biased to begin with.

2. The Tools (The Models)
They tested three different ways to "fix" the robot:

  • Pre-processing (Cleaning the Ingredients): Before the robot even starts cooking, you wash and sort the vegetables. You remove the bad stuff or balance the ingredients so the robot doesn't start with a bias.
    • Analogy: Like a chef tasting the soup before adding salt to make sure it's not too salty.
  • In-processing (Training with Rules): You teach the robot a new rule while it's learning. "Hey, when you make a decision, you must treat men and women exactly the same, even if it's slightly harder to get the answer right."
    • Analogy: Like a teacher telling a student, "You can't just guess; you have to follow this specific fair rule to get an A."
  • Post-processing (Adjusting the Result): The robot makes its decision, and then a human (or another program) looks at the result and tweaks it. "Wait, you denied this woman, but based on the rules, you should have approved her. Let's flip the switch."
    • Analogy: Like a referee blowing the whistle after a play to say, "That was a foul, let's restart."
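The paper surveys many concrete techniques behind these three labels; one classic example of the "cleaning the ingredients" (pre-processing) idea is *reweighing* (Kamiran & Calders), which assigns each record a weight so that group membership and loan outcome become statistically independent in the training data. A minimal sketch (the variable names and toy data here are illustrative, not the paper's code):

```python
from collections import Counter

def reweigh(groups, labels):
    """Reweighing: weight each (group, label) combination by
    w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y),
    so the weighted data shows no association between group and label."""
    n = len(labels)
    group_counts = Counter(groups)               # how many in each group
    label_counts = Counter(labels)               # how many of each outcome
    joint_counts = Counter(zip(groups, labels))  # how many of each combination
    return [
        (group_counts[a] / n) * (label_counts[y] / n) / (joint_counts[(a, y)] / n)
        for a, y in zip(groups, labels)
    ]

# Toy biased history: men ("M") were approved (label 1) far more often.
groups = ["M", "M", "M", "M", "F", "F", "F", "F"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweigh(groups, labels)
# Under-represented combinations (approved women, rejected men) get
# weight 2.0; over-represented ones get 2/3, balancing the "soup".
```

After reweighing, the *weighted* approval rate is identical for both groups, so a model trained on the weighted data no longer starts with the historical bias baked in.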

3. The Scorecard (Fairness Measures)
How do you know if the robot is actually fair? You can't just look at it; you need a ruler. The paper tested about 8 different "fairness rulers." Here are three of the most important ones:

  • Statistical Parity: Does the robot approve men and women at the same rate?
  • Equal Opportunity: If a man and a woman are both good borrowers, does the robot give them both a loan?
  • Predictive Parity: If the robot says someone is a "good" borrower, is that prediction equally accurate for both men and women?
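These three rulers are all just group-wise rates compared between men and women, so they are easy to compute by hand. A minimal sketch, using plain Python and made-up toy data (the helper names are mine, not the paper's):

```python
def positive_rate(y_pred, mask):
    """Fraction of approvals among the rows selected by mask."""
    picked = [p for p, m in zip(y_pred, mask) if m]
    return sum(picked) / len(picked)

def fairness_report(y_true, y_pred, group):
    """Gap between groups for three fairness rulers; 0.0 = perfectly fair."""
    men = [g == "M" for g in group]
    women = [g == "F" for g in group]
    good = [t == 1 for t in y_true]  # genuinely good borrowers

    # Statistical parity: are approval rates equal?
    sp = positive_rate(y_pred, men) - positive_rate(y_pred, women)

    # Equal opportunity: among good borrowers, are approval rates equal?
    eo = (positive_rate(y_pred, [m and g for m, g in zip(men, good)])
          - positive_rate(y_pred, [w and g for w, g in zip(women, good)]))

    # Predictive parity: when the robot says "good", is it right equally often?
    def precision(mask):
        approved = [t for t, p, m in zip(y_true, y_pred, mask) if m and p == 1]
        return sum(approved) / len(approved)
    pp = precision(men) - precision(women)

    return sp, eo, pp

# Toy example: the model approves men more generously than women.
group  = ["M", "M", "M", "M", "F", "F", "F", "F"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
sp, eo, pp = fairness_report(y_true, y_pred, group)
```

On this toy data the robot approves 75% of men but only 25% of women (statistical parity gap of 0.5), catches every good male borrower but only half the good female ones (equal opportunity gap of 0.5), and its "good borrower" stamp is actually *more* reliable for women, showing that the different rulers can disagree about who is being short-changed.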

What Did They Find? (The Results)

The experiment was like a race between traditional robots (the old, biased ones) and the new "Fairness-Aware" robots.

  • The Old Way: The traditional models were often very accurate at predicting who would pay back, but they were often unfair. They might accidentally discriminate against a specific group.
  • The "Fair" Way: The new models did a great job of being fair. They significantly reduced the discrimination.
  • The Trade-off: Usually, when you make something fair, it gets a little less accurate (like a referee being so strict they miss a real goal). However, the paper found that the In-processing method (specifically a tool called AdaFair) was the "Goldilocks" solution. It managed to be both highly accurate and very fair. It didn't have to sacrifice much accuracy to be fair.
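The mechanics of that trade-off can be felt with a tiny toy example. The sketch below is *not* AdaFair (which is an in-processing boosting method); it is the simplest possible post-processing tweak, a group-specific approval cutoff, applied to made-up credit scores:

```python
# Toy credit scores (higher = safer). The women's scores are depressed
# by historical bias, so one global cutoff approves far fewer women.
scores = {"M": [0.9, 0.8, 0.7, 0.4], "F": [0.7, 0.6, 0.5, 0.2]}

def approval_rate(vals, cutoff):
    """Fraction of applicants at or above the cutoff."""
    return sum(v >= cutoff for v in vals) / len(vals)

# Same cutoff for everyone: a large statistical-parity gap.
gap_before = (approval_rate(scores["M"], 0.65)
              - approval_rate(scores["F"], 0.65))

# Post-processing "flip the switch": lower the cutoff for the
# disadvantaged group. Fairness improves, but we now approve some
# applicants the raw scores called risky - the accuracy cost.
gap_after = (approval_rate(scores["M"], 0.65)
             - approval_rate(scores["F"], 0.55))
```

Here the approval gap shrinks from 0.5 to 0.25, but only by overriding the model's raw scores for some applicants. The paper's finding is that AdaFair largely avoids this bill by building the fairness constraint into training itself rather than patching decisions afterwards.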

The Big Winner: The AdaFair model was the star of the show. It consistently balanced the need to make money (accuracy) with the need to be just (fairness) better than the other methods across all five different neighborhoods (datasets).

The Takeaway

This paper is a reality check for the banking world. It says:

  1. Bias is real: Our data is flawed, so our robots will be flawed unless we fix them.
  2. Tools exist: We have the technology to fix this.
  3. It works: We don't have to choose between making money and being fair. With the right tools (like AdaFair), we can do both.

In simple terms: The authors proved that we can teach our banking robots to be not just smart, but also kind and fair, without making them stupid. It's about building a financial system where everyone gets a fair shot, regardless of who they are.