Does Reasoning Make Search More Fair? Comparing Fairness in Reasoning and Non-Reasoning Rerankers

This paper presents the first systematic comparison of fairness between reasoning and non-reasoning rerankers, finding that reasoning models neither improve nor harm fairness compared to traditional approaches, as they largely preserve the fairness characteristics of their input rankings.

Saron Samuel, Benjamin Van Durme, Eugene Yang

Published Thu, 12 Ma

Imagine you are the head chef at a busy restaurant. Your job is to serve customers the best dishes (search results) based on what they order (search queries).

For a long time, your kitchen had a simple rule: "Serve the tastiest dish first." This is like a standard search engine. It looks at a list of 500 possible dishes and picks the ones that taste the most like what the customer asked for.

Recently, a new type of chef has arrived: the "Reasoning Chef." Before serving a dish, this chef doesn't just taste it; they pause, think, and write a little note explaining why it's the best choice. "This soup is great because it has fresh herbs, and the customer loves herbs," they might think. These reasoning chefs have proven to be incredibly good at finding the tastiest dishes.

But here is the big question the paper asks: Does this new "thinking" process also make the menu fairer?

The Big Question: Does Thinking Make It Fairer?

"Fairness" in this context means making sure the menu isn't just about the most popular dishes. It means ensuring that if a customer asks for "soup," they get a mix of soups from different regions (USA, Sweden, Finland, etc.), not just 9 out of 10 soups from the USA, even if the Swedish soups are just as tasty.

The researchers wanted to know: Does the "Reasoning Chef" naturally start serving a more diverse menu, or do they just serve the same old favorites, only with a longer explanation?
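The "mix of soups" idea can be made concrete with a tiny measure of group shares in the top results. This is only an illustrative sketch, not the paper's actual fairness metric (the TREC Fair Ranking track uses exposure-based measures); the document ids and group labels below are made up.

```python
from collections import Counter

def topk_group_share(ranking, groups, k=10):
    """Fraction of the top-k results belonging to each group.

    ranking: list of document ids, best first
    groups:  dict mapping document id -> group label (e.g., region)
    """
    top = ranking[:k]
    counts = Counter(groups[doc] for doc in top)
    return {g: counts[g] / len(top) for g in counts}

# Toy menu: 9 "soups" from the USA, 1 from Sweden.
groups = {f"d{i}": ("USA" if i < 9 else "Sweden") for i in range(10)}
ranking = [f"d{i}" for i in range(10)]

print(topk_group_share(ranking, groups))  # {'USA': 0.9, 'Sweden': 0.1}
```

A perfectly fair menu would bring these shares closer to some target mix; a 90/10 split like the one above is exactly the "9 out of 10 soups from the USA" problem.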

The Experiment: A Taste Test

The researchers set up a massive taste test using a special library of recipes (the TREC 2022 Fair Ranking dataset). They compared two types of chefs:

  1. The Old School Chef (Non-Reasoning): Picks the best dish instantly based on taste.
  2. The Thinking Chef (Reasoning): Pauses, writes a justification, and then picks the dish.
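As a rough sketch of how the two styles differ in practice, here are two illustrative pointwise-reranking prompt templates. These are not the paper's actual prompts, just a hypothetical rendering of the idea; a real system would send the filled template to a language model and parse the score out of its reply.

```python
# Illustrative prompt templates for the two reranker styles (made up, not
# the paper's wording).

NON_REASONING_PROMPT = (
    "Query: {query}\n"
    "Document: {doc}\n"
    "Reply with only a relevance score from 0 to 10."
)

REASONING_PROMPT = (
    "Query: {query}\n"
    "Document: {doc}\n"
    "First explain, step by step, why this document does or does not answer "
    "the query. Then give a relevance score from 0 to 10."
)

def build_prompt(template, query, doc):
    """Fill a template for one (query, document) pair."""
    return template.format(query=query, doc=doc)

print(build_prompt(REASONING_PROMPT, "sailing", "A history of ocean liners."))
```

The only difference is the written justification the reasoning model produces before its score, which is exactly the "little note" in the chef analogy.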

They tested them in different scenarios:

  • The Raw Ingredients: Starting with a basic list of dishes.
  • The Rewritten Order: Sometimes customers give a messy order ("sailing, boat, ocean"). The researchers rewrote it into a clear sentence ("Tell me about sailing and types of boats") to see if clearer instructions helped.

The Surprising Results

Here is what they found, broken down simply:

1. The Thinking Chef is Great at Taste, But Not at Diversity
The "Reasoning Chefs" were fantastic at finding the most relevant dishes. If you asked for "sailing," they found the best sailing articles. However, they did not make the menu any fairer.

  • If the initial list of dishes was mostly from the USA, the Reasoning Chef served a list that was still mostly from the USA.
  • If the list was diverse, the Reasoning Chef kept it diverse.
  • The Analogy: Imagine a librarian who is asked to find the "best" books. If the library only has books by men, the librarian (even a super-smart, thinking one) will still only recommend books by men. The librarian isn't ignoring women; they just can't recommend books that don't exist in the library.
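The librarian point can be shown in a few lines of code: a reranker only reorders its candidate pool, so across the whole pool the group mix stays exactly whatever it was fed. The ids and labels below are invented, and a random shuffle stands in for any reranker.

```python
import random
from collections import Counter

def group_mix(docs, groups):
    """Overall share of each group across a list of documents."""
    counts = Counter(groups[d] for d in docs)
    return {g: counts[g] / len(docs) for g in counts}

# A candidate pool that is 80% USA before any reranking happens.
groups = {f"d{i}": ("USA" if i < 8 else "Sweden") for i in range(10)}
pool = list(groups)

reranked = pool[:]
random.Random(0).shuffle(reranked)  # stand-in for any reranker: it only reorders

# However cleverly the pool is reordered, its overall composition is unchanged.
assert group_mix(pool, groups) == group_mix(reranked, groups)
print(group_mix(reranked, groups))
```

Reordering can still shift which groups appear near the top, but it can never surface a group that the retrieval stage left out of the pool entirely.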

2. The "Thinking" Didn't Change the Outcome
Whether the chef thought deeply about why a dish was good or just picked it quickly, the final result was almost identical in terms of fairness.

  • The Metaphor: It's like two people sorting a deck of cards. One person sorts them by looking at the card and thinking, "This is a King, it's important." The other just looks and says, "King." If the deck only has Kings and Queens, both people will end up with the same pile. The "thinking" didn't magically turn a Queen into a King or bring in a new suit.

3. The Real Problem is the "Library," Not the "Chef"
The study found that the biggest unfairness came from Geography (where the content is from). Even when the chefs were given a "perfect" list of relevant documents (an "Oracle" list with guaranteed-correct relevance), they still struggled to be fair with respect to geography.

  • Why? Because the information about where a document is from isn't always written in the text. If a recipe doesn't say "Made in Poland," the chef (AI) can't guess it.
  • The Lesson: You can't fix a lack of diversity by just making the chef smarter. You have to diversify the ingredients in the kitchen first.

4. Clearer Orders Help Everyone
The researchers found that when customers gave clear, natural language orders (e.g., "Tell me about sailing") instead of messy keywords (e.g., "sailing boat ocean"), everyone did a better job. Both the Old School and Thinking chefs found better results. This suggests that how we phrase our questions can matter as much as how the computer thinks.

The Bottom Line

Does reasoning make search more fair?
No, not yet.

The "Thinking" AI models are amazing at finding the right answer, but they don't automatically know to find a fair mix of answers. They are like a mirror: if you feed them a biased list of information, they will reflect that bias back at you, even if they write a long, thoughtful essay about why they chose it.

What needs to happen?
To get fair search results, we can't just rely on smarter algorithms. We need to:

  1. Fill the library: Make sure the internet (the data) has content from all over the world and all different groups.
  2. Ask better questions: Use clear, natural language when searching.
  3. Build fairness into the training: We need to teach these AI chefs explicitly, "Hey, if two dishes taste the same, please serve one from a different country," rather than hoping they figure it out on their own.
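Point 3, "if two dishes taste the same, serve one from a different country," can be sketched as a greedy tie-breaking rule. This is a toy illustration of the general idea, not a method from the paper; the scores, country labels, and `eps` tie threshold are all made up.

```python
from collections import Counter

def diversify(docs, score, group, eps=0.05):
    """Greedy reranking: pick by score, but among near-ties (within eps)
    prefer the group that has appeared least so far."""
    remaining = list(docs)
    picked = []
    counts = Counter()
    while remaining:
        best = max(score(d) for d in remaining)
        tied = [d for d in remaining if score(d) >= best - eps]
        choice = min(tied, key=lambda d: counts[group(d)])
        picked.append(choice)
        counts[group(choice)] += 1
        remaining.remove(choice)
    return picked

# Four equally tasty dishes, two per country: the tie-break alternates them.
scores = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
countries = {"a": "USA", "b": "USA", "c": "Sweden", "d": "Sweden"}

order = diversify(scores, scores.get, countries.get)
print([countries[d] for d in order])  # ['USA', 'Sweden', 'USA', 'Sweden']
```

Note that this rule only helps when the candidate pool already contains dishes from other countries, which is why point 1 (filling the library) still comes first.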

In short: Thinking helps you find the best answer, but it doesn't help you find a fair answer unless you teach it to care about fairness first.