GeoDiv: Framework For Measuring Geographical Diversity In Text-To-Image Models

The paper introduces GeoDiv, a framework that uses large language and vision-language models to systematically measure geographical diversity in text-to-image generation. It reveals significant geographical biases and socio-economic stereotypes, showing that current models disproportionately portray countries like India, Nigeria, and Colombia as impoverished.

Abhipsa Basu, Mohana Singh, Shashank Agnihotri, Margret Keuper, R. Venkatesh Babu

Published 2026-02-26

Imagine you have a magical camera that can take a picture of anything you describe. You say, "Take a photo of a house in Nigeria," and snap! It creates one. You say, "Take a photo of a house in Japan," and snap! It creates another.

This is how Text-to-Image (T2I) models work today. They are incredibly popular, but there's a problem: they are biased.

If you ask this magical camera to show you a house in Nigeria, it might always show you a crumbling, dusty shack. If you ask for a house in Japan, it might always show you a pristine, futuristic apartment. It's as if the camera has a broken lens that only sees the world through a very narrow, stereotypical filter. It forgets that Nigeria has modern skyscrapers and Japan has old, rustic villages, too.

This paper introduces a new tool called GeoDiv (Geographical Diversity) to fix this. Think of GeoDiv as a "World-Check Inspector."

The Two Main Tools in the Inspector's Kit

The researchers built GeoDiv to measure diversity in two specific ways, like checking a car for both its engine and its paint job.

1. The "Socio-Economic Visual Index" (SEVI) – The "Wealth & Condition" Check

This part of the inspector looks at the vibe of the image. It asks two big questions:

  • Affluence: Does this look rich or poor? (Is it a mansion or a shack?)
  • Maintenance: Does this look brand new and well-cared-for, or is it broken and worn out?

The Analogy: Imagine you are judging a neighborhood.

  • The Bias: If the camera always shows Nigeria as a "broken-down, poor neighborhood" and the USA as a "perfect, shiny suburb," the SEVI score will be terrible. It reveals that the AI is reinforcing the stereotype that some countries are always poor and others are always rich.
  • The Finding: The paper found that models like FLUX.1 are great at making things look "shiny and rich" (high maintenance), but they make every country look the same. Meanwhile, older models often make developing countries look "broken and poor" by default.
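To make the "Wealth & Condition" check concrete, here is a minimal sketch of how per-country affluence ratings could be summarized. This is illustrative only, not the paper's actual SEVI formula: the 1-5 rating scale, the `sevi_sketch` function, and the sample ratings are all assumptions made for the example. The intuition is that a country whose images cluster at one extreme with little spread is being portrayed stereotypically.

```python
from statistics import mean, stdev

def sevi_sketch(scores):
    """Illustrative stand-in for a SEVI-style check (not the paper's formula).

    `scores` maps a country to per-image affluence ratings on a 1-5 scale,
    as a VLM judge might assign them. Low spread at one extreme suggests
    a stereotyped portrayal."""
    report = {}
    for country, ratings in scores.items():
        report[country] = {
            "mean_affluence": mean(ratings),
            "spread": stdev(ratings) if len(ratings) > 1 else 0.0,
        }
    return report

# Hypothetical ratings for a handful of generated images per country.
ratings = {
    "Nigeria": [1, 1, 2, 1, 2],   # clustered low: the "poor country" trap
    "Japan":   [5, 4, 5, 5, 4],   # clustered high: the "rich country" filter
    "Brazil":  [1, 3, 5, 2, 4],   # spread out: more realistic variety
}
for country, stats in sevi_sketch(ratings).items():
    print(country, stats)
```

A real evaluation would feed thousands of VLM-judged images per country into a summary like this, rather than five hand-picked numbers.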

2. The "Visual Diversity Index" (VDI) – The "Variety" Check

This part of the inspector looks at the details. It asks: "Are all the houses the same color? Are all the roads the same type?"

  • Entity Appearance: What does the object look like? (Is the car a red sedan or a blue truck? Is the house made of brick or mud?)
  • Background Appearance: What's around it? (Is the road paved with asphalt, or is it a dirt path? Are there mountains or just flat fields?)

The Analogy: Imagine a box of crayons.

  • Low Diversity: If you ask for "a car in 10 different countries" and the AI gives you 10 identical red sedans on a paved road, that's like having a box with only one red crayon. It's boring and fake.
  • High Diversity: A good AI should give you a red sedan in the US, a tuk-tuk in India, a dirt bike in Kenya, and a vintage car in Italy. That's a full box of crayons!
  • The Finding: The paper found that while newer AI models are getting better at making things look "real," they are actually getting worse at showing variety. They are becoming too uniform.
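The "box of crayons" idea can be expressed as a standard diversity measure. The sketch below uses normalized Shannon entropy over attribute values extracted from generated images; this is a common way to quantify variety and is used here purely as an illustration, since the paper's exact VDI computation may differ.

```python
import math
from collections import Counter

def diversity_index(attributes):
    """Normalized Shannon entropy over observed attribute values.

    Returns 0.0 when every image shows the same attribute value (one red
    crayon) and 1.0 when values are spread evenly across all observed
    categories (a full box). Illustrative only; not the paper's exact VDI."""
    counts = Counter(attributes)
    if len(counts) <= 1:
        return 0.0
    total = len(attributes)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))  # divide by max possible entropy

# Ten generated "car" images, described by an attribute a VLM might extract.
uniform = ["red sedan"] * 10                      # the one-crayon box
varied  = ["red sedan", "tuk-tuk", "dirt bike",
           "vintage car", "blue truck"] * 2       # a fuller box of crayons
print(diversity_index(uniform))  # -> 0.0
print(diversity_index(varied))   # -> 1.0
```

The same index applies to background attributes (paved vs. dirt roads, mountains vs. plains) just by swapping in a different list of extracted values.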

How Does GeoDiv Actually Work?

Instead of a human looking at 160,000 pictures (which would take forever), GeoDiv uses AI assistants (Large Language Models and Vision-Language Models) to do the heavy lifting.

  1. The Interviewer: The AI acts like a reporter. It looks at a picture of a house in Nigeria and asks, "Is the roof flat or sloped? Is the road dirt or paved? Does this look wealthy or poor?"
  2. The Scorekeeper: It counts the answers. If 90% of the houses in Nigeria have dirt roads and 90% of the houses in the UK have paved roads, the AI knows there is a bias.
  3. The Report Card: It gives the AI model a score. A high score means the AI shows the world as it really is (diverse and varied). A low score means the AI is stuck in a stereotype.
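The three steps above can be sketched as a toy pipeline. Everything here is an assumption made for illustration: `ask_vlm` is a stub standing in for a real vision-language model call, and the images are plain dictionaries of attributes so the example stays runnable.

```python
def ask_vlm(image, question):
    """Stub for the 'Interviewer' step. A real system would send the actual
    image and question to a vision-language model; here each image is just
    a dict of attributes, so the answer is a simple lookup."""
    return image.get(question, "unknown")

def report_card(images_by_country, question, stereotyped_answer):
    """The 'Scorekeeper' and 'Report Card' steps: count how often each
    country's images match the stereotyped answer. A fraction near 1.0
    flags a likely bias; lower values suggest more variety."""
    scores = {}
    for country, images in images_by_country.items():
        answers = [ask_vlm(img, question) for img in images]
        hits = sum(a == stereotyped_answer for a in answers)
        scores[country] = hits / len(answers)
    return scores

# Hypothetical attribute readouts for a few generated images.
images = {
    "Nigeria": [{"road": "dirt"}, {"road": "dirt"}, {"road": "paved"}],
    "UK":      [{"road": "paved"}, {"road": "paved"}, {"road": "paved"}],
}
print(report_card(images, "road", "dirt"))
```

In the actual framework the judging is done over tens of thousands of generated images across many attributes, but the shape of the computation (ask, count, score) is the same.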

What Did They Discover?

The "World-Check Inspector" found some shocking things:

  • The "Poor Country" Trap: When asked to generate images of countries like India, Nigeria, and Colombia, the AI almost always made them look impoverished and dilapidated. It rarely showed them as modern or wealthy.
  • The "Rich Country" Filter: When asked for USA, UK, or Japan, the AI almost always made them look affluent, clean, and perfect.
  • The "One-Size-Fits-All" Problem: Newer models (like FLUX.1) are so good at making things look "pretty" that they make every country look like a wealthy Western suburb. They lost the unique cultural flavors of different places.

Why Does This Matter?

If we let these AI models keep making these biased pictures, they will start to shape how we see the world. If an AI always shows Nigeria as a place of poverty, people might start to believe that's all there is to Nigeria.

GeoDiv is the first tool that gives us a "report card" for these models. It doesn't just say "this looks bad"; it tells us exactly where the bias is (e.g., "You are making all Nigerian roads look like dirt").

The Bottom Line

This paper is like a mirror held up to Artificial Intelligence. It shows us that while AI is amazing at creating art, it is currently a very bad traveler. It only knows the stereotypes.

GeoDiv is the compass that helps developers fix the map, ensuring that when we ask AI to show us the world, it shows us the real world—messy, diverse, beautiful, and full of surprises, not just a collection of stereotypes.
