Accounting for place: confounding via geography obscures polygenic evidence on mental health and environmental exposures in the UK Biobank

This study shows that UK Biobank analyses need to account explicitly for local geographic context, which the authors do using multilevel models. Traditional single-level models adjusted only for genetic principal components can produce biased associations between polygenic scores for mental health and environmental exposures such as greenspace, because geographic confounding goes unaddressed.

Original authors: Reed, Z. E., Morris, T. T., Davis, O. S. P., Davey Smith, G., Munafo, M. R., Griffith, G. J.

Published 2026-05-27

Original paper licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Picture: The "Neighborhood Effect" on Mental Health

Imagine you are trying to figure out if a specific type of seed (a person's genetic makeup for mental health) grows better in a garden (a green, leafy neighborhood) or a concrete parking lot (an urban, built-up area).

For a long time, scientists have approached this by looking at where different seeds have ended up. They noticed a pattern: people with a higher genetic risk for depression or schizophrenia often seem to live in less green areas, while those with a genetic predisposition toward greater wellbeing seem to live in greener places.

But this paper argues that scientists have been looking at the wrong thing. They were blaming the seed for the soil, when really, the soil (the neighborhood) was influencing where the seeds ended up in the first place.

The Problem: "Confounding via Geography"

The authors call this problem "Confounding via Geography." Here is a simple way to think about it:

Imagine you are a detective trying to solve a mystery. You notice that people who live in London tend to have a different genetic profile than people who live in the Cotswolds (a rural area).

  • The Mistake: If you just compare Londoners and Cotswolders without thinking about why they are different, you might think, "Oh, living in London causes this genetic difference."
  • The Reality: It's not the city that changed their genes. It's that people with certain backgrounds, incomes, and life stories (which are clustered in cities vs. countryside) moved there. The city didn't cause the genetics; the history and economics of the place caused both the type of people who live there and the type of environment they live in.

The paper says that previous studies often forgot to account for this "geography trap." They treated every person as an independent data point, like individual dice rolls, when really, people in the same neighborhood are more similar to each other than to people in the next town over.

The Experiment: The "Magic Lens" (Multilevel Models)

To fix this, the researchers used a type of multilevel model known as a Mundlak model. Think of this as a magic lens that lets them look at the data in two different ways at the same time:

  1. The "Within-Neighborhood" View: This looks at people inside the same neighborhood. It asks: "If two people live on the same street, and one has a higher genetic risk for depression, does that person have less greenspace around their home than their neighbor?"
  2. The "Between-Neighborhood" View: This looks at the differences between neighborhoods. It asks: "Do neighborhoods whose residents have a higher average genetic risk for depression generally have less green space?"

By separating these two views, the researchers could see if the "green space" effect was real for the individual, or if it was just an illusion caused by the fact that certain types of neighborhoods attract certain types of people.
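The two views can be sketched on simulated data. This is a minimal illustration, not the authors' model or data: the area-level confounder `place`, the effect sizes, and the sample sizes are all invented, and the sketch computes the within and between slopes directly rather than fitting a full multilevel model with a random intercept for each neighborhood.

```python
import random
from statistics import fmean


def slope(xs, ys):
    """Least-squares slope of ys on xs (one predictor plus an intercept)."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n_areas=50, n_per_area=40, seed=0):
    """Simulated (area, pgs, greenspace) rows. Every number here is invented."""
    rng = random.Random(seed)
    rows = []
    for area in range(n_areas):
        place = rng.gauss(0, 1)  # hypothetical area-level confounder
        for _ in range(n_per_area):
            pgs = place + rng.gauss(0, 1)  # polygenic scores cluster by area
            # Small positive individual-level link, large negative area-level one.
            green = 0.1 * pgs - 2.0 * place + rng.gauss(0, 0.5)
            rows.append((area, pgs, green))
    return rows


def within_between(rows):
    """Return (pooled, within-area, between-area) slopes of greenspace on pgs."""
    by_area = {}
    for area, pgs, green in rows:
        by_area.setdefault(area, []).append((pgs, green))
    pgs_mean = {a: fmean([p for p, _ in v]) for a, v in by_area.items()}
    green_mean = {a: fmean([g for _, g in v]) for a, v in by_area.items()}

    pgs_all = [p for _, p, _ in rows]
    green_all = [g for _, _, g in rows]
    # Within view: each person's deviation from their own area's mean.
    pgs_dev = [p - pgs_mean[a] for a, p, _ in rows]
    green_dev = [g - green_mean[a] for a, _, g in rows]
    # Between view: the area means, repeated once per resident.
    pgs_bar = [pgs_mean[a] for a, _, _ in rows]
    green_bar = [green_mean[a] for a, _, _ in rows]
    return (slope(pgs_all, green_all),
            slope(pgs_dev, green_dev),
            slope(pgs_bar, green_bar))


pooled, within, between = within_between(simulate())
print(f"pooled={pooled:.2f} within={within:.2f} between={between:.2f}")
```

In this toy setup the pooled (single-level) slope is negative even though the individual-level link built into the simulation is slightly positive, because the pooled slope is a weighted blend of the within and between slopes. That is the kind of sign change the two-view approach is meant to expose.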

What They Found: The Plot Twist

When they looked at the data without the "magic lens" (the old way), they found what everyone expected:

  • People with a higher genetic risk for depression seemed to live in less green areas.
  • People with a higher genetic risk for schizophrenia seemed to live in less green areas.

But when they used the "magic lens" (the new way), the story changed completely:

  1. The Apparent Link Faded: When they looked strictly at people within the same neighborhood, the link between higher genetic risk and less green surroundings mostly vanished.
  2. The Sign Flipped: For schizophrenia, the results actually flipped! When looking strictly within neighborhoods, people with a higher genetic risk for schizophrenia actually seemed to live in greener places than their neighbors.
  3. The "Between" Effect is the Culprit: The old studies saw a link because of the Between-Neighborhood effect. Wealthier, greener areas tend to have different populations than poorer, urban areas. The old studies were accidentally attributing differences between neighborhoods, such as poverty or wealth, to the individual's genes.

The "Residual" Map: Where the Model Broke

The researchers also looked at the "leftover" errors in their model (residuals). They found that the model struggled the most in extreme places:

  • The Peak District and National Parks: These very rural, green areas had huge "errors" in the model.
  • Central London: This very urban area also had huge errors.

This suggests that you cannot simply compare a person in a National Park to a person in London and assume they are "exchangeable" (comparable). The differences between these places are so massive (in terms of history, economy, and culture) that a simple genetic comparison doesn't work unless you account for the massive gap between the two worlds.
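One way to picture this check is to average a model's residuals by area and see which places it fits worst. The sketch below is a hypothetical illustration, not the authors' procedure: the area names, offsets, and scores are invented, and it uses a simple single-level line rather than the paper's multilevel model.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical toy data: every area shares the same individual-level slope (0.5)
# but has its own baseline greenspace level. All values are invented.
AREA_OFFSETS = {"urban_core": -3.0, "suburb": 0.0, "national_park": 3.0}
rows = [(area, pgs, offset + 0.5 * pgs)
        for area, offset in AREA_OFFSETS.items()
        for pgs in (-1.0, 0.0, 1.0)]


def fit_line(xs, ys):
    """Return (intercept, slope) of a least-squares line of ys on xs."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mean_residual_by_area(rows):
    """Fit one line to everyone, then average the residuals within each area."""
    xs = [pgs for _, pgs, _ in rows]
    ys = [green for _, _, green in rows]
    intercept, b = fit_line(xs, ys)
    resid = defaultdict(list)
    for area, pgs, green in rows:
        resid[area].append(green - (intercept + b * pgs))
    return {area: fmean(r) for area, r in resid.items()}


mean_resid = mean_residual_by_area(rows)
# Areas ordered from worst-fitted to best-fitted.
worst_first = sorted(mean_resid, key=lambda a: abs(mean_resid[a]), reverse=True)
print(worst_first, mean_resid)
```

Here the single line fits the middle-of-the-road area well and misses the two extremes by their full baseline difference, which mirrors the pattern described above: the largest leftover errors sit in the places that differ most from the average.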

The Main Takeaway

The paper concludes that geography is a powerful confounder.

If you are a geneticist studying mental health, you can't just look at a person's DNA and their address and assume the address is a result of their DNA. You have to realize that where people live is often determined by factors that also shape how genetic variation is distributed across the country (like migration patterns, economic history, and local policies).

The Solution:
The authors suggest that scientists should routinely use this "magic lens" (multilevel modeling) to separate the "within-neighborhood" effects from the "between-neighborhood" effects. If the results change drastically when you do this, it means the original finding was likely just a trick of geography, not a true biological cause.

In short: Don't blame the seed for the soil. Sometimes, the soil just happens to be where certain seeds ended up, and we need better tools to tell the difference.
