Sample size in social contact surveys for epidemic modelling

This study analyzes existing social contact surveys and uses simulations to demonstrate that while small samples yield highly variable reproduction number estimates, a minimum sample size of approximately 1,200–1,300 participants is sufficient to achieve reliable precision for epidemic modeling, with diminishing returns observed beyond 3,000 individuals.

Original authors: Danon, L., Brooks-Pollock, E.

Published 2026-03-31

Original paper licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to predict how fast a rumor will spread through a giant school. To do this, you need to know: Who talks to whom?

This paper is about figuring out the perfect number of students you need to interview to get a clear picture of that rumor-spreading network, without wasting time or money.

Here is the breakdown of the research in simple terms:

1. The Problem: Guessing the Size of the Crowd

Scientists use "social contact surveys" to map out who touches, talks to, or sits near whom. These maps are crucial for predicting how diseases (like the flu or COVID) will spread.

However, researchers have been guessing how many people to ask. Some surveys asked just 30 people; others asked 10,000. It was like trying to guess the weather by looking at one cloud versus looking at the whole sky. There was no standard rule, and often, the surveys were too small to be reliable, or too big and wasteful.

2. The Experiment: The "Subsampling" Game

The authors took two massive, real-world datasets (one from the UK and one from across Europe) that had thousands of participants. They treated these huge datasets like a giant jar of mixed jellybeans.

Then, they played a game:

  • They took out a tiny handful (200 people) and tried to guess the "spread potential" (how fast a disease would move).
  • Then they took a medium handful (1,000 people).
  • Then a huge handful (5,000 people).

They did this hundreds of times to see how much their answers changed based on the size of the handful.
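
For readers who like to see the game in code, here is a minimal Python sketch of the subsampling idea. It is not the authors' code: the "jar" is a synthetic population of made-up daily contact counts, and `spread_proxy` is a hypothetical stand-in for spread potential (the mean of squared contacts divided by the mean number of contacts), chosen only because it is simple and sensitive to highly connected people. The function names, the distribution, and the 500 repeats are all assumptions for illustration.

```python
import numpy as np

def spread_proxy(contacts):
    """Hypothetical proxy for spread potential: E[k^2] / E[k].

    This is an illustrative stand-in, not the reproduction number
    calculation used in the paper.
    """
    contacts = np.asarray(contacts, dtype=float)
    return np.mean(contacts ** 2) / np.mean(contacts)

def subsample_spread(population, sizes, n_repeats=500, seed=0):
    """For each handful size, draw many handfuls and summarise the estimates.

    Returns {size: (mean of estimates, standard deviation of estimates)}.
    """
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        estimates = np.array([
            spread_proxy(rng.choice(population, size=n, replace=False))
            for _ in range(n_repeats)
        ])
        results[n] = (estimates.mean(), estimates.std(ddof=1))
    return results

# Synthetic "jar": 10,000 respondents with skewed daily contact counts
# (assumed distribution; real survey data would go here instead).
rng = np.random.default_rng(42)
population = rng.negative_binomial(n=2, p=0.15, size=10_000) + 1

results = subsample_spread(population, sizes=[200, 1000, 5000])
for n, (mean_est, sd_est) in results.items():
    print(f"handful of {n:>5}: estimate {mean_est:.2f}, spread across repeats {sd_est:.2f}")
```

Running it should show the spread across repeats narrowing as the handful grows, which is the same qualitative pattern the paper reports for real survey data.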

3. The Results: The "Goldilocks" Zone

The results pointed to a clear "Goldilocks" zone for sample size:

  • Too Small (Around 200 people or fewer): This is like trying to guess the average height of a basketball team by measuring just two people. The results were all over the place. Sometimes the disease looked like it would die out; other times, it looked like it would explode. The estimates were too shaky to trust.
  • The Sweet Spot (Around 1,200–1,300 people): Once the samples reached roughly 1,200 to 1,300 people, the answers started to settle down. Most of the "noise" faded, and the picture became clear. Adding more people after this point didn't change the answer very much.
  • Too Big (Over 3,000 people): Asking 5,000 or 10,000 people gave slightly more precise answers, but the improvement was tiny. It was like adding a drop of water to a full bucket. It's a lot of extra work for almost no extra benefit.
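
A rough way to see why the returns diminish is to compare how much uncertainty each extra participant removes. The sketch below assumes the estimate behaves like a simple average, whose uncertainty shrinks with the square root of the sample size. That is a textbook simplification, not the paper's actual calculation. The sample sizes come from the text above; `relative_error` is a hypothetical helper.

```python
import math

def relative_error(n):
    """Uncertainty of a simple average, up to a constant factor: 1 / sqrt(n)."""
    return 1.0 / math.sqrt(n)

sizes = [200, 1300, 3000, 10000]
gains_per_person = []
for smaller, larger in zip(sizes, sizes[1:]):
    # How much uncertainty disappears when growing the sample...
    drop = relative_error(smaller) - relative_error(larger)
    # ...divided by how many extra people had to be interviewed.
    gains_per_person.append(drop / (larger - smaller))
    print(f"{smaller:>6} -> {larger:>6}: "
          f"uncertainty removed per added person = {gains_per_person[-1]:.2e}")
```

Under this assumption, each step up in sample size buys less per extra interview than the step before it.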

4. The Analogy: The Concert Crowd

Think of a social contact survey like trying to figure out the vibe of a massive concert crowd.

  • If you ask 5 people, you might just happen to ask a group of friends who are all standing in the corner. You'd think the whole crowd is quiet.
  • If you ask 1,300 people scattered all over the venue, you get a true mix of the mosh pit, the VIP section, and the back row. You get a reliable picture of the whole event.
  • If you ask 10,000 people, you get essentially the same picture, only slightly sharper, but you've spent hours interviewing people who added almost no new information.

5. The Conclusion: What Should We Do?

The authors are saying: "Stop guessing."

If you are a government or health official planning a study to track disease risks, you don't need to interview 10,000 people, but you definitely shouldn't stop at 200.

The Rule of Thumb: Aim for 1,200 to 1,300 participants.

  • This is enough to give you a reliable map of how people mix.
  • It saves money and time.
  • It reduces the risk of false alarms caused by bad data (like thinking a disease is spreading fast when it's actually just a fluke in a tiny sample).

In short, this paper gives scientists a ruler to measure their surveys, ensuring that when they predict the next epidemic, they are looking at the whole picture, not just a blurry snapshot.
