Estimation of relative risk, odds ratio and their logarithms with guaranteed accuracy and controlled sample size ratio

This paper proposes two-stage sequential estimators for relative risk, odds ratio, and their logarithms that guarantee a target mean-square error for any population parameters while maintaining a controlled sample size ratio and achieving high efficiency near the Cramér-Rao bound.

Luis Mendo

Published 2026-03-06

Imagine you are a detective trying to solve a mystery involving two different groups of people. Let's call them Team A and Team B.

Your goal is to figure out how much more likely a specific event (like catching a cold, or buying a product) is to happen in Team A compared to Team B. In statistics, we call this the Relative Risk. You might also want to know the "Odds Ratio," which is a slightly different way of measuring that same comparison, or the "Log" versions, which are just mathematical tweaks to make the numbers easier to handle.
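To make these quantities concrete, here is a tiny Python sketch that computes relative risk, odds ratio, and their logarithms from two probabilities. The values p1 = 0.30 and p2 = 0.10 are purely illustrative, not from the paper:

```python
import math

def relative_risk(p1, p2):
    # Relative risk: how many times more likely the event is in group 1
    return p1 / p2

def odds_ratio(p1, p2):
    # Odds ratio: ratio of the odds p/(1-p) in each group
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

p1, p2 = 0.30, 0.10          # hypothetical true probabilities
rr = relative_risk(p1, p2)   # 3.0: the event is 3x as likely in group 1
orr = odds_ratio(p1, p2)     # ≈ 3.86: odds ratio exaggerates RR when p is not small
log_rr = math.log(rr)        # the "log" versions the paper also covers
log_or = math.log(orr)
```

Note that the odds ratio is close to the relative risk only when both probabilities are small; for common events the two measures diverge, which is one reason both are studied.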

The problem? You don't know the true probabilities. The event happens with some probability p₁ in Team A and some probability p₂ in Team B, but these numbers are hidden from you.

The Old Way vs. The New Way

The Old Way (Fixed Sample Size):
Imagine you decide to interview exactly 100 people from Team A and 100 from Team B, no matter what.

  • The Flaw: If the event is very rare (like winning the lottery), interviewing 100 people might not give you enough "successes" to make a good guess. Your estimate would be shaky. If the event is super common, you might have interviewed way more people than you needed, wasting time and money.
  • The Risk: You can't guarantee your answer will be accurate enough for every situation.

The New Way (This Paper's Solution):
The author, Luis Mendo, proposes a smart, two-step detective strategy that adapts as it goes. Think of it like a video game where you level up your equipment based on how the previous level went.

Step 1: The "Scout" Mission (First Stage)

You send out a small team of scouts to both Team A and Team B. You don't stop until you find a specific number of "successes" (e.g., 5 people who got sick).

  • Why? This gives you a rough, quick estimate of how common the event is in both groups.
  • The Magic: Because you stop based on finding successes (not a fixed number of people), you get a reliable "feel" for the rarity of the event, even if it's very rare.
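Statisticians call this stopping rule inverse binomial (negative binomial) sampling: you fix the number of successes r, not the number of trials. A minimal Python sketch of the first stage (the naive estimate r/n is used here for illustration; the paper may use a refined estimator):

```python
import random

def first_stage(p_true, r, rng):
    """Draw Bernoulli(p_true) trials until r successes are observed.
    Returns the number of trials n it took (inverse binomial sampling)."""
    n = 0
    successes = 0
    while successes < r:
        n += 1
        if rng.random() < p_true:
            successes += 1
    return n

rng = random.Random(0)
r = 5                             # target number of successes (hypothetical choice)
n1 = first_stage(0.02, r, rng)    # rare event: many trials needed
n2 = first_stage(0.50, r, rng)    # common event: few trials needed
phat1 = r / n1                    # rough first-stage probability estimates
phat2 = r / n2
```

Because n grows automatically when the event is rare, the relative accuracy of the first-stage estimate is roughly the same whether the event is common or rare, which is exactly the "feel for rarity" described above.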

Step 2: The "Main Mission" (Second Stage)

Now, you look at what the scouts found.

  • If the event is rare: You know you need to interview many more people to get a good answer.
  • If the event is common: You know you don't need to interview as many.
  • The Adjustment: The paper provides a mathematical formula to calculate exactly how many more people you need to interview in each group to hit a specific accuracy target (let's say, "I want my answer to be within 5% of the truth").

Crucially, this formula also lets you control the ratio of your teams. Maybe you have twice as many people available in Team A as Team B. The math ensures you use that extra manpower efficiently so you don't waste resources, while still hitting your accuracy goal.
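The paper's exact second-stage formula is not reproduced here, but the idea can be sketched with a standard textbook approximation: by the delta method, the variance of the estimated log relative risk is roughly (1−p₁)/(n₁p₁) + (1−p₂)/(n₂p₂), so given first-stage estimates and a fixed ratio n₁ = c·n₂, one can solve for the sizes that push this below a target mean-square error. This is an illustrative approximation, not the author's formula:

```python
import math

def second_stage_sizes(phat1, phat2, target_mse, c):
    """Choose n2 (with n1 = c * n2) so that the delta-method variance of
    the estimated log relative risk,
        (1 - p1)/(n1 * p1) + (1 - p2)/(n2 * p2),
    falls below target_mse. A textbook sketch, NOT the paper's formula."""
    v = (1 - phat1) / (c * phat1) + (1 - phat2) / phat2
    n2 = math.ceil(v / target_mse)
    n1 = math.ceil(c * n2)
    return n1, n2

# Hypothetical first-stage estimates, with Team A twice the size of Team B:
n1, n2 = second_stage_sizes(0.05, 0.20, target_mse=0.01, c=2.0)
```

The rarer group (p ≈ 0.05) dominates the variance, so it is the main driver of how many more interviews are needed; the ratio constraint c then splits the work between the teams.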

Two Ways to Gather Data

The paper also covers two ways to collect your data:

  1. Element Sampling (One by One): You interview people individually. As soon as you need one more person from Team A, you grab one. This is the most efficient way but requires you to be able to stop and start at any moment.
  2. Group Sampling (Batches): Imagine you can only interview people in pre-packaged groups (e.g., a bus of 10 people from Team A and a bus of 10 from Team B arrive together).
    • The Challenge: You might need 12 people from Team A, but the bus only brings 10. You take the bus, interview the 10, and then have to wait for the next bus to get the remaining 2.
    • The Paper's Fix: The author shows how to handle these "leftover" people. At the end of the process, you might end up with a few extra people you don't need (who you politely send home), but the math guarantees that your final answer is still just as accurate as if you had interviewed them one by one.
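The batch bookkeeping itself is simple arithmetic, sketched below (`batch_sample` is a hypothetical helper, not from the paper; the paper's contribution is proving that discarding the surplus does not break the accuracy guarantee):

```python
def batch_sample(needed, batch_size):
    """When data arrives only in fixed-size batches, take whole batches
    until at least `needed` observations are covered; the surplus is
    discarded. Returns (batches_taken, surplus)."""
    batches = -(-needed // batch_size)   # ceiling division
    surplus = batches * batch_size - needed
    return batches, surplus

batches, leftover = batch_sample(needed=12, batch_size=10)
# → 2 batches of 10 cover the 12 needed; 8 people are "sent home"
```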

Why is this a Big Deal?

In the world of statistics, there is a "Gold Standard" called the Cramér-Rao Bound. It is a theoretical floor: no estimator can achieve a lower variance than this bound allows, so it acts like a speed limit on accuracy.

  • Efficiency: The author proves that his method is very efficient. It gets close to that speed limit.
  • Guaranteed Accuracy: Unlike other methods that say "on average, this is accurate," this method says, "No matter what the hidden probabilities are, I guarantee your error will be below this specific number."
  • Versatility: It works for Relative Risk, Odds Ratios, and their logarithmic versions (which are used in machine learning and medical studies).

The Analogy of the "Smart Bucket"

Imagine you are trying to fill two buckets with water to a specific level (your accuracy target).

  • Bucket A has a leak (low probability of success).
  • Bucket B has a tiny hole (high probability of success).

The Old Method: You pour water from a hose for exactly 10 minutes into both. Bucket A might be empty, and Bucket B might overflow. You failed.

This Paper's Method:

  1. Scout: You pour a little water to see how fast the buckets fill.
  2. Adjust: You realize Bucket A is leaking fast, so you turn the hose up high. You realize Bucket B is holding water well, so you turn the hose down.
  3. Stop: You stop exactly when both buckets reach the perfect level.
  4. Group Constraint: If you can only pour water in 5-gallon jugs (Group Sampling), you might pour a little extra into Bucket B, but the math ensures you still hit the target level with minimal waste.

Summary

This paper gives statisticians and data scientists a universal toolkit to compare two groups. Whether you are testing a new vaccine, analyzing marketing data, or training an AI, this method ensures you get the most accurate answer possible without wasting time or money, and it works even when you are forced to collect data in batches. It turns a guessing game into a precise science.