Imagine your smartwatch, fitness ring, or health app as a personal detective. This detective is always watching, listening, and measuring your every move to tell you how healthy you are, how well you slept, or if your heart is skipping a beat.
For years, we've asked this detective one question: "Are you accurate?" If the detective says you took 10,000 steps, we want to know if it was actually 10,000.
But this paper argues that asking only about accuracy is like asking a chef, "Does this soup taste good?" without asking, "Does it taste good to everyone?"
The authors, a team of researchers from universities and tech giants, argue we have been skipping a second question: "Is the detective fair?" Their answer is no. These devices are often biased, meaning they work great for some people but fail miserably for others.
Here is the breakdown of the problem and the solution, using some everyday analogies.
1. The Problem: The "One-Size-Fits-None" Detective
The core issue is that the "training data" used to teach these devices is skewed. Think of it like a cooking school where 90% of the taste-testers are tall, white, young men from wealthy countries.
- The Result: Every recipe gets tuned to the palates of tall, white, young men.
- The Reality: When a short woman, an elderly person, or someone with a different skin tone tries to eat the food, it tastes wrong.
The paper highlights several real-world examples of this "bad cooking":
- The Skin Tone Blindness: Pulse oximeters (the clip that goes on your finger to check oxygen) often misread people with darker skin, typically reporting healthy oxygen levels even when the true level is dangerously low. It's like a camera with exposure calibrated for pale faces: the picture comes out looking fine, but the detail you actually needed is gone. During a medical emergency, that false reassurance can be deadly.
- The Body Type Blindness: Heart rate sensors on smartwatches often struggle with people who have larger arms or more muscle mass. It's like trying to measure a river's depth with a ruler meant for a bathtub; the tool just doesn't fit the context.
- The Age Blindness: Voice assistants and hearing aids often fail to understand children or the elderly because they were trained mostly on the voices of middle-aged adults. It's like a translator who only speaks "Gen Z slang" and can't understand a grandparent's dialect.
2. The Hidden Danger: The "Invisible Glitch"
In traditional software, if a program crashes, you see a big red error message. But in personal devices, the bias is invisible.
Imagine a GPS navigation app. If the GPS is biased, it doesn't just say "Error." It quietly routes you down a bumpy, dangerous road, because everyone it has ever navigated lived on bumpy roads, so it assumes that is what you want.
The paper warns that these devices are now making high-stakes decisions:
- Diagnosing heart conditions (atrial fibrillation).
- Predicting fertility.
- Detecting sleep apnea.
If the "detective" is biased, it might tell a Black patient they are fine when they are actually in danger, or tell a woman her fertility chances are lower than they really are. The paper found that in the last few years, less than 10% of research papers on these devices even bothered to check if their tools were fair to different groups of people.
3. Why Is This So Hard to Fix?
The authors explain that personal device data is tricky.
- It's a moving target: Unlike a static photo, a heart rate signal changes every second. It's like trying to judge a movie by looking at a single, blurry frame.
- It's messy: A high heart rate could mean you are running, or it could mean you are stressed, or you just drank coffee. The device has to guess the cause, and if it guesses wrong for certain groups, that's bias.
- The "WEIRD" Trap: Most research is done on Western, Educated, Industrialized, Rich, and Democratic populations. It's like testing a new car only on smooth highways in California, then selling it to people who drive on muddy roads in the rain.
4. The Solution: "Fairness by Design"
The authors propose a new rulebook called "Fairness by Design." Instead of building the device first and checking for bias later (like checking for cracks in a bridge after it's built), we must build fairness into the foundation.
They offer 14 guidelines for researchers and companies, which can be summarized as:
- Recruit a Diverse Crew: Don't just test on your university friends. Test on people of all ages, sizes, skin tones, and backgrounds.
- Listen to the Critics: If a device works poorly for a specific group, don't just say "that's an outlier." Fix the model.
- Be Transparent: Tell users, "This device works best for people with X characteristics, and might be less accurate for Y."
- Keep Watching: Devices get old, sensors get dirty, and people change. A model that is fair today might be biased tomorrow, so you have to keep re-checking it. (A rough sketch of what such a recurring check might look like follows below.)
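To make "Keep Watching" concrete, here is a minimal sketch of a per-group accuracy audit. Everything in it is hypothetical: the group labels, the toy records, and plain accuracy as the metric are illustrative stand-ins, not the paper's actual protocol. The idea is simply that performance gets computed per group and compared, instead of being averaged into one flattering number.

```python
# A minimal, hypothetical per-group fairness audit.
# Each record pairs a device prediction with the ground truth and a
# (self-reported) demographic group label. All data here is made up.
from collections import defaultdict

records = [
    # (group, device_prediction, ground_truth)
    ("lighter_skin", "afib", "afib"),
    ("lighter_skin", "normal", "normal"),
    ("darker_skin", "normal", "afib"),   # a dangerous miss
    ("darker_skin", "normal", "normal"),
    # ...in practice, thousands of records per group
]

hits = defaultdict(int)
totals = defaultdict(int)
for group, predicted, actual in records:
    totals[group] += 1
    hits[group] += predicted == actual  # True counts as 1

accuracy = {g: hits[g] / totals[g] for g in totals}
for group, acc in accuracy.items():
    print(f"{group}: {acc:.0%} accurate over {totals[group]} readings")

# Report the gap between groups, not just the overall mean.
gap = max(accuracy.values()) - min(accuracy.values())
print(f"Worst-case accuracy gap between groups: {gap:.0%}")
```

Rerunning an audit like this on fresh field data at regular intervals is what turns "fair at launch" into "fair in practice," since a gap can open up as hardware ages and the user base shifts.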
The Bottom Line
The paper concludes that accuracy is not enough. A device that is 99% accurate for 50% of the population is a failure if it is 50% accurate for the other 50%.
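Run the arithmetic on that example and the danger becomes obvious: the blended score is 0.99 × 0.5 + 0.50 × 0.5 = 0.745, so the spec sheet can honestly advertise "about 75% accurate overall" while half of all users are getting coin-flip results.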
We need to move from asking "Is this device smart?" to asking "Is this device smart for me?"
By treating fairness as a core feature—just like battery life or water resistance—we can ensure that these digital companions actually help everyone, not just a lucky few.