Finite-Sample Decision Instability in Threshold-Based Process Capability Approval

This study shows that process capability decisions based on fixed thresholds (e.g., C_{pk} \geq 1.33) and moderate sample sizes carry inherent instability and boundary risk: the probability of acceptance converges to 0.5 when the true capability equals the threshold. The finding is supported by asymptotic theory, simulations, and empirical data from 880 manufacturing dimensions.

Fei Jiang, Lei Yang

Published Fri, 13 Ma

Imagine you are a quality inspector at a factory. Your job is to decide if a new batch of parts is good enough to ship to a customer. You have a strict rule: "If the quality score is 1.33 or higher, we ship it. If it's lower, we reject it."

This sounds simple and fair, right? But this paper reveals a hidden trap in that simple rule, especially when you only have a small amount of data to work with.

Here is the story of Decision Instability, explained through a few everyday analogies.

1. The "Fuzzy Ruler" Problem

Imagine you are trying to measure a piece of wood to see if it is exactly 10 inches long. You have a ruler, but it's a bit wobbly, and you can only take a few measurements (maybe 30 or 50).

  • The Reality: The wood might actually be exactly 10 inches long.
  • The Measurement: Because your ruler is wobbly and you only measured it a few times, your average measurement might come out as 10.05 inches (Pass!) or 9.95 inches (Fail!).

In the world of manufacturing, the "quality score" (called C_{pk}) is like that measurement. It's not a fixed number; it's an estimate based on a small sample. Because it's an estimate, it has "noise" or "jitter."
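The "jitter" is easy to see in a quick simulation. The sketch below uses the standard C_{pk} formula, min(USL − mean, mean − LSL) / (3 × standard deviation), with made-up spec limits and a process whose true C_{pk} is exactly 1.33; none of these numbers come from the paper:

```python
import random
import statistics

def cpk_hat(data, lsl, usl):
    """Estimate Cpk from a sample: distance from the sample mean to the
    nearer spec limit, in units of three sample standard deviations."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return min(usl - m, m - lsl) / (3 * s)

random.seed(0)
lsl, usl = 9.0, 11.0                      # illustrative spec limits
mu = 10.0                                 # process centered between them
sigma = (usl - mu) / (3 * 1.33)           # chosen so the true Cpk is 1.33

# Re-run the same "30-part inspection" 2000 times and watch the estimate jump around
estimates = [
    cpk_hat([random.gauss(mu, sigma) for _ in range(30)], lsl, usl)
    for _ in range(2000)
]
print(min(estimates), max(estimates))     # a wide scatter around 1.33
```

Each run draws from the exact same process, yet some inspections would pass it and others would fail it.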

2. The "Coin Flip" at the Edge

The paper's biggest discovery is what happens when a part is right on the edge of the rule.

Let's say the rule is 1.33.

  • If the true quality of the part is exactly 1.33, what happens?
  • Because of the "wobble" in your measurements, sometimes your calculation will say 1.34 (Pass), and sometimes it will say 1.32 (Fail).

The paper proves mathematically that if the true quality is exactly on the line, your decision becomes a 50/50 coin flip.

  • 50% of the time: you approve the part.
  • 50% of the time: you reject that very same part, even though its true quality meets the rule exactly.

The Analogy: Imagine a tightrope walker standing exactly in the middle of a rope. If the wind blows even a tiny bit (which is like the random noise in your data), they will fall to the left or the right with equal probability. The "decision" of which side they land on is completely unstable, even though the walker is perfectly balanced.
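The boundary coin flip shows up in a minimal Monte Carlo sketch. The spec limits, sample size, and off-center process mean below are all invented for illustration, not taken from the paper:

```python
import random
import statistics

def cpk_hat(data, lsl, usl):
    """Sample Cpk estimate (hypothetical spec limits)."""
    m, s = statistics.mean(data), statistics.stdev(data)
    return min(usl - m, m - lsl) / (3 * s)

random.seed(1)
lsl, usl, n, threshold = 9.0, 11.0, 30, 1.33
mu = 10.5                                # process runs nearer the upper limit
sigma = (usl - mu) / (3 * threshold)     # so the true Cpk sits exactly on the line

reps = 5000
passes = sum(
    cpk_hat([random.gauss(mu, sigma) for _ in range(n)], lsl, usl) >= threshold
    for _ in range(reps)
)
print(f"acceptance rate at the boundary: {passes / reps:.3f}")  # hovers near one-half
```

With only 30 parts per inspection, the pass/fail verdict for a truly borderline process is close to a fair coin.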

3. The "Danger Zone" (The Ridge)

The authors found that this instability isn't just for parts that are exactly 1.33. It happens for anything close to 1.33.

They call this the "Instability Ridge."

  • If a part is way above 1.33 (say, 1.80), you are almost 100% sure to pass it.
  • If a part is way below 1.33 (say, 0.80), you are almost 100% sure to reject it.
  • But if a part is in the "Danger Zone" (between 1.25 and 1.40), your decision is shaky.

The paper looked at 880 real-world factory dimensions and found that about 11% of them were sitting right in this Danger Zone. This means that for a huge chunk of real products, the decision to ship or scrap them is essentially a gamble based on how the random numbers happened to land that day.
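The whole ridge can be traced by sweeping the true capability and estimating the acceptance probability at each point. This sketch again uses invented spec limits and n = 30, and a centered process for simplicity:

```python
import random
import statistics

def accept_prob(true_cpk, n=30, threshold=1.33, reps=2000, seed=2):
    """Monte Carlo acceptance probability for a centered process with the
    given true Cpk (illustrative spec limits 9..11)."""
    rng = random.Random(seed)
    lsl, usl, mu = 9.0, 11.0, 10.0
    sigma = (usl - mu) / (3 * true_cpk)   # back out sigma from the true Cpk
    passes = 0
    for _ in range(reps):
        data = [rng.gauss(mu, sigma) for _ in range(n)]
        m, s = statistics.mean(data), statistics.stdev(data)
        if min(usl - m, m - lsl) / (3 * s) >= threshold:
            passes += 1
    return passes / reps

# Far below: near 0.  Far above: near 1.  Inside the ridge: well away from either.
for c in (0.80, 1.25, 1.33, 1.40, 1.80):
    print(c, accept_prob(c))
```

Values deep in the "Danger Zone" land nowhere near 0 or 1, which is exactly the instability the paper describes.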

4. Why "More Data" Doesn't Fully Fix It

You might think, "If I measure 1,000 parts instead of 30, the wobble goes away, right?"

Yes, the wobble gets smaller, but the "Danger Zone" just gets narrower. It never disappears completely unless the part is far away from the line.

  • With a small sample (30 parts), the danger zone is wide.
  • With a huge sample (1,000 parts), the danger zone is a thin sliver.

But in real life, factories often don't have time or money to measure thousands of parts. They measure 30 or 50. In this "small sample" world, the danger zone is wide enough to catch many products.
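The shrinking-but-never-vanishing zone can be quantified with a common large-sample approximation for the standard error of the C_{pk} estimate (treat the exact constants as an assumption of this sketch, not the paper's formula):

```python
import math

def cpk_se(cpk, n):
    """Approximate standard error of the Cpk estimate for a sample of size n
    (a standard large-sample approximation)."""
    return math.sqrt(1 / (9 * n) + cpk**2 / (2 * (n - 1)))

# Rough "danger zone": threshold plus or minus two standard errors
threshold = 1.33
for n in (30, 100, 1000):
    half_width = 2 * cpk_se(threshold, n)
    print(n, round(threshold - half_width, 2), round(threshold + half_width, 2))
```

The zone narrows roughly like 1/sqrt(n): a thousand parts squeezes it to a sliver, but 30 parts leaves it wide.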

5. The Solution: The "Safety Buffer"

So, what should factories do? The paper suggests we stop treating the rule as a hard, sharp line.

Instead of saying "Pass if \ge 1.33," we should add a Safety Buffer (or a "Guard Band").

  • Old Rule: Pass if Score \ge 1.33.
  • New Smart Rule: Pass if Score \ge 1.62 (or whatever the math says is safe).

The Metaphor: Imagine a parking spot.

  • Old Way: "If your bumper is past the line, you're parked." (If you are exactly on the line, you might get a ticket or not, depending on the officer's mood).
  • New Way: "You must park at least 6 inches inside the line to be considered parked."

By moving the goalpost further away from the edge, you ensure that even with the "wobble" of your measurements, you are still safely inside the "Pass" zone.
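As a sketch of how such a buffer might be computed (one plausible construction, not necessarily how the paper arrives at a number like 1.62): require the estimate to clear the nominal threshold by a multiple of its approximate standard error.

```python
import math

def guarded_threshold(threshold, n, z=1.645):
    """One simple guard-band rule (an illustrative construction, not
    necessarily the paper's): demand the Cpk estimate beat the nominal
    threshold by z approximate standard errors of the estimator."""
    se = math.sqrt(1 / (9 * n) + threshold**2 / (2 * (n - 1)))
    return threshold + z * se

print(round(guarded_threshold(1.33, 30), 2))    # ~1.63 for n = 30
print(round(guarded_threshold(1.33, 1000), 2))  # shrinks toward 1.33 as n grows
```

Notice the buffer adapts to sample size: small inspections demand a bigger safety margin, large ones barely move the goalpost.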

Summary

  • The Problem: Using a fixed number (like 1.33) to make a Pass/Fail decision is risky when you only have a small sample of data.
  • The Surprise: If a product is right on the edge, the decision is a coin flip. You might reject a good product or accept a bad one purely by chance.
  • The Reality: Many real-world products sit right on this edge, making their approval status unstable.
  • The Fix: We need to add a "safety margin" to our rules. We shouldn't just look at the number; we need to account for the fact that our measurement is a bit fuzzy.

In short: Don't trust a single number on the edge. If you are standing on the line, you aren't really "safe" until you take a step back.