COOL-MC: Verifying and Explaining RL Policies for Platelet Inventory Management

This paper demonstrates the application of the COOL-MC tool to formally verify and explain a reinforcement learning policy for platelet inventory management, confirming its high safety performance and revealing its reliance on inventory age distribution through probabilistic model checking and counterfactual analysis.

Dennis Gross

Published 2026-03-04

Imagine you are the manager of a very special, high-stakes grocery store. But instead of selling apples or bread, you sell platelets—a type of blood cell that helps people clot and stop bleeding.

Here's the catch: These "groceries" are incredibly fragile. They expire in just five days.

The Impossible Balancing Act

Your job is to order just the right amount every day.

  • Order too much? The extra platelets rot before anyone can use them. This is a waste of a rare, life-saving resource and costs the hospital money.
  • Order too little? A patient arrives who needs a transfusion, but you have none. This is a life-or-death emergency.

In the past, humans tried to guess the right amount using math formulas. But demand is unpredictable (like a sudden flu outbreak), and the math gets too complicated. So, scientists turned to Artificial Intelligence (AI), specifically something called Reinforcement Learning (RL).

Think of the AI as a super-intelligent apprentice. You let it run the store for thousands of days in a computer simulation. It makes mistakes, gets "fined" for waste or shortages, and eventually learns a perfect strategy.

The "Black Box" Problem

Here's the trouble: The AI learns its strategy inside a neural network, which is like a giant, tangled ball of yarn. It knows what to do, but no one knows why.

  • If the AI orders 14 units on a Tuesday, is it because it's Tuesday? Because the weather is rainy? Or because it has 3 units of "fresh" stock and 2 units of "old" stock?
  • In a hospital, you can't just say, "Trust the robot." You need to know why it made that decision before you let it run a blood bank.

Enter COOL-MC: The AI Detective

This paper applies a tool called COOL-MC. Think of it as a detective and a translator rolled into one. It takes the AI's "black box" brain and turns it into a clear, transparent map that humans can read.

Here is how COOL-MC solves the mystery, using simple analogies:

1. The "Reachable Map" (Simplifying the Maze)

The AI's world is huge, with millions of possible scenarios. Checking every single one is like trying to read every page of every book in a library to find one sentence.

  • COOL-MC's Trick: It only builds a map of the places the AI actually visits. It ignores the empty, unused rooms. This makes the map small enough to analyze quickly.
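In code, the trick looks roughly like a breadth-first search that only follows the states the policy can actually produce. Everything below (the order-up-to rule, the tiny demand range, the state encoding) is an illustrative toy, not the paper's actual model:

```python
from collections import deque

def reachable_states(initial, policy, step):
    """Breadth-first search over only the states the policy can reach.

    initial: starting state; policy: maps state -> action;
    step: maps (state, action) -> list of possible successor states.
    """
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        s = frontier.popleft()
        for nxt in step(s, policy(s)):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Toy inventory: state = units in stock (0..10), "order up to 5" policy,
# daily demand of 0, 1, or 2 units. All assumptions for illustration.
policy = lambda stock: max(0, 5 - stock)
step = lambda stock, order: [max(0, stock + order - d) for d in (0, 1, 2)]

states = reachable_states(3, policy, step)
```

Starting from 3 units in stock, this toy policy only ever visits 3 of the 11 possible stock levels, which is exactly why the map stays small enough to check.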

2. The "Safety Inspector" (Checking the Rules)

Once the map is built, COOL-MC asks strict questions, like a safety inspector:

  • "What is the exact chance that we run out of blood in the next 200 days?"
  • "What is the chance we have too much blood that will rot?"
  • The Result: The AI's strategy was found to be very safe. It has only a 2.9% chance of running out and a 1.1% chance of having too much. It passed the test!
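The inspector's questions are written as formal probability queries, which a probabilistic model checker (such as Storm, which COOL-MC builds on) answers exactly on the reachable map. The sketch below only approximates such a query by simulation on a toy inventory chain; the demand distribution and the one-unit-per-day delivery cap are illustrative assumptions, not the paper's model:

```python
import random

# A PCTL-style safety query a model checker would answer exactly:
#   P=? [ F<=200 "stockout" ]   -- chance of running out within 200 days
# Here we only *estimate* it with Monte Carlo simulation on a toy chain.

def stockout_probability(days=200, runs=10_000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        stock = 3
        failed = False
        for _ in range(days):
            stock += 1 if stock < 3 else 0   # at most one unit delivered/day
            demand = rng.randint(0, 2)       # assumed demand: 0, 1, or 2
            if demand > stock:
                failed = True
                break
            stock -= demand
        hits += failed
    return hits / runs

p = stockout_probability()
```

The key difference: simulation gives an estimate with sampling error, while model checking gives the exact probability, which is what makes numbers like 2.9% trustworthy enough for a blood bank.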

3. The "Feature Pruning" (The Blindfold Test)

To understand why the AI makes decisions, the researchers played a game of "What if?"

  • They put a blindfold on the AI by hiding one piece of information at a time (like hiding the "Day of the Week" or the "Age of the blood").
  • The Discovery: When they hid the age of the blood (how old the platelets are), the AI's performance crashed. It started running out of stock or wasting blood.
  • The Lesson: The AI learned that freshness is everything. It barely cares what day of the week it is; it cares deeply about whether the blood is 1 day old or 4 days old.
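The blindfold test can be sketched in a few lines: freeze one observation feature to a neutral value and see how much the decision changes. The feature names and the simple rule standing in for the neural policy are illustrative assumptions:

```python
def policy(obs):
    # Toy stand-in for the learned policy: restock toward 10 fresh units,
    # and order extra when too much of the stock is old.
    return max(0, 10 - obs["fresh"]) + (2 if obs["old"] >= 2 else 0)

def blindfold(obs, feature, neutral=0):
    """Re-query the policy with one feature hidden (set to a neutral value)."""
    masked = dict(obs, **{feature: neutral})
    return policy(masked)

obs = {"fresh": 3, "old": 2, "weekday": 2}
baseline = policy(obs)               # normal decision: order 9
without_age = blindfold(obs, "old")  # hiding blood age changes the order
without_day = blindfold(obs, "weekday")  # hiding the weekday changes nothing
```

In the paper the comparison is done on whole safety metrics, not single decisions, but the idea is the same: the feature whose removal hurts the most is the one the policy truly relies on.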

4. The "Time Traveler" (Counterfactuals)

Finally, they asked: "What if we forced the AI to order less?"

  • They took a specific order size (14 units) that the AI liked and forced it to order only 6 units instead, in all the situations where it usually ordered 14.
  • The Surprise: The safety numbers barely changed!
  • The Lesson: This means the AI was only ordering 14 units when it had a huge "safety buffer" of blood already. It wasn't being greedy; it was being cautious. If it had ordered less, it would have been just as safe.
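The counterfactual itself is just a wrapper around the policy: intercept one action and substitute another, then re-run the safety check on the modified policy. The order sizes and the tiny threshold policy below are illustrative assumptions, not COOL-MC's API:

```python
def counterfactual(policy, from_action, to_action):
    """Return a policy that acts like `policy`, except wherever it would
    choose `from_action`, it is forced to choose `to_action` instead."""
    def remapped(state):
        a = policy(state)
        return to_action if a == from_action else a
    return remapped

# Toy policy over total stock: it only places the big order of 14 units
# when the buffer is already large (>= 12 units on hand).
base = lambda stock: 14 if stock >= 12 else 6
forced = counterfactual(base, from_action=14, to_action=6)
```

Because the toy policy only orders 14 when the buffer is large, forcing those orders down to 6 still leaves plenty of stock, mirroring why the paper's safety numbers barely moved.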

The Big Takeaway

This paper proves that we can use AI to manage life-saving supplies, but we can't just trust the "magic box." We need tools like COOL-MC to:

  1. Verify that the AI won't kill anyone (Safety).
  2. Explain exactly what the AI is thinking (Transparency).

It turns a mysterious, scary robot into a transparent, auditable partner that doctors and blood bank managers can actually trust with human lives.
