Learning Risk Preferences in Markov Decision Processes: an Application to the Fourth Down Decision in the National Football League

This paper employs an inverse optimization framework on NFL play-by-play data to demonstrate that coaches' historically conservative fourth-down decisions are consistent with optimizing low quantiles of future value, revealing that their risk preferences have become more tolerant over time and vary based on field position.

Nathan Sandholtz, Lucas Wu, Martin Puterman, Timothy C. Y. Chan

Published 2026-03-06

Imagine you are watching a football game. It's 4th down, and the team has to make a huge choice: Go for it (try to get the first down), Kick a Field Goal (try for 3 points), or Punt (kick the ball away to the other team).

For decades, statisticians have looked at these decisions and said, "Wait a minute! The coaches are playing it too safe. If they just used a computer model to calculate the odds, they would go for it way more often."

But the coaches keep making the "safe" choice. Why?

This paper asks a fascinating question: What if the coaches aren't making mistakes? What if they are actually playing a different game than the statisticians think they are?

Here is the simple breakdown of how the authors solved this mystery.

1. The Detective Work: "Inverse Optimization"

Usually, if you want to know what a coach is thinking, you ask them. But coaches rarely say, "I'm scared of losing the ball."

Instead, the authors used a method called Inverse Optimization. Think of it like this:

  • Normal Math: You know the rules and the goal, so you calculate the best move.
  • Inverse Math: You see the move the person actually made, and you work backward to figure out what their "hidden rulebook" must have been.

The authors assumed the coaches were making the best possible decision for their specific goals. They just had to figure out what those goals were.

2. The "Risk" Meter: The Quantile

To explain the coaches' behavior, the authors propose that coaches aren't trying to maximize their average points, which is the risk-neutral goal that standard models assume. Instead, their choices look like those of someone guarding against bad outcomes.

They used a concept called a Quantile. Imagine a line of people ranked from "Lucky" to "Unlucky."

  • If you are Risk-Neutral (like a standard computer model), you care about the average outcome across everyone in the line.
  • If you are Risk-Averse (like a nervous coach), you care about someone near the unlucky end, say the person standing at the 10% mark. You pick the option that makes that person's outcome as good as possible, so that even when things go badly the result is tolerable.

The authors found that NFL coaches' decisions are consistent with optimizing a low quantile, roughly the bottom 30% to 40% of possible outcomes. Coaches weigh the bad outcome (turning the ball over on 4th down) heavily, even if the "average" outcome suggests they should take the risk.
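To make the difference between "average" and "quantile" thinking concrete, here is a minimal Python sketch of a single fourth-down choice. Every number in it is invented for illustration and none comes from the paper; the names ACTIONS, mean, quantile, and best_action are hypothetical helpers. The paper works with quantiles of future value in a full Markov decision process, whereas this toy only looks at one decision in isolation.

```python
# Toy fourth-down decision. All values and probabilities are made up
# for illustration; they are NOT estimates from the paper.
# Each action maps to a list of (future_value_in_points, probability) pairs.
ACTIONS = {
    "go":         [(-2.0, 0.45), (4.0, 0.55)],  # fail / convert
    "field_goal": [(-1.5, 0.40), (3.0, 0.60)],  # miss / make
    "punt":       [(-0.5, 1.00)],               # assumed deterministic
}


def mean(outcomes):
    """Expected value of a discrete outcome distribution."""
    return sum(value * prob for value, prob in outcomes)


def quantile(outcomes, tau):
    """Lower tau-quantile: smallest value whose cumulative probability >= tau."""
    cumulative = 0.0
    for value, prob in sorted(outcomes):
        cumulative += prob
        if cumulative >= tau - 1e-12:  # small tolerance for float rounding
            return value
    return max(value for value, _ in outcomes)


def best_action(score):
    """Return the action whose outcome distribution scores highest."""
    return max(ACTIONS, key=lambda name: score(ACTIONS[name]))


risk_neutral_choice = best_action(mean)
cautious_choice = best_action(lambda outcomes: quantile(outcomes, 0.35))

print("Maximizing the average:", risk_neutral_choice)
print("Maximizing the 0.35-quantile:", cautious_choice)
```

With these made-up numbers the average-maximizer goes for it, while the decision-maker who maximizes the 0.35-quantile punts: the same situation, two different "best" moves.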

3. The "Field Half" Analogy

The study discovered a funny quirk in how coaches think, depending on where they are on the field.

  • In Your Own Half (The "Home" Zone): The coaches are super conservative. They are like a parent driving a car with a child in the backseat. They would rather drive 10 miles per hour under the speed limit than risk a single scratch on the bumper. They almost never go for it here.
  • In the Opponent's Half (The "Away" Zone): The coaches become much bolder. They are like a surfer riding a wave. They are willing to take more risks because the reward (scoring points) is right there, and the "worst-case" scenario (giving the ball back) feels less catastrophic than it does in their own territory.

4. The "Time Travel" Discovery

The authors also looked at how this has changed over time (from 2014 to 2022).

  • Then: Coaches were extremely scared of the worst-case scenario.
  • Now: Coaches are slowly becoming a little bit braver. They are starting to trust the math a little more, or perhaps they are just tired of losing games by being too safe.

5. The "Video Game" Connection

To make this work, the authors built what amounts to a massive Video Game of the NFL: a Markov decision process model of the game.

  • They fed the game 9 years of real-life play-by-play data.
  • They programmed the "physics" of the game (how likely a team is to get a first down, how likely a field goal is to go in).
  • Then, they ran the "Inverse Optimization" engine. It asked: "If the coaches are playing this game perfectly, what 'Risk Meter' setting must they have turned on?"
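The "work backward" step can be sketched in a few lines of Python. This is only a toy under loud assumptions: the outcome numbers are invented, the search is a brute-force sweep over candidate risk settings, and the decision is a single play rather than the full game model estimated from play-by-play data. The names ACTIONS, optimal_action, and consistent_taus are hypothetical and not taken from the paper.

```python
# Toy inverse step: given an observed choice, find every risk setting (tau)
# under which that choice would have been optimal.
# All values and probabilities are invented for illustration only.
ACTIONS = {
    "go":         [(-2.0, 0.45), (4.0, 0.55)],
    "field_goal": [(-1.5, 0.40), (3.0, 0.60)],
    "punt":       [(-0.5, 1.00)],
}


def quantile(outcomes, tau):
    """Lower tau-quantile of a discrete (value, probability) distribution."""
    cumulative = 0.0
    for value, prob in sorted(outcomes):
        cumulative += prob
        if cumulative >= tau - 1e-12:  # small tolerance for float rounding
            return value
    return max(value for value, _ in outcomes)


def optimal_action(actions, tau):
    """Forward problem: the best action for a decision-maker with setting tau."""
    return max(actions, key=lambda name: quantile(actions[name], tau))


def consistent_taus(actions, observed, grid):
    """Inverse problem: all taus on the grid that make `observed` optimal."""
    return [tau for tau in grid if optimal_action(actions, tau) == observed]


GRID = [i / 100 for i in range(1, 100)]  # candidate settings 0.01 .. 0.99

for observed in ACTIONS:
    taus = consistent_taus(ACTIONS, observed, GRID)
    print(f"{observed}: optimal for tau in [{min(taus):.2f}, {max(taus):.2f}]")
```

In this toy, seeing a punt tells you the risk setting must sit at or below 0.40, while seeing a team go for it implies a setting above 0.45. The real analysis applies the same idea to thousands of observed decisions at once.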

The Big Takeaway

The paper concludes that coaches aren't "stupid" or "bad at math." They are just risk-averse.

They are playing a game where the penalty for a failed gamble is so high (handing the opponent the ball in good field position) that they are willing to accept a lower average score just to avoid that one bad outcome.

In simple terms:
If a statistician says, "On average, you should go for it," the coach thinks, "But what if I fail? Then I look like an idiot and we lose." In effect, the coach is optimizing to avoid the bad outcome, not to score the most points on average.

This study gives us a new way to understand human decision-making: sometimes, the "safe" choice isn't a mistake; it's a calculated move to avoid the worst possible nightmare.