A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data

This large-scale neutral benchmark study evaluates 19 survival models across 34 low-dimensional datasets and concludes that, although some complex machine learning methods perform better on specific metrics, the standard Cox Proportional Hazards model remains a robust and sufficient choice for most practitioners in this setting.

Lukas Burk, John Zobolas, Bernd Bischl, Andreas Bender, Marvin N. Wright, Raphael Sonabend

Published 2026-03-03

Imagine you are a doctor trying to predict how long a patient might live after a diagnosis. You have a list of patients, some of whom have passed away (the "event"), and some who are still alive but you've lost track of them or the study ended (this is called "censoring"). Your goal is to build a crystal ball that predicts survival time based on their medical features.

For decades, doctors have used a classic, reliable tool for this: the Cox Proportional Hazards model. It's like a trusted, old-fashioned Swiss Army knife—simple, sturdy, and gets the job done.

In recent years, however, a new generation of "super-tools" has arrived: Machine Learning (ML) algorithms. These are like high-tech, laser-guided drones. They are complex, can handle massive amounts of data, and promise to see patterns the old tools miss.

The Big Question: Do these fancy, complex drones actually work better than the trusty Swiss Army knife for the average doctor dealing with standard medical data?

This paper is the ultimate "neutral taste test" to find out.

The "Taste Test" Setup

The authors, a team of statisticians and data scientists, didn't just pick one dataset or one model. They set up a massive, fair competition:

  1. The Contestants: They gathered 19 different models.
    • The Classics: The old-school statistical methods (like the Cox model).
    • The Moderns: The fancy machine learning methods (like Random Forests, Boosting, and Neural Networks).
  2. The Arena: They tested these models on 34 different real-world datasets (like patient records from hospitals).
  3. The Rules:
    • No Cheating: They treated every model exactly the same. They didn't give the "fancy" models a head start or the "old" models a handicap.
    • Tuning: They spent time adjusting the settings (hyperparameters) for every single model to make sure each one was performing at its absolute best.
    • The Scorecard: They measured success in two ways:
      • Ranking: Can the model tell who will die sooner and who will live longer? (Discrimination).
      • Accuracy: Is the predicted probability of survival actually correct? (Calibration/Scoring).

The Results: The Underdog Wins (Again)

After thousands of tuning and evaluation runs across all 34 datasets, the results were surprising to many in the tech world but comforting to many in the medical world:

The "Fancy" Drones didn't beat the "Old" Knife.

  • The Verdict: While some of the complex machine learning models did slightly better in specific cases, none of them significantly outperformed the classic Cox Proportional Hazards model when averaged across all the datasets.
  • The "Best" of the Rest: A few complex models (like Oblique Random Survival Forests) came close, but they didn't win the race.
  • The Takeaway: For standard, low-dimensional data (where you have more patients than variables, which is common in medicine), the simple, old-school Cox model is still the champion. It is robust, easy to interpret, and just as accurate as the complex alternatives.

Why This Matters: The "Over-Engineering" Trap

Think of it like this: If you need to drive to the grocery store, you don't need a Formula 1 race car. A reliable sedan (the Cox model) gets you there just as fast, costs less to maintain, and is easier to drive.

The paper warns practitioners against "over-engineering." Just because you can use a complex AI model doesn't mean you should.

  • Complexity Cost: The fancy models are harder to tune, take longer to run, and are often "black boxes" (you can't easily explain why they made a prediction).
  • The Recommendation: Start with the simple Cox model. Only switch to the complex machine learning tools if you have a very specific, difficult problem that the simple model can't solve.

In a Nutshell

This study is a massive, fair comparison that says: "Don't throw away your old tools just because new ones look shinier."

For most survival analysis problems in medicine and business, the classic, simple statistical methods remain the gold standard. The complex machine learning models are powerful, but for this specific job, they aren't necessarily better. They are the Ferrari in a traffic jam; the old sedan is still the most efficient way to get to work.
