Comparative e-backtests for general risk measures

This paper develops a non-parametric sequential framework that uses e-values and e-processes to perform anytime-valid comparative backtests for general elicitable risk measures. The approach remains robust under dependence and model misspecification, and a modified three-zone approach makes model evaluation more informative.

Zhanyi Jiao, Qiuqi Wang, Yimiao Zhao

Published 2026-03-06

Imagine you are the head of a bank's risk department. Every day, you have to predict how much money you might lose in a storm (a market crash). You build a complex computer model to make these predictions. But how do you know if your model is actually good? And more importantly, is it better than the "standard" model the government regulators use?

This paper introduces a new, smarter way to answer those questions using a concept called "E-values" (think of them as Evidence Accumulators).

Here is the breakdown of the paper's ideas using simple analogies:

1. The Problem: The Old Way vs. The New Way

The Old Way (Standard Backtesting):
Imagine you are a teacher grading a student's homework. The old method asks: "Did the student get the right answer?"
If the student's prediction was close to the actual loss, they pass. If not, they fail.

  • The Flaw: In the real world, we don't know the "perfect" answer in advance. Also, regulators don't just want to know if your model is "okay"; they want to know if it's better than the government's standard model. The old way can't easily compare two models against each other while data is still coming in.

The New Way (Comparative E-Backtesting):
This paper suggests a new game. Instead of asking "Is the answer right?", we ask: "Is Model A beating Model B?"
We use E-values as a scorecard.

  • The Metaphor: Imagine a betting game at the horse races.
    • You have two horses: Horse A (your bank's model) and Horse B (the regulator's model).
    • Every day, a race happens (the market moves).
    • You place a bet on which horse is faster.
    • If your horse wins, your "Evidence Pot" (the E-value) grows. If it loses, the pot shrinks.
    • The magic of this paper is that you can check the pot at any time. You don't have to wait until the end of the year. If the pot gets huge, you know your horse is winning; if it stays small, your horse is losing. (A minimal sketch of this betting scheme follows right after this list.)
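To make the metaphor concrete, here is a minimal sketch of one daily "bet" compounding into an e-value. Everything specific below is an illustrative assumption rather than the paper's construction: a fixed betting fraction `LAMBDA`, the pinball (quantile) loss as the scoring function for a Value-at-Risk forecast, and simulated Gaussian losses.

```python
import numpy as np

ALPHA = 0.95    # VaR confidence level
LAMBDA = 0.05   # fixed betting fraction (hypothetical choice)

def pinball_loss(var_forecast, loss):
    """Consistent scoring function for the ALPHA-quantile (VaR):
    a smaller average score means a better forecast."""
    return (ALPHA - (loss <= var_forecast)) * (loss - var_forecast)

def comparative_e_process(var_A, var_B, losses, lam=LAMBDA):
    """One bet per day on 'model A scores better than model B'.

    Each factor 1 + lam * d has expectation <= 1 under the null
    'A is not better than B', provided the score differences are
    bounded so every factor stays positive (assumed here)."""
    d = pinball_loss(var_B, losses) - pinball_loss(var_A, losses)
    return np.cumprod(1.0 + lam * d)

# Toy run: model A uses the true 95% quantile of N(0, 1) (~1.645),
# model B is too optimistic, so the evidence pot should grow.
rng = np.random.default_rng(0)
losses = rng.normal(size=2000)
E = comparative_e_process(np.full(2000, 1.645), np.full(2000, 1.0), losses)
print(f"final e-value: {E[-1]:.1f}")   # large => evidence that A wins
```

Crossing a threshold like 1/0.05 = 20 counts as strong evidence, and the next section explains why that threshold can be checked at any time.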

2. The Secret Sauce: "E-Processes" and "Anytime Validity"

In traditional statistics, you have to decide how many races you will run before you start. If you peek at the results halfway through and decide to stop early, you might cheat the math (this is called "p-hacking").

This paper uses E-processes, which are like a Self-Adjusting Compass.

  • The Analogy: Imagine you are walking through a foggy forest. You have a compass that tells you if you are walking North (towards the truth) or South.
  • The Superpower: No matter how long you walk, or when you stop to check the compass, it never lies. You can stop after 10 steps or 10,000 steps, and the compass still gives you a valid answer. This is called "Anytime Validity." It means regulators can check your model's performance every single day without worrying about statistical tricks. (The simulation sketch after this list makes this concrete.)
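A short simulation illustrates why the compass never lies. The guarantee behind it is Ville's inequality: under the null hypothesis, the probability that an e-process ever crosses 1/α is at most α, no matter when you look. The setup below is hypothetical (fair bets with zero-mean payoffs), not the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, lam = 0.05, 0.1
n_paths, horizon = 5000, 500

# Fair bets: payoffs uniform on [-1, 1] have zero mean, so each factor
# 1 + lam * d has expectation 1 and E_t is a nonnegative martingale
# starting at 1 -- i.e., the null hypothesis is true on every path.
d = rng.uniform(-1.0, 1.0, size=(n_paths, horizon))
E = np.cumprod(1.0 + lam * d, axis=1)

# How often does the running pot EVER cross 1/alpha, at any stopping time?
crossed = (E.max(axis=1) >= 1.0 / alpha).mean()
print(f"P(sup E_t >= {1/alpha:.0f}) ~= {crossed:.4f}  (Ville bound: {alpha})")
```

However long the horizon, the fraction of paths that ever cross the threshold stays below α, which is exactly what lets you stop and check on any day.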

3. The "Three-Zone" Traffic Light System

The paper proposes a new way to interpret the results, moving away from a simple "Pass/Fail" to a Traffic Light System:

  • 🟢 Green Light: Your model is clearly beating the regulator's model. (The Evidence Pot for "You are better" is huge).
  • 🔴 Red Light: Your model is clearly worse than the regulator's model. (The Evidence Pot for "You are worse" is huge).
  • 🟡 Yellow Light: It's a tie, or the race is too close to call yet.
    • The Innovation: The authors add a clever twist here. Even in the Yellow zone, they look at Speed and Magnitude.
    • Speed: Which model started winning faster?
    • Magnitude: Which model is winning by a larger margin?
    • This helps break ties and gives a clearer answer even when the models are very similar. (A sketch of such a decision rule follows below.)
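Here is a minimal sketch of how two evidence pots could drive the traffic light. The thresholds and the function are illustrative assumptions; the paper's modified three-zone approach additionally weighs how fast and by how much each e-process has grown.

```python
def traffic_light(e_better: float, e_worse: float, alpha: float = 0.05) -> str:
    """Map e-values for 'A is better' / 'A is worse' to a zone.

    e_better: evidence pot for 'your model beats the standard'
    e_worse : evidence pot for 'your model trails the standard'
    """
    threshold = 1.0 / alpha      # e.g. 20 for alpha = 0.05
    if e_better >= threshold:
        return "green"           # strong evidence your model is better
    if e_worse >= threshold:
        return "red"             # strong evidence your model is worse
    return "yellow"              # too close to call so far

print(traffic_light(e_better=35.0, e_worse=0.4))   # -> green
```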

4. Handling the Chaos: Structural Changes

Financial markets are messy. Sometimes they are calm, and sometimes they are chaotic (like during a pandemic or a financial crisis).

  • The Problem: A model might be great in calm weather but terrible in a storm.
  • The Solution: The paper suggests that when a major event happens (a "structural change"), you can reset the race.
  • The Analogy: Imagine a marathon. If the weather suddenly changes from sunny to a blizzard, you don't disqualify the runners; you just start a new leg of the race to see who handles the snow better. The paper's method allows regulators to "restart" the E-value counter during crises to see which model adapts best to the new reality. (See the restart sketch below.)
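As a rough illustration, restarting just means resetting the cumulative product to 1 on the day a regime shift is declared. The change flags here are a hypothetical input; how the paper actually detects or declares structural changes is beyond this sketch.

```python
import numpy as np

def e_process_with_restarts(factors, change_flags):
    """Cumulative product of daily betting factors, reset at regime shifts.

    factors      : daily e-factors, e.g. 1 + lam * payoff
    change_flags : True on days a structural change is declared
    """
    E, path = 1.0, []
    for f, restart in zip(factors, change_flags):
        if restart:
            E = 1.0          # start a new leg of the race
        E *= f
        path.append(E)
    return np.asarray(path)

# Example: three calm days, a declared crisis, then two more days.
factors = [1.1, 1.2, 0.9, 1.3, 1.4]
flags   = [False, False, False, True, False]
print(e_process_with_restarts(factors, flags))  # pot resets before day 4
```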

5. Why This Matters

  • For Banks: It gives them a fair, real-time way to prove their internal models are working, potentially saving money on capital reserves if they can show their models beat the regulatory standard.
  • For Regulators: It provides a robust tool to catch bad models immediately, even if the data is messy or the market is crashing. It stops banks from "gaming" the system by waiting until the end of the year to fix their models.
  • For Everyone: It moves risk management from "guessing if we are right" to "measuring who is winning right now."

Summary

This paper replaces the old, rigid "Pass/Fail" test with a dynamic, real-time scoreboard that works even when the data is messy or the market is crashing. It uses a "betting" metaphor to accumulate evidence, allowing regulators and banks to see clearly which risk model is the champion, at any moment in time.