Entropies, cross-entropies and Rényi divergence: sharp three-term inequalities for probability density functions

This paper establishes a new sharp three-term inequality linking the differential Rényi entropy, Rényi divergence, and Rényi cross-entropy of probability density functions. Equality holds exactly when one density is an escort of the other, and the authors use this result to derive further sharp bounds involving other informational functionals such as absolute moments and Fisher information.

Razvan Gabriel Iagar, David Puertas-Centeno

Published Tue, 10 Ma

Imagine you are a chef trying to perfect a recipe. You have your original recipe (let's call it f), a modified version of that recipe (g), and perhaps a third, hybrid version (h).

In the world of information theory, these "recipes" are actually probability distributions—mathematical descriptions of how likely different outcomes are (like the chance of rain, the distribution of heights in a city, or the noise in a computer signal).

This paper is about finding the perfect "balance sheet" between three specific ways of measuring these recipes:

  1. Entropy: How much "surprise" or "disorder" is in a single recipe.
  2. Divergence: How different two recipes are from each other.
  3. Cross-Entropy: A hybrid measure that mixes the surprise of one recipe with the structure of another.
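For readers who want the actual formulas: the first two of these have standard textbook definitions, shown below in their usual one-dimensional form (the paper's exact normalizations, and its convention for the Rényi cross-entropy, may differ; only the familiar versions are sketched here).

```latex
% Differential Rényi entropy of order \alpha and Rényi divergence of order \alpha
% (standard textbook forms; the paper may use different normalizations).
\[
  H_\alpha(f) \;=\; \frac{1}{1-\alpha}\,\log \int f(x)^{\alpha}\,dx,
  \qquad
  D_\alpha(f\,\|\,g) \;=\; \frac{1}{\alpha-1}\,\log \int f(x)^{\alpha}\, g(x)^{1-\alpha}\,dx.
\]
% In the Shannon limit \alpha \to 1 these reduce to
% h(f) = -\int f \log f   and   D(f\|g) = \int f \log(f/g).
```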

The Big Discovery: The "Golden Rule" of Recipes

The authors, Razvan Gabriel Iagar and David Puertas-Centeno, discovered a sharp inequality. In math terms, an inequality is like saying "A + B is always less than or equal to C." A sharp inequality means you can't make that statement any tighter; it's the absolute limit.

They found that if you combine the Entropy of your original recipe with the Divergence (the difference) between the original and the modified recipe, the result will never exceed the Cross-Entropy of the two, provided you follow a specific mathematical "recipe" for how the terms are combined.
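A useful anchor point (a classical fact, not the paper's new result): in the Shannon setting, which is the limit of the Rényi family as the order tends to 1, the three quantities balance exactly rather than through an inequality. The paper's sharp three-term inequality can be read as the Rényi-order counterpart of this classical identity, where equality is no longer automatic and instead singles out the escort case described next.

```latex
% Classical Shannon identity: cross-entropy = entropy + relative entropy.
\[
  \underbrace{-\int f(x)\log g(x)\,dx}_{\text{cross-entropy } H(f,g)}
  \;=\;
  \underbrace{-\int f(x)\log f(x)\,dx}_{\text{entropy } h(f)}
  \;+\;
  \underbrace{\int f(x)\log\frac{f(x)}{g(x)}\,dx}_{\text{divergence } D(f\,\|\,g)} .
\]
```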

The Magic Condition:
The paper reveals that this limit is reached (the "equality" holds) only when the modified recipe (g) is an "Escort" of the original (f).

  • The Analogy: Imagine your original recipe is a standard cake. An "Escort" recipe isn't a totally different cake; it's the original cake where you've simply turned up the volume on the chocolate flavor and turned down the vanilla, based on a specific mathematical rule. It's the same cake, just "stretched" or "squashed" in a very precise way. If you change the recipe in any other random way, the rule breaks.
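Concretely, the "turn up the chocolate" operation has a standard formula. The escort density of order α is a well-known object in information theory and statistical physics (the paper presumably ties the order to the Rényi orders in its inequality; only the generic form is shown here): raise the original density to a power, then renormalize so it still integrates to 1.

```latex
% Escort density of order \alpha built from f (generic textbook form).
\[
  f_{\alpha}(x) \;=\; \frac{f(x)^{\alpha}}{\displaystyle\int f(y)^{\alpha}\,dy},
  \qquad \alpha > 0 .
\]
% \alpha > 1 accentuates the peaks ("more chocolate"); \alpha < 1 flattens them.
```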

The Toolkit: The "Time-Traveling" Transformations

The real genius of this paper isn't just the one rule; it's the toolkit they built to apply that rule to almost anything.

They introduce a concept called "Reciprocal Transformations."

  • The Metaphor: Imagine you have a magical mirror. If you look at your recipe in the mirror, it changes shape (maybe it gets taller and thinner). But here's the trick: if you look at the difference between your original recipe and a second one in that mirror, the amount of difference stays exactly the same. The mirror distorts the shapes, but it preserves the "distance" between them.
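To make the mirror idea concrete, here is a minimal numerical sketch (not the paper's specific reciprocal transformations, just the general invariance they exploit): push two densities through the same smooth, strictly increasing map and check that the Rényi divergence between them does not change. The Gaussian pair, the order α = 0.7, and the map T(x) = sinh(x) are arbitrary illustrative choices.

```python
import numpy as np
from scipy import integrate, stats

alpha = 0.7  # Rényi order; any order in (0, 1) works for this sketch

# Two "recipes": a pair of Gaussian densities on the real line.
f = stats.norm(loc=0.0, scale=1.0).pdf
g = stats.norm(loc=1.0, scale=2.0).pdf

def renyi_divergence(p, q):
    """D_alpha(p || q) = log( integral of p^alpha * q^(1-alpha) ) / (alpha - 1)."""
    integrand = lambda x: p(x) ** alpha * q(x) ** (1.0 - alpha)
    value, _ = integrate.quad(integrand, -np.inf, np.inf)
    return np.log(value) / (alpha - 1.0)

# The "magic mirror": push a density through the strictly increasing map
# T(x) = sinh(x).  By the change-of-variables formula, the new density is
# p_Y(y) = p(arcsinh(y)) / sqrt(1 + y^2).
def mirror(p):
    return lambda y: p(np.arcsinh(y)) / np.sqrt(1.0 + y ** 2)

d_before = renyi_divergence(f, g)
d_after = renyi_divergence(mirror(f), mirror(g))

print(f"D_alpha(f || g)   = {d_before:.6f}")
print(f"D_alpha(Tf || Tg) = {d_after:.6f}")
# The two numbers agree up to quadrature error: the mirror distorts the
# shapes but preserves the "distance" between the densities.
```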

The authors use this mirror trick to take their "Golden Rule" (the inequality) and apply it to many different types of measurements:

  1. Moments (The "Weight" of the Recipe): Instead of just measuring the average height of people, they measure the "heaviness" of the tails (extreme outliers).
  2. Fisher Information (The "Sharpness" of the Recipe): This measures how sensitive a recipe is to small changes. Is the cake very delicate, or is it a sturdy brick?
  3. Cross-Measures: They create new "hybrid" tools that mix these concepts (like "Cross-Fisher Information").
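For reference, the first two items have standard one-dimensional definitions, sketched below (the paper works with generalized, Rényi-flavored versions and introduces two-density "cross" variants such as Cross-Fisher Information, whose exact forms are not reproduced here).

```latex
% Absolute moment of order p (tail "heaviness") and nonparametric
% Fisher information ("sharpness") of a density f -- standard forms.
\[
  \langle |x|^{p} \rangle_{f} \;=\; \int |x|^{p}\, f(x)\,dx ,
  \qquad
  I(f) \;=\; \int \frac{\bigl(f'(x)\bigr)^{2}}{f(x)}\,dx .
\]
```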

Why Does This Matter?

Think of this paper as a universal adapter for information theory.

Before this, scientists had to prove a specific rule for "Moments," then start from scratch to prove a rule for "Fisher Information," and another for "Entropy." It was like having to build a new bridge for every single river.

This paper says: "No, we have a universal bridge (the general framework of transformations). If you can prove the rule for one type of river (Entropy), you can instantly transport that proof to any other river (Moments, Fisher, etc.) just by using our magic mirror."

The "Three-Ingredient" Surprise

The paper also introduces a new functional that depends on three probability densities (f, g, h).

  • The Analogy: Usually, you compare two things (A vs. B). But sometimes, you need a referee (C) to judge the comparison. The authors created a new "Cross-Divergence" measure that uses this third party to bound the difference between the first two. It's like saying, "The distance between A and B is limited by how they both relate to C."

Summary in Plain English

  1. The Core Rule: There is a strict mathematical limit on how much "surprise" and "difference" can exist between two probability distributions.
  2. The Special Case: This limit is only reached when one distribution is a mathematically "scaled" version of the other (an Escort density).
  3. The Method: The authors built a "magic mirror" system (measure-preserving transformations) that lets them take this one core rule and instantly generate dozens of new, precise rules for other complex measurements (like how "spiky" a distribution is or how "heavy" its tails are).
  4. The Result: They have created a powerful, unified framework that makes it much easier to find the absolute best possible limits (bounds) for how information behaves in complex systems, from physics to cybersecurity.

In short, they found the master key that unlocks a whole new set of precise mathematical locks in the field of information theory.