Entropies, cross-entropies and Rényi divergence: sharp three-term inequalities for probability density functions

This paper establishes a new sharp three-term inequality linking the differential Rényi entropy, Rényi divergence, and Rényi cross-entropy of probability density functions. Equality holds exactly when one density is an escort of the other, and the authors use this result to derive further sharp bounds involving other informational functionals such as absolute moments and Fisher information.

Razvan Gabriel Iagar, David Puertas-Centeno

Published Tue, 10 Ma

Imagine you are a chef trying to perfect a recipe. You have your original recipe (let's call it f), a modified version of that recipe (g), and perhaps a third, hybrid version (h).

In the world of information theory, these "recipes" are actually probability distributions—mathematical descriptions of how likely different outcomes are (like the chance of rain, the distribution of heights in a city, or the noise in a computer signal).

This paper is about finding the perfect "balance sheet" between three specific ways of measuring these recipes:

  1. Entropy: How much "surprise" or "disorder" is in a single recipe.
  2. Divergence: How different two recipes are from each other.
  3. Cross-Entropy: A hybrid measure that mixes the surprise of one recipe with the structure of another.
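For readers who want the actual formulas: the first two of these have standard textbook definitions, shown below in their usual one-dimensional form (the paper's exact normalizations, and its convention for the Rényi cross-entropy, may differ; only the familiar versions are sketched here).

```latex
% Differential Rényi entropy of order \alpha and Rényi divergence of order \alpha
% (standard textbook forms; the paper may use different normalizations).
\[
  H_\alpha(f) \;=\; \frac{1}{1-\alpha}\,\log \int f(x)^{\alpha}\,dx,
  \qquad
  D_\alpha(f\,\|\,g) \;=\; \frac{1}{\alpha-1}\,\log \int f(x)^{\alpha}\, g(x)^{1-\alpha}\,dx.
\]
% In the Shannon limit \alpha \to 1 these reduce to
% h(f) = -\int f \log f   and   D(f\|g) = \int f \log(f/g).
```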

The Big Discovery: The "Golden Rule" of Recipes

The authors, Razvan Gabriel Iagar and David Puertas-Centeno, discovered a sharp inequality. In math terms, an inequality is like saying "A + B is always less than or equal to C." A sharp inequality means you can't make that statement any tighter; it's the absolute limit.

They found that if you combine the Entropy of your original recipe with the Divergence (the difference) between the original and the modified recipe, the result will never exceed the Cross-Entropy of the two, provided you follow a specific mathematical "recipe" for how the terms are combined.
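A useful anchor point (a classical fact, not the paper's new result): in the Shannon setting, which is the limit of the Rényi family as the order tends to 1, the three quantities balance exactly rather than through an inequality. The paper's sharp three-term inequality can be read as the Rényi-order counterpart of this classical identity, where equality is no longer automatic and instead singles out the escort case described next.

```latex
% Classical Shannon identity: cross-entropy = entropy + relative entropy.
\[
  \underbrace{-\int f(x)\log g(x)\,dx}_{\text{cross-entropy } H(f,g)}
  \;=\;
  \underbrace{-\int f(x)\log f(x)\,dx}_{\text{entropy } h(f)}
  \;+\;
  \underbrace{\int f(x)\log\frac{f(x)}{g(x)}\,dx}_{\text{divergence } D(f\,\|\,g)} .
\]
```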

The Magic Condition:
The paper reveals that this limit is reached (the "equality" holds) only when the modified recipe (g) is an "Escort" of the original (f).

  • The Analogy: Imagine your original recipe is a standard cake. An "Escort" recipe isn't a totally different cake; it's the original cake where you've simply turned up the volume on the chocolate flavor and turned down the vanilla, based on a specific mathematical rule. It's the same cake, just "stretched" or "squashed" in a very precise way. If you change the recipe in any other random way, the rule breaks.
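Concretely, the "turn up the chocolate" operation has a standard formula. The escort density of order α is a well-known object in information theory and statistical physics (the paper presumably ties the order to the Rényi orders in its inequality; only the generic form is shown here): raise the original density to a power, then renormalize so it still integrates to 1.

```latex
% Escort density of order \alpha built from f (generic textbook form).
\[
  f_{\alpha}(x) \;=\; \frac{f(x)^{\alpha}}{\displaystyle\int f(y)^{\alpha}\,dy},
  \qquad \alpha > 0 .
\]
% \alpha > 1 accentuates the peaks ("more chocolate"); \alpha < 1 flattens them.
```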

The Toolkit: The "Time-Traveling" Transformations

The real genius of this paper isn't just the one rule; it's the toolkit they built to apply that rule to almost anything.

They introduce a concept called "Reciprocal Transformations."

  • The Metaphor: Imagine you have a magical mirror. If you look at your recipe in the mirror, it changes shape (maybe it gets taller and thinner). But here's the trick: if you look at the difference between your original recipe and a second one in that mirror, the amount of difference stays exactly the same. The mirror distorts the shapes, but it preserves the "distance" between them.
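To make the mirror idea concrete, here is a minimal numerical sketch (not the paper's specific reciprocal transformations, just the general invariance they exploit): push two densities through the same smooth, strictly increasing map and check that the Rényi divergence between them does not change. The Gaussian pair, the order α = 0.7, and the map T(x) = sinh(x) are arbitrary illustrative choices.

```python
import numpy as np
from scipy import integrate, stats

alpha = 0.7  # Rényi order; any order in (0, 1) works for this sketch

# Two "recipes": a pair of Gaussian densities on the real line.
f = stats.norm(loc=0.0, scale=1.0).pdf
g = stats.norm(loc=1.0, scale=2.0).pdf

def renyi_divergence(p, q):
    """D_alpha(p || q) = log( integral of p^alpha * q^(1-alpha) ) / (alpha - 1)."""
    integrand = lambda x: p(x) ** alpha * q(x) ** (1.0 - alpha)
    value, _ = integrate.quad(integrand, -np.inf, np.inf)
    return np.log(value) / (alpha - 1.0)

# The "magic mirror": push a density through the strictly increasing map
# T(x) = sinh(x).  By the change-of-variables formula, the new density is
# p_Y(y) = p(arcsinh(y)) / sqrt(1 + y^2).
def mirror(p):
    return lambda y: p(np.arcsinh(y)) / np.sqrt(1.0 + y ** 2)

d_before = renyi_divergence(f, g)
d_after = renyi_divergence(mirror(f), mirror(g))

print(f"D_alpha(f || g)   = {d_before:.6f}")
print(f"D_alpha(Tf || Tg) = {d_after:.6f}")
# The two numbers agree up to quadrature error: the mirror distorts the
# shapes but preserves the "distance" between the densities.
```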

The authors use this mirror trick to take their "Golden Rule" (the inequality) and apply it to many different types of measurements:

  1. Moments (The "Weight" of the Recipe): Instead of just measuring the average height of people, they measure the "heaviness" of the tails (extreme outliers).
  2. Fisher Information (The "Sharpness" of the Recipe): This measures how sensitive a recipe is to small changes. Is the cake very delicate, or is it a sturdy brick?
  3. Cross-Measures: They create new "hybrid" tools that mix these concepts (like "Cross-Fisher Information").
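For reference, the first two items have standard one-dimensional definitions, sketched below (the paper works with generalized, Rényi-flavored versions and introduces two-density "cross" variants such as Cross-Fisher Information, whose exact forms are not reproduced here).

```latex
% Absolute moment of order p (tail "heaviness") and nonparametric
% Fisher information ("sharpness") of a density f -- standard forms.
\[
  \langle |x|^{p} \rangle_{f} \;=\; \int |x|^{p}\, f(x)\,dx ,
  \qquad
  I(f) \;=\; \int \frac{\bigl(f'(x)\bigr)^{2}}{f(x)}\,dx .
\]
```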

Why Does This Matter?

Think of this paper as a universal adapter for information theory.

Before this, scientists had to prove a specific rule for "Moments," then start from scratch to prove a rule for "Fisher Information," and another for "Entropy." It was like having to build a new bridge for every single river.

This paper says: "No, we have a universal bridge (the general framework of transformations). If you can prove the rule for one type of river (Entropy), you can instantly transport that proof to any other river (Moments, Fisher, etc.) just by using our magic mirror."

The "Three-Ingredient" Surprise

The paper also introduces a new functional that depends on three probability densities (f, g, h).

  • The Analogy: Usually, you compare two things (A vs. B). But sometimes, you need a referee (C) to judge the comparison. The authors created a new "Cross-Divergence" measure that uses this third party to bound the difference between the first two. It's like saying, "The distance between A and B is limited by how they both relate to C."

Summary in Plain English

  1. The Core Rule: There is a strict mathematical limit on how much "surprise" and "difference" can exist between two probability distributions.
  2. The Special Case: This limit is only reached when one distribution is a mathematically "scaled" version of the other (an Escort density).
  3. The Method: The authors built a "magic mirror" system (measure-preserving transformations) that lets them take this one core rule and instantly generate dozens of new, precise rules for other complex measurements (like how "spiky" a distribution is or how "heavy" its tails are).
  4. The Result: They have created a powerful, unified framework that makes it much easier to find the absolute best possible limits (bounds) for how information behaves in complex systems, from physics to cybersecurity.

In short, they found the master key that unlocks a whole new set of precise mathematical locks in the field of information theory.