Complete Diagrammatic Axiomatisations of Relative Entropy

This paper presents complete diagrammatic axiomatisations for the Kullback-Leibler and Rényi divergences. It characterises them as quantitative enrichments of categories of stochastic matrices, under both the Kronecker product and the direct sum monoidal structures, within a string-diagram framework.

Ralph Sarkis, Fabio Zanasi

Published 2026-03-06

Imagine you are a chef trying to perfect a recipe. You have two versions of a soup: Soup A (your original recipe) and Soup B (your new experiment).

In the old days of computer science, we only asked a simple question: "Are these two soups identical?" If they tasted even slightly different, the answer was just "No." But in the world of machine learning and statistics, "No" isn't very helpful. We need to know how different they are. Is Soup B just a little too salty? Or is it completely inedible?

This paper is about creating a universal ruler to measure exactly how "different" two probability recipes are. This ruler is called Relative Entropy (or KL Divergence).

Here is the breakdown of what the authors did, using simple analogies.

1. The Problem: Measuring the "Distance" Between Recipes

In math, we often deal with probability distributions. Think of these as recipes for randomness.

  • Recipe A: A coin that lands Heads 50% of the time.
  • Recipe B: A coin that lands Heads 90% of the time.

How "far apart" are these two coins?

  • Total Variation Distance (an old ruler) says: "They are 0.4 units apart." It's like measuring the difference in the amount of salt.
  • Relative Entropy (the new ruler) says: "They are very far apart, because if you assume the coin is fair (Recipe A) while it actually behaves like Recipe B, you will be surprised far more often than you expect." It measures the surprise, or information loss, incurred when you model one recipe using the other — and, unlike a distance, it is asymmetric: the cost of mistaking A for B is not the same as the cost of mistaking B for A.

The problem is that, while we knew how to measure this divergence for simple lists of numbers, we did not have a complete set of rules (axioms) for calculating it on complex systems built from many interacting parts.
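To make the two "rulers" concrete, here is a small sketch in plain Python (not code from the paper) comparing total variation distance and relative entropy on the two coins above:

```python
import math

def total_variation(p, q):
    """Total variation distance: half the L1 distance between distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

recipe_a = [0.5, 0.5]   # fair coin: Heads / Tails
recipe_b = [0.9, 0.1]   # biased coin

print(total_variation(recipe_a, recipe_b))   # 0.4
print(kl_divergence(recipe_a, recipe_b))     # ~0.737 bits
print(kl_divergence(recipe_b, recipe_a))     # ~0.531 bits: note the asymmetry
```

The two KL values differ, which is exactly the asymmetry the betting analogy gestures at: relative entropy is a divergence, not a metric.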

2. The Solution: String Diagrams (The LEGO Approach)

The authors use a visual language called String Diagrams. Imagine these as LEGO instructions or flowcharts.

  • Instead of writing long algebraic equations, you draw boxes and lines.
  • A line represents a piece of data flowing through a system.
  • A box represents a process (like a coin flip or a filter).

The authors built a new set of LEGO rules (axioms) that tell you how to combine these diagrams while keeping track of the "distance" (Relative Entropy) between them.
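In the probabilistic setting the paper works in, a "box" is a stochastic matrix and plugging one box into the next is matrix multiplication. A minimal sketch (the matrices here are made-up examples, not from the paper):

```python
def compose(f, g):
    """Sequential composition of two processes given as row-stochastic
    matrices (each row sums to 1): run f, feed its output into g.
    This is ordinary matrix multiplication."""
    return [[sum(f[i][j] * g[j][k] for j in range(len(g)))
             for k in range(len(g[0]))]
            for i in range(len(f))]

# A noisy coin flip: row = hidden state, column = observed face.
flip = [[0.9, 0.1],
        [0.2, 0.8]]
# A sensor that sometimes misreads the face.
read = [[0.95, 0.05],
        [0.10, 0.90]]

pipeline = compose(flip, read)
# Each row of the composite still sums to 1, so it is again a valid process:
# composing diagrams never leaves the world of probability.
```

This closure property is what makes string diagrams a sound language for probabilistic systems: wiring boxes together always yields another box.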

3. The Two Ways to Build: The "Kronecker" vs. The "Direct Sum"

The paper identifies two natural ways to combine these probability recipes, and they needed a specific ruler for each.

A. The "Kronecker" Way (The Multi-Player Game)

Imagine you have two independent games: a coin flip and a dice roll.

  • The Setup: You play both games at the same time.
  • The Analogy: This is like building a massive, multi-dimensional LEGO castle where every brick depends on every other brick in a grid.
  • The Rule: The authors created a rule called Chain⊗. It says: "To measure the difference between two giant castles, you measure the difference between the individual bricks and the rules that hold them together, then add them up."
  • Why it matters: This is crucial for Bayesian Networks and Causal Reasoning (figuring out cause-and-effect in complex systems like weather patterns or medical diagnoses).
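The paper's Chain⊗ rule covers correlated parts too, but its simplest consequence is easy to check numerically: for two independent games played side by side, the divergences of the parts add up. A sketch with made-up numbers:

```python
import math

def kl(p, q):
    """Relative entropy D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kron(p, q):
    """Joint distribution of two independent experiments (Kronecker product)."""
    return [pi * qj for pi in p for qj in q]

coin_a, coin_b = [0.5, 0.5], [0.9, 0.1]
die_a = [1/6] * 6
die_b = [0.25, 0.15, 0.15, 0.15, 0.15, 0.15]

joint_a = kron(coin_a, die_a)   # play both games at the same time
joint_b = kron(coin_b, die_b)

# Independent special case of Chain⊗: the castle's divergence is the
# sum of the bricks' divergences.
lhs = kl(joint_a, joint_b)
rhs = kl(coin_a, coin_b) + kl(die_a, die_b)
# lhs == rhs up to floating-point error
```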

B. The "Direct Sum" Way (The Buffet)

Imagine you have a buffet with two separate sections: a Salad Bar and a Dessert Bar.

  • The Setup: You pick either a salad or a dessert, but not both at once.
  • The Analogy: This is like having two separate LEGO sets sitting side-by-side on a table. They don't touch.
  • The Rule: They created a rule called Chain⊕. It says: "To measure the difference between two buffets, you look at the difference in the salad section and the dessert section, weighted by how likely people are to pick one over the other."
  • Why it matters: This is crucial for Convex Sets and understanding how randomness works as a "monadic effect" (a fancy way of saying how computers handle random choices).
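The buffet rule can also be checked directly. For distributions over a disjoint union ("pick a section, then a dish"), the total divergence splits into the divergence of the section weights plus the branch divergences weighted by how often each branch is actually visited. A sketch with made-up menus:

```python
import math

def kl(p, q):
    """Relative entropy D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def direct_sum(weights, branches):
    """Distribution over a disjoint union: pick a branch, then an item in it."""
    return [w * pi for w, branch in zip(weights, branches) for pi in branch]

salads_a, desserts_a = [0.6, 0.4], [0.5, 0.5]
salads_b, desserts_b = [0.3, 0.7], [0.8, 0.2]
w_a, w_b = [0.7, 0.3], [0.5, 0.5]   # how likely diners pick each section

buffet_a = direct_sum(w_a, [salads_a, desserts_a])
buffet_b = direct_sum(w_b, [salads_b, desserts_b])

# Chain⊕ in this simple case: divergence of the choice weights, plus the
# branch divergences weighted by how often each branch is chosen under A.
lhs = kl(buffet_a, buffet_b)
rhs = (kl(w_a, w_b)
       + w_a[0] * kl(salads_a, salads_b)
       + w_a[1] * kl(desserts_a, desserts_b))
# lhs == rhs up to floating-point error
```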

4. The Secret Sauce: The "Chain Rule"

The most important part of this paper is how they handle the Chain Rule.

In the real world, complex systems are built from smaller parts.

  • The Old Way: You had to calculate the distance of the whole giant system from scratch.
  • The New Way (The Paper's Magic): The authors proved that you can break the system down.
    • Analogy: If you want to know how different two complex recipes are, you don't need to taste the whole soup. You just need to taste the base broth and then taste the spices added later.
    • The Math: They showed that the total difference = (Difference in the base) + (Weighted difference in the spices).

They turned this intuition into a formal inference rule. In their language, it looks like a logical "If/Then" statement:

IF the distance between the base ingredients is small, AND the distance between the spices is small,
THEN the distance between the final soups is guaranteed to be small (and we can calculate exactly how small).
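The classical instance of this decomposition is the chain rule for relative entropy on joint distributions: D(p(x,y) ‖ q(x,y)) equals the divergence of the marginals ("the base broth") plus the p-weighted divergence of the conditionals ("the spices"). A numerical sketch with made-up joints:

```python
import math

def kl(p, q):
    """Relative entropy D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Joint distributions p[x][y] and q[x][y] over a 2x2 space.
p = [[0.30, 0.10],
     [0.15, 0.45]]
q = [[0.25, 0.25],
     [0.25, 0.25]]

def marginal(joint):
    return [sum(row) for row in joint]

def conditional(joint, x):
    mx = sum(joint[x])
    return [v / mx for v in joint[x]]

p_x, q_x = marginal(p), marginal(q)

# Whole-soup divergence, computed from scratch on the flattened joints.
whole = kl([v for row in p for v in row], [v for row in q for v in row])

# Chain rule: base divergence + conditional divergences weighted by p(x).
parts = kl(p_x, q_x) + sum(p_x[x] * kl(conditional(p, x), conditional(q, x))
                           for x in range(len(p)))
# whole == parts up to floating-point error
```

The paper's contribution is to internalise exactly this kind of decomposition as an inference rule inside the diagrammatic calculus, so it applies compositionally to whole diagrams rather than just to pairs of joint distributions.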

5. Why Should You Care?

This might sound like abstract math, but it has huge real-world applications:

  1. AI and Machine Learning: When AI models learn, they are constantly trying to minimize the "distance" between their predictions and reality. This paper gives researchers a more rigorous, compositional way to reason about that distance.
  2. Privacy: In "Differential Privacy" (protecting user data), we need to know exactly how much information is leaked. This ruler helps calculate that leakage precisely.
  3. Verification: It allows computer scientists to prove that two complex probabilistic programs are "close enough" to be considered equivalent, without having to run them a billion times.

Summary

The authors took a fundamental concept in statistics (Relative Entropy) and built a complete, visual rulebook for it.

  • They used String Diagrams (visual LEGO blocks) to represent complex systems.
  • They created two specific rulebooks for two different ways systems interact (simultaneous vs. alternative).
  • They introduced Chain Rules that let us break down complex problems into simple, manageable pieces.

In short, they gave us a universal translator that turns the messy, hard-to-calculate differences between complex probabilistic systems into a clean, logical set of steps anyone (or any computer) can follow.