On the definition and importance of interpretability in scientific machine learning

This paper argues that the scientific community often conflates mathematical sparsity with interpretability. It proposes instead an operational definition of interpretability for scientific machine learning, one that prioritizes mechanistic understanding over sparse expressions in order to better facilitate the discovery of fundamental physical principles.

Original authors: Conor Rowan, Alireza Doostan

Published 2026-04-23

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The Big Problem: The "Black Box" vs. The "Recipe Book"

Imagine you are a scientist trying to understand how the universe works. For centuries, scientists have been like master chefs writing down recipes. These recipes are short, simple lists of ingredients and steps (mathematical equations) that explain why a cake rises or why a bridge stays up. If you read the recipe, you understand why the cake rises.

Now, imagine a new kind of chef: a Super-Computer AI. This AI has tasted millions of cakes and can predict how a new cake will taste with near-perfect accuracy. But the AI doesn't give you a recipe. Instead, it gives you a giant, tangled ball of yarn (a neural network) with millions of knots.

If you ask the AI, "Why did the cake rise?" it can't answer. It just says, "Because the yarn is tied this way."

This is the problem the paper addresses. In Scientific Machine Learning (SciML), we have powerful "yarn ball" models that predict physical phenomena with remarkable accuracy. But scientists are uneasy because they can't read the "recipe," and so they can't integrate these findings into their existing knowledge base. They want to know the mechanism, not just the prediction.

The Misunderstanding: "Simple" vs. "Understandable"

The paper argues that scientists in this field have been confused about what "interpretability" actually means.

The Current Mistake:
Many researchers think that if a model's output is mathematically simple (short and sparse), it is automatically understandable.

  • The Analogy: Imagine you find a note that says: x + y = z.
  • The Scientist's Thought: "Wow, that's short! It only has three letters. It must be easy to understand!"
  • The Reality: What do x, y, and z actually mean? Are they numbers? Are they emotions? Are they forces? Without knowing the context, a short equation is just as confusing as a long one.

The paper calls this the "Sparsity Trap." Just because an equation is short (sparse) doesn't mean it explains the physics behind the phenomenon.
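
To make the "Sparsity Trap" concrete, here is a minimal, hypothetical sketch of sparsity-promoting regression in the spirit of SINDy-style methods, a standard SciML technique to which this kind of critique applies. The data, candidate library, and threshold below are all invented for illustration; this is not code from the paper.

```python
import numpy as np

# Hypothetical sketch of sparsity-promoting regression (SINDy-style).
# All data and names are invented for illustration.

rng = np.random.default_rng(0)

# Synthetic measurements of a "short" hidden law: x_dot = -2*x + 0.5*x**3
x = rng.uniform(-2.0, 2.0, size=200)
x_dot = -2.0 * x + 0.5 * x**3 + 0.01 * rng.standard_normal(200)

# Candidate library: each column is a term a *human* chose to include.
library = np.column_stack([np.ones_like(x), x, x**2, x**3])
names = ["1", "x", "x^2", "x^3"]

# Sequentially thresholded least squares: fit, zero small coefficients, refit.
coeffs = np.linalg.lstsq(library, x_dot, rcond=None)[0]
for _ in range(10):
    small = np.abs(coeffs) < 0.1
    coeffs[small] = 0.0
    keep = ~small
    coeffs[keep] = np.linalg.lstsq(library[:, keep], x_dot, rcond=None)[0]

# The result is sparse -- but each surviving term only "means" something
# because the library columns were named after physics we already understand.
print({n: round(float(c), 3) for n, c in zip(names, coeffs) if c != 0.0})
```

The printed result is a short equation, but notice that its "meaning" was smuggled in by whoever chose the candidate library. If the columns were anonymous features instead of x and x^3, the same sparse output would explain nothing.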

The Real Definition: The "Story" Behind the Math

The authors propose a new, better definition of interpretability. They say:

A model is only "interpretable" if you can tell the story of how it connects to the fundamental laws of nature.

It's not about how short the sentence is; it's about whether the sentence makes sense in the context of the world.

The Analogy of the "Foreign Language":
Imagine you find a sentence written in a language you don't speak.

  • Scenario A (Sparse but Uninterpretable): The sentence is very short: "Gloop." It's short, but you have no idea what it means. It tells you nothing about the world.
  • Scenario B (Complex but Interpretable): The sentence is long and complicated: "The gravitational pull of the sun, combined with the inertia of the planet, creates an elliptical orbit." Even though it's long and complex, you understand the story. You know exactly what is happening.

The paper argues that Interpretability = Connection to Known Mechanisms.

  • If a model says, "This happens because of Advection (wind moving stuff) and Diffusion (spreading out)," that is interpretable because we know what those words mean.
  • If a model says, "This happens because of Term X," and we have no idea what "Term X" is, it is not interpretable, even if "Term X" is a very short mathematical term.

The Kepler Example: A Historical Lesson

The paper uses a famous historical example to prove their point: Johannes Kepler.

In the 1600s, Kepler discovered that planets move in ellipses. He wrote down simple, short mathematical laws to describe this.

  • Were they "Sparse"? Yes, very.
  • Were they "Interpretable" at the time? No.
    Kepler didn't know why the planets moved that way. He just knew that they did. It wasn't until 70 years later that Isaac Newton came along and said, "Ah! These laws are actually the result of Gravity and Inertia."

Only after Newton connected Kepler's short laws to the deeper "mechanism" of gravity did the laws become truly interpretable. Before that, they were just mysterious, short patterns.
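
For concreteness, here is the sparse pattern next to its mechanism (standard physics, summarized here rather than quoted from the paper). Kepler's third law relates a planet's orbital period T to its semi-major axis a; Newton later derived the same pattern from gravity and inertia:

```latex
% Kepler (c. 1619): a sparse, mysterious pattern.
T^2 \propto a^3

% Newton (1687): the same pattern, now derived from the inverse-square
% law of gravity, with G the gravitational constant and M the sun's mass.
T^2 = \frac{4\pi^2}{G M}\,a^3
```

It is the same short equation; what changed between Kepler and Newton is that the constant of proportionality acquired a story.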

The Solution: A New Way to Think

The authors suggest that in Scientific Machine Learning, we need to stop obsessing over making equations short (sparse) and start obsessing over making them connected to physical truth.

  1. Don't just look for short equations. A short equation with a mysterious term is useless for discovery.
  2. Look for the "Story." Can you explain the equation using known physics (like conservation of energy, force, or mass)?
  3. The Role of AI: The AI is great at finding patterns. But for it to be truly useful for science, we need to use those patterns to find the "missing link" to the fundamental laws of nature.

The Bottom Line

Think of Interpretability not as a "short summary," but as a bridge.

  • Old View: The bridge needs to be short and simple (Sparse).
  • New View: The bridge needs to connect two specific places: The Data we collected and the Fundamental Laws of Physics we already know.

If the AI gives us a short bridge that leads nowhere, it's not helpful. If it gives us a long, winding path that leads us to a deep understanding of how the universe works, that is true Interpretability.

In short: Scientists don't just want a model that predicts the future; they want a model that tells them why the future happens, using the language of physics, not the language of a black box.
