The limits of interpretability in multiple linear regression

This paper demonstrates that multicollinearity in multiple linear regression undermines physical interpretability by amplifying weight fluctuations and generating oscillatory patterns across correlated features, a phenomenon that can be mitigated but not fully resolved by Ridge regularization.

Original authors: Anand Sharma, Chen Liu, Daniele Coslovich, Misaki Ozawa

Published 2026-06-16

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The Big Picture: Why "Simple" Math Can Be Tricky

Imagine you are a detective trying to solve a mystery: What makes a material become a superconductor (a material with zero electrical resistance)?

You have a list of clues (features) like the weight of the atoms, their size, how much energy it takes to pull an electron off, etc. You want to use a simple tool called Multiple Linear Regression to figure out which clues are the most important.

Usually, people think linear regression is the "easy mode" of machine learning. It's like a simple recipe:

Prediction = (Value of Clue A × Importance Score A) + (Value of Clue B × Importance Score B) + ...

If the "Importance Score" (the weight) for "Atomic Size" is huge and positive, you think, "Aha! Big atoms are the key!" If it's negative, you think, "Small atoms are the key!"
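As a minimal sketch, the recipe above is just a weighted sum. The clue names and all numbers below are made up for illustration and are not taken from the paper's data.

```python
# Hypothetical clue values for one material (illustrative numbers, not real data).
clues = {"atomic_mass": 0.8, "atomic_radius": -1.2, "ionization_energy": 0.5}

# Hypothetical importance scores (the regression weights).
weights = {"atomic_mass": 2.0, "atomic_radius": -0.5, "ionization_energy": 1.0}

def predict(clues, weights):
    """Linear prediction: sum over clues of (clue value x importance score)."""
    return sum(clues[name] * weights[name] for name in clues)

prediction = predict(clues, weights)
print(prediction)
```

Reading the sign and size of each entry in `weights` is exactly the "Aha!" step described above, and it is this step that the rest of the story calls into question.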

The paper argues that this simple logic often breaks down. Even though the math is simple, if your clues are too similar to each other, the "Importance Scores" become chaotic, unstable, and impossible to trust.


The Problem: The "Twin" Clues (Multicollinearity)

The main villain in this story is Multicollinearity. This happens when two or more of your clues are so similar that they are practically twins.

The Analogy: The Twin Brothers
Imagine you are trying to guess a person's height. You have two clues:

  1. Clue A: The person's height in centimeters.
  2. Clue B: The person's height in inches.

These two clues are perfectly correlated. If you know one, you know the other. They are "twins."

Now, imagine you try to build a model to guess height using these two clues. The math gets confused. It asks: "How much of the height is caused by the centimeters, and how much by the inches?"

Because they are twins, the math can't decide.

  • In one experiment, it might say: "Centimeters count hugely in favor (+100), and inches count hugely against (-100)."
  • In the next experiment (using slightly different data), it might flip: "Centimeters are -100, and inches are +100."

The total prediction (the sum) stays accurate, but the individual scores swing wildly like a pendulum. This is what the paper calls Weight Fluctuations.
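The twin problem can be checked by hand. In this small sketch (the weight pairs are picked by hand for illustration, not produced by any fit), four wildly different sets of scores all return exactly the same prediction, because centimeters and inches carry the same information.

```python
CM_PER_INCH = 2.54

def predict_height(height_cm, w_cm, w_in):
    """Linear model fed the same height expressed in two units."""
    height_in = height_cm / CM_PER_INCH
    return w_cm * height_cm + w_in * height_in

# Any pair with w_cm + w_in / 2.54 == 1 reproduces the height.
weight_pairs = [(1.0, 0.0), (0.0, 2.54), (101.0, -254.0), (-99.0, 254.0)]

heights_cm = [150.0, 165.0, 180.0, 195.0]
for w_cm, w_in in weight_pairs:
    predictions = [predict_height(h, w_cm, w_in) for h in heights_cm]
    print((w_cm, w_in), [round(p, 6) for p in predictions])
```

With perfect twins the data gives the math no way to prefer one pair over another. With near-twins, tiny accidents in the data make the choice, which is why the scores swing from one experiment to the next.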

The Specific Issues Found in the Paper

The authors looked at real physics data (superconductors and glassy liquids) and found two specific nightmares:

1. The "Jittery" Weights (Dataset-to-Dataset Fluctuations)

If you train your model on one batch of data, you get a set of scores. If you train it on a different batch of data (even if it's from the same physics experiment), the scores change completely.

  • The Metaphor: Imagine trying to weigh a feather on a scale that is slightly wobbly. If you put the feather down, the scale says "5 grams." You take it off, put it back, and it says "-3 grams." The feather didn't change, but the measurement is unstable.
  • The Result: You cannot trust the numbers to tell you what is physically important because they change every time you look.
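The wobble can be reproduced with a toy experiment. The sketch below is an assumption-laden illustration, not the paper's procedure: it draws many synthetic datasets from the same rule (true weights of 1 and 1, plus noise), fits two clues each time, and measures how much the first weight varies from dataset to dataset. It uses only the Python standard library, and the function names are hypothetical.

```python
import random
import statistics

def fit_two_features(x1, x2, y):
    """Least-squares weights for y ~ w1*x1 + w2*x2 (no intercept), via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * t for a, t in zip(x1, y))
    s2y = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def weight_spread(correlation, n_datasets=200, n_samples=200, seed=0):
    """Standard deviation of the first weight across independently drawn datasets."""
    rng = random.Random(seed)
    mix = (1.0 - correlation ** 2) ** 0.5
    first_weights = []
    for _ in range(n_datasets):
        x1 = [rng.gauss(0, 1) for _ in range(n_samples)]
        # x2 has unit variance and the requested correlation with x1.
        x2 = [correlation * a + mix * rng.gauss(0, 1) for a in x1]
        # Same underlying rule every time: true weights (1, 1) plus unit noise.
        y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        first_weights.append(fit_two_features(x1, x2, y)[0])
    return statistics.stdev(first_weights)

spread_independent = weight_spread(correlation=0.0)
spread_twins = weight_spread(correlation=0.999)
print(f"independent clues: {spread_independent:.3f}")
print(f"near-twin clues:   {spread_twins:.3f}")
```

The underlying rule never changes between datasets, yet the near-twin case should show a far larger spread in the fitted weight than the independent case.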

2. The "Oscillating" Weights (The See-Saw Effect)

This is the weirdest part. The paper found that when you have a list of clues that are ordered (like "Atomic Size at scale 1," "Atomic Size at scale 2," "Atomic Size at scale 3"), the scores don't just jitter; they oscillate.

  • The Metaphor: Imagine a row of dominoes. If you push the first one, the second one goes up, the third goes down, the fourth goes up, and so on.
  • The Reality: In the data, "Atomic Size at scale 1" might get a huge positive score. "Atomic Size at scale 2" (which is almost the same thing) gets a huge negative score. "Scale 3" goes back to positive.
  • Why it matters: This makes no physical sense. If two things are physically similar, they should have similar scores. Instead, the math forces them to cancel each other out in a chaotic dance.
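One way to see where an up-down-up pattern can come from is a toy model, sketched here with NumPy. The assumption (mine, not the paper's data) is that the correlation between two ordered clues decays as `rho ** distance` along the ordering. The code then finds the combination of clues that varies least across the data, which is the direction along which the scores are pinned down least.

```python
import numpy as np

n_scales = 6
rho = 0.95  # assumed correlation between neighboring scales (toy value)

# Toy correlation matrix: correlation decays with distance along the ordering.
idx = np.arange(n_scales)
corr = rho ** np.abs(idx[:, None] - idx[None, :])

# eigh returns eigenvalues in ascending order, with eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
weakest_direction = eigenvectors[:, 0]

print("smallest eigenvalue:", eigenvalues[0])
print("signs along the ordering:", np.sign(weakest_direction).astype(int))
```

In this toy model the least-constrained direction flips sign from each clue to the next, so errors that land in it show up as exactly the see-saw described above.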

Why Does This Happen? (The Hidden Mechanism)

The authors used a mathematical tool called Eigenmode Decomposition to explain this. Think of this as looking at the "vibrations" of your data.

  • The Analogy: Imagine a guitar string. It has a main vibration (the note you hear) and some tiny, high-frequency vibrations (overtones).
  • The Math: When your clues are too similar (multicollinearity), the "guitar string" of your data has some very weak, shaky vibrations (small eigenvalues).
  • The Crash: To make the prediction work, the linear regression math divides by the strength of each vibration (its eigenvalue). For the weak vibrations that means dividing by a number close to zero, so any tiny bit of noise in the data gets blown up into a massive, wild swing in the scores. These weak vibrations are the ones causing the "see-saw" oscillation.
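In symbols, a standard textbook way to write this is shown below. The notation is an assumption and may differ from the paper's: X is the data matrix of clues, y the quantity being predicted, and λ_k and v_k the eigenvalues and eigenvectors of XᵀX.

```latex
\hat{\mathbf{w}}
  = (X^\top X)^{-1} X^\top \mathbf{y}
  = \sum_{k} \frac{\mathbf{v}_k^\top X^\top \mathbf{y}}{\lambda_k}\,\mathbf{v}_k ,
\qquad
X^\top X\,\mathbf{v}_k = \lambda_k \mathbf{v}_k .
```

Each vibration's contribution is divided by its own λ_k, so noise that falls along a mode with a tiny λ_k is magnified by 1/λ_k. For just two standardized clues with correlation r, the correlation matrix has eigenvalues 1 + r and 1 - r with eigenvectors (1, 1)/√2 and (1, -1)/√2. As r approaches 1, the weak mode is the one that pushes the two scores in opposite directions.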

The "Fix": Ridge Regularization

The paper tests a common fix called Ridge Regression.

  • The Analogy: Imagine the wobbly scale again. Ridge Regression is like adding a heavy, stiff spring to the scale. It doesn't let the needle swing wildly. It forces the needle to stay closer to zero unless the evidence is overwhelming.
  • The Result: This "spring" (mathematical penalty) stops the wild oscillations and stabilizes the scores. The scores become much calmer and more consistent.
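The calming effect of the spring can be seen in a toy experiment. This is a minimal sketch under stated assumptions (two near-twin synthetic clues, a hand-picked penalty `alpha=10.0`, hypothetical function names), not the paper's setup. Ridge adds `alpha` to the diagonal of the normal equations, and `alpha=0` recovers plain linear regression.

```python
import random
import statistics

def fit_ridge_two_features(x1, x2, y, alpha):
    """Ridge weights for y ~ w1*x1 + w2*x2 (no intercept). alpha=0 gives plain least squares."""
    s11 = sum(a * a for a in x1) + alpha  # the penalty adds alpha to the diagonal
    s22 = sum(b * b for b in x2) + alpha
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * t for a, t in zip(x1, y))
    s2y = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def weight_spread(alpha, correlation=0.999, n_datasets=200, n_samples=200, seed=0):
    """Standard deviation of the first weight across independently drawn datasets."""
    rng = random.Random(seed)
    mix = (1.0 - correlation ** 2) ** 0.5
    first_weights = []
    for _ in range(n_datasets):
        x1 = [rng.gauss(0, 1) for _ in range(n_samples)]
        x2 = [correlation * a + mix * rng.gauss(0, 1) for a in x1]
        y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        first_weights.append(fit_ridge_two_features(x1, x2, y, alpha)[0])
    return statistics.stdev(first_weights)

spread_plain = weight_spread(alpha=0.0)
spread_ridge = weight_spread(alpha=10.0)
print(f"no spring (alpha=0):    {spread_plain:.3f}")
print(f"with spring (alpha=10): {spread_ridge:.3f}")
```

The same datasets are used for both settings (same seed), so any drop in the spread comes from the penalty alone.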

However, there is a catch:
The paper warns that even with this fix, you still can't just pick a number and say, "This is the truth."

  • If you make the spring too stiff, you squash every score toward zero (you lose information).
  • If you make it too loose, the jitter comes back.
  • Crucially: The paper shows that you can get the same accurate prediction with many different settings of the spring, but the explanation (the weights) will look completely different for each setting.
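The last point can be illustrated with one synthetic dataset and three spring settings. Everything here is an assumption made for illustration: the two near-twin clues, the hidden rule `y = 3*x1 - 1*x2 + noise`, and the `alpha` values. The sketch fits Ridge at each setting and reports the weights next to the fit quality (R²).

```python
import random

def fit_ridge_two_features(x1, x2, y, alpha):
    """Ridge weights for y ~ w1*x1 + w2*x2 (no intercept). alpha=0 gives plain least squares."""
    s11 = sum(a * a for a in x1) + alpha
    s22 = sum(b * b for b in x2) + alpha
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * t for a, t in zip(x1, y))
    s2y = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(1)
correlation, n_samples = 0.999, 500
mix = (1.0 - correlation ** 2) ** 0.5
x1 = [rng.gauss(0, 1) for _ in range(n_samples)]
x2 = [correlation * a + mix * rng.gauss(0, 1) for a in x1]
# Hidden rule used to generate the toy target (small noise added).
y = [3.0 * a - 1.0 * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]

def r_squared(w1, w2):
    """Fraction of the variance in y explained by the fitted weights."""
    mean_y = sum(y) / len(y)
    residual = sum((t - w1 * a - w2 * b) ** 2 for a, b, t in zip(x1, x2, y))
    total = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - residual / total

results = {}
for alpha in (0.0, 5.0, 50.0):
    w1, w2 = fit_ridge_two_features(x1, x2, y, alpha)
    results[alpha] = (w1, w2, r_squared(w1, w2))
    print(f"alpha={alpha:5.1f}  w1={w1:6.2f}  w2={w2:6.2f}  R^2={results[alpha][2]:.4f}")
```

In this toy setup the fit quality should stay close to 1 for all three settings while the weights tell very different stories, so prediction accuracy alone cannot say which set of scores to believe.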

The Bottom Line

  1. Linear Regression isn't always simple: Just because the formula looks simple doesn't mean the results are easy to understand.
  2. Correlation is dangerous: If your clues are too similar, the math gets confused and produces unstable, oscillating answers that look like noise.
  3. Prediction ≠ Understanding: You can get a model that predicts very accurately, but the "reasons" it gives (the weights) might be physically meaningless because of this instability.
  4. The Solution isn't a magic button: Adding a mathematical fix (Ridge) helps, but it doesn't solve the root problem. To truly understand the physics, you likely need to do Feature Selection, which means manually picking the best, most distinct clues and throwing away the "twins" before you even start the math.

In short: Don't trust the numbers blindly. If your data has too many similar clues, the "Importance Scores" are likely just a reflection of mathematical confusion, not physical reality.
