Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models

This paper introduces a novel attack that induces numerical instability in multimodal large language models: by optimizing a specific loss function, it generates images that cause significant performance degradation across state-of-the-art models and datasets, through a mechanism distinct from traditional adversarial perturbations.

Wai Tuck Wong, Jun Sun, Arunesh Sinha

Published 2026-03-06

Here is an explanation of the paper "Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models" using simple language and creative analogies.

The Big Idea: The "Whispering Ghost" Attack

Imagine you have a super-smart robot assistant (a Large Vision-Language Model, or LVLM) that can look at a picture and answer questions about it. You might think the only way to trick this robot is to show it a picture of a cat that looks like a dog, or to draw a weird scribble on the photo that confuses its eyes.

This paper discovers a brand new way to break the robot. Instead of changing what the robot sees, the researchers change how the robot thinks.

They found that tiny, invisible tweaks to the image's pixel values can destabilize the numbers inside the computer's brain, causing the robot to hallucinate wildly and give answers that make no sense, even though the picture looks exactly the same to a human.

The Analogy: The "Fuzzy Calculator"

To understand this, imagine the robot's brain is a massive team of accountants doing math.

  1. The Short-Cut (Half-Precision): To save time and memory, these accountants don't use infinite precision. They round off numbers. Instead of saying "3.14159265...", they say "3.14". This is called Half-Precision. It's like using a ruler with only big markings instead of tiny millimeter lines. Usually, this is fine.
  2. The Rounding Error: Sometimes, when you add up thousands of these rounded numbers, the tiny errors stack up. It's like if you round every step of a long journey; by the end, you might be miles off course.
  3. The Attack: The researchers realized that if they could nudge the input numbers just the right way, they could force the accountants to make the worst possible rounding mistakes. They aren't changing what the image shows; they are changing the math behind the image.
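You can watch these rounding errors stack up with a few lines of NumPy, where `float16` plays the role of the "ruler with only big markings":

```python
import numpy as np

# Add 0.1 to a running total 10,000 times, once with a half-precision
# accumulator and once in double precision. Every float16 partial sum
# gets rounded, and past a point the total becomes too coarse for a
# +0.1 step to register at all.
values = np.full(10_000, 0.1, dtype=np.float16)

total_fp16 = np.float16(0.0)
for v in values:                 # naive sequential sum, all in float16
    total_fp16 = total_fp16 + v

total_fp64 = values.astype(np.float64).sum()

print(float(total_fp64))  # close to 1000
print(float(total_fp16))  # stalls far below the true total
```

This is the "miles off course" effect from step 2: no single rounding is large, but the accumulated result is wildly wrong.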

The "Domino Effect"

The paper describes two levels of this instability:

  • Level 1: The Ruler (Implementation Level): This is the rounding error mentioned above. It's like using a slightly bent ruler.
  • Level 2: The Amplifier (Functional Level): This is where it gets scary. The robot's brain is designed so that small changes can get blown up into huge changes.
    • Analogy: Imagine a microphone that is slightly too sensitive. If you whisper a tiny "hello" into it, it might feedback and scream. The researchers found a way to whisper a specific "hello" (a tiny pixel change) that causes the robot to scream nonsense.
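The "amplifier" idea can be seen in miniature with an ill-conditioned linear system. This is only an analogy in code, not the paper's construction (the real amplification happens inside a neural network, not a 2x2 solve):

```python
import numpy as np

# A nearly singular matrix has a huge condition number: solving
# W @ y = x amplifies any small change in x into a large change in y.
W = np.array([[1.0, 1.0],
              [1.0, 1.0001]])

x = np.array([1.0, 1.0])
x_perturbed = x + np.array([1e-4, -1e-4])  # a "whisper" of a change

y = np.linalg.solve(W, x)
y_perturbed = np.linalg.solve(W, x_perturbed)

print(np.linalg.norm(x_perturbed - x))  # tiny input change (~1.4e-4)
print(np.linalg.norm(y_perturbed - y))  # output change thousands of times larger
```

A whisper at the input, a scream at the output: the same shape of sensitivity the paper exploits at the functional level.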

What Happened in the Experiments?

The researchers tested this on several famous AI models (like LLaVA and Idefics) using standard datasets (like Flickr30k and VQAv2).

The results were shocking:

  • The Input: They took a picture of a girl sunbathing with a purple towel. To a human, the "attacked" picture looked identical to the original.
  • The Output (Clean): The AI correctly said, "A woman wearing a purple scarf lays on a wooden surface."
  • The Output (Attacked): The AI looked at the same picture and said, "The purple shirt man is fighting with the other man."

They did this with many questions:

  • Question: "What town is this?" -> Answer: "Burnaby."
  • Attacked Answer: "Newark." (Completely wrong city).
  • Question: "What is on the plate?" -> Answer: "Cake."
  • Attacked Answer: "A steak with veggies."

Why is this different from "Adversarial Attacks"?

Usually, when we talk about "hacking" AI, we think of Adversarial Attacks.

  • Adversarial Attack: Like putting a sticker on a stop sign that makes a self-driving car think it's a speed limit sign. You are changing the visual pattern to trick the AI's pattern recognition.
  • Numerical Instability (This Paper): Like whispering a specific frequency into a speaker that makes the amplifier blow a fuse. You aren't changing the picture's pattern; you are exploiting the math engine inside the computer. The AI isn't "confused" by the image; its internal math is just breaking down.
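The general recipe behind such an attack can be sketched in a few lines. Everything below is a hypothetical stand-in, not the authors' code: `toy_model` is a one-layer toy (the real target is a full LVLM), `instability_loss` is my guess at the flavor of the paper's objective (how far the half-precision run drifts from the full-precision one), and the random search stands in for a real optimizer:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))  # a toy "layer"; the paper attacks real LVLMs

def toy_model(x, dtype):
    # Run the same computation at a chosen floating-point precision.
    h = np.tanh(x.astype(dtype) @ W.astype(dtype))
    return h.astype(np.float64)

def instability_loss(x):
    # Hypothetical objective: divergence between the half-precision
    # and full-precision forward passes on the same input.
    return np.linalg.norm(toy_model(x, np.float16) - toy_model(x, np.float64))

x = rng.standard_normal(64)
start_score = instability_loss(x)
best = start_score
for _ in range(200):  # crude random search standing in for a real optimizer
    candidate = x + 1e-3 * rng.standard_normal(64)  # tiny, "invisible" tweak
    score = instability_loss(candidate)
    if score > best:
        x, best = candidate, score

print(start_score, "->", best)  # the drift never decreases under this search
```

The key point survives the toy: the objective targets the math engine (precision-dependent drift), not the visual pattern, which is exactly the distinction drawn above.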

The "Hidden Cost"

The paper calls this a "Hidden Cost" because:

  1. It's Invisible: You can't see the attack. The image looks perfect.
  2. It's Universal: It works on different models, different sizes, and different tasks.
  3. It's Hard to Fix: You can't just "train" the AI to ignore it easily, because the problem isn't in the training data; it's in the fundamental way computers do math (floating-point arithmetic).

The Takeaway

This research is a wake-up call. We are building AI systems that are incredibly powerful, but they are running on a foundation of "fuzzy math" (half-precision) to save money and speed.

The authors are saying: "We found a way to make these powerful robots hallucinate just by tweaking the math, not the picture. This is a new kind of weakness we need to understand and fix before we let these robots drive cars or manage hospitals."

It's like discovering that a super-strong bridge doesn't collapse because of heavy trucks, but because of a specific, tiny vibration that makes the steel vibrate until it snaps.