PSF-Med: Measuring and Explaining Paraphrase Sensitivity in Medical Vision Language Models

This paper introduces PSF-Med, a benchmark revealing significant paraphrase sensitivity in medical Vision Language Models, and demonstrates that a causal intervention on a single sparse feature, identified with a Sparse Autoencoder, can substantially reduce this instability while largely preserving accuracy.

Binesh Sadanandan, Vahid Behzadan

Published 2026-02-26

Imagine you have a very smart, highly trained medical assistant named "Dr. AI." This assistant can look at X-ray images and answer questions like, "Is there a broken bone?" or "Is the heart enlarged?"

You would hope that Dr. AI is consistent. If you ask, "Is there a broken bone?" and get a "Yes," you'd expect that if you asked, "Do I have a fracture?" (which means the exact same thing), the answer would still be "Yes."

But this paper reveals a scary glitch: Dr. AI is easily confused by how you phrase your question.

Here is the breakdown of the research, explained simply with some analogies.

1. The Problem: The "Paraphrase Sensitivity" Glitch

The researchers built a giant test called PSF-Med. They took thousands of chest X-rays and asked the same medical questions in dozens of different ways.

  • Question A: "Is there a collapsed lung?"
  • Question B: "Does this X-ray show a pneumothorax?" (Same meaning, different words).

The Result: The AI models often gave contradictory answers.

  • To Question A, it said: "No."
  • To Question B, it said: "Yes."

This is like a weather forecaster saying, "It's going to rain," when you ask, "Will it be wet?" but saying, "It will be dry," when you ask, "Is there a chance of precipitation?" In a hospital, this inconsistency is dangerous. If two doctors ask the same question in different ways and get opposite answers, they can't trust the machine.
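
If you like to see this as a number, the core idea behind a "flip rate" is easy to sketch. The snippet below is an illustration of the concept, not the paper's actual code: for every image-question pair, count how often the model's answer disagrees between two paraphrases of the same question.

```python
from itertools import combinations

def flip_rate(answers_per_question):
    """Fraction of paraphrase pairs where the model contradicts itself.

    `answers_per_question` maps each (image, clinical question) to the
    answers the model gave across its paraphrases, e.g.
    {"xray_01/pneumothorax": ["no", "yes", "yes"], ...}
    """
    flips = pairs = 0
    for answers in answers_per_question.values():
        for a, b in combinations(answers, 2):  # every pair of paraphrases
            pairs += 1
            flips += (a != b)
    return flips / pairs if pairs else 0.0

# Toy example: Question A got "no", Question B got "yes" -> flip rate 1.0
print(flip_rate({"xray_01/pneumothorax": ["no", "yes"]}))
```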

2. The Trap: "Low Flip Rates" Don't Mean "Good"

The researchers found something even more surprising. Some models were very consistent (they rarely changed their answer when you rephrased the question). You might think, "Great! That model is reliable!"

But wait. The researchers discovered that some of these "reliable" models weren't actually looking at the X-ray at all. They were just guessing based on the words.

  • The Analogy: Imagine a student who has memorized past exam questions so thoroughly that they can answer without ever opening the textbook. If you ask, "Is the sky blue?" they say "Yes." If you ask, "Is the atmosphere blue?" they still say "Yes." They are consistent, but they aren't actually seeing the sky.
  • The Finding: The most consistent models often ignored the image entirely and relied on "language shortcuts" (e.g., "If the question asks about a broken bone, the answer is usually 'Yes'"). A quick way to check for this is sketched right after this list.
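
One simple, hedged way to catch this "not looking at the X-ray" behaviour, assuming you can query the model both with the real scan and with a blank image, is to see how often its answers stay the same when the picture is taken away. The helper below is purely illustrative (`model_qa` is a hypothetical callable, not something from the paper); a score near 1.0 means the wording, not the image, is driving the answers.

```python
def language_shortcut_score(model_qa, questions, xray_image, blank_image):
    """Fraction of questions answered identically with and without the real image.

    `model_qa(image, question)` is assumed to return the model's answer string.
    """
    same = sum(model_qa(xray_image, q) == model_qa(blank_image, q) for q in questions)
    return same / len(questions)

# Usage idea: language_shortcut_score(my_model, paraphrased_questions, scan, black_image)
```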

3. The Detective Work: Finding the "Switch"

To understand why the AI was flipping its answers, the researchers used a special tool called a Sparse Autoencoder (SAE). Think of this as an X-ray for the AI's brain: it breaks the model's internal activity into thousands of individual "switches" (called features), so you can watch which ones turn on for a given question.

They found one specific switch, which they named Feature 3818.

  • What does this switch do? It acts like a tone detector.
    • When the question is formal and clinical (e.g., "Is there radiographic evidence of..."), the switch turns ON. The AI becomes conservative and says "No" (to be safe).
    • When the question is casual (e.g., "Can you see..."), the switch turns OFF. The AI becomes permissive and says "Yes."

The Glitch: The AI wasn't looking at the lung; it was just reacting to whether the doctor sounded like a professor or a friend.
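
To make that "switch" concrete: a Sparse Autoencoder turns a model's hidden state into thousands of mostly-zero features, and you can read any one of them off directly. The sketch below assumes you already have a trained SAE (`sae`) and a way to grab the model's hidden state for a prompt (`get_hidden_state`); those names are placeholders, and only the feature number 3818 comes from the paper.

```python
FEATURE_ID = 3818  # the tone-sensitive feature reported in the paper

def feature_activation(sae, hidden_state, feature_id=FEATURE_ID):
    """Encode a hidden state with the SAE and return one feature's average value."""
    features = sae.encode(hidden_state)   # shape (..., n_features), mostly zeros
    return features[..., feature_id].mean().item()

# Hypothetical usage: compare a clinical vs. a casual phrasing of the same question
# formal = feature_activation(sae, get_hidden_state("Is there radiographic evidence of pneumothorax?"))
# casual = feature_activation(sae, get_hidden_state("Can you see a collapsed lung?"))
# print(formal, casual)  # per the paper, the formal phrasing lights this feature up
```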

4. The Fix: Flipping the Switch Back

Once they found this "tone detector" switch, they tried to fix it. They essentially told the AI: "Ignore this switch. Don't let the tone of the question change your answer."

  • The Result: By "clamping" this specific switch (forcing it to stay off), they reduced the number of contradictory answers by 31%. The mechanics of this intervention are sketched below.
  • The Trade-off: The model got slightly less accurate overall (by about 1%), but it became much more reliable. It stopped guessing based on word choice and started actually looking at the picture.
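
Mechanically, "clamping" means editing the model's hidden state mid-computation: encode it with the SAE, force Feature 3818 to zero, decode back, and let the model carry on from the edited state. Here is a rough sketch of that idea as a PyTorch-style forward hook; the layer choice and the SAE interface are assumptions for illustration, not the authors' released code.

```python
FEATURE_ID = 3818

def clamp_feature_hook(sae, feature_id=FEATURE_ID):
    """Build a forward hook that zeroes one SAE feature in a layer's output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        feats = sae.encode(hidden)        # sparse feature activations
        feats[..., feature_id] = 0.0      # clamp the "tone detector" off
        edited = sae.decode(feats)        # back to the model's hidden space
        # (a gentler variant subtracts only this feature's decoder direction
        #  instead of replacing the whole hidden state with the reconstruction)
        return (edited,) + output[1:] if isinstance(output, tuple) else edited
    return hook

# Hypothetical usage on one intermediate layer of the language model:
# handle = model.language_model.layers[TARGET_LAYER].register_forward_hook(clamp_feature_hook(sae))
# ... run the evaluation ...
# handle.remove()
```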

5. The Big Lesson

The paper concludes that we can't just measure if an AI is "accurate" or "consistent" in isolation.

  • Consistency is good, but only if the AI is actually looking at the image.
  • Accuracy is good, but only if the AI isn't just guessing based on how the question is phrased.

The Takeaway:
Before we let AI doctors into our hospitals, we need to test them not just on what they know, but on how they react to different ways of asking. We need to make sure they are looking at the X-ray, not just listening to the tone of our voice.

In short: The paper teaches us that a smart AI that changes its mind based on your vocabulary is dangerous, and a consistent AI that ignores the picture is useless. We need an AI that does both: looks at the image and stays calm, no matter how you ask.
