CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework

The paper introduces CARE, an evidence-grounded agentic framework that enhances clinical accountability and reasoning accuracy in multi-modal medical AI by decomposing tasks into specialized modules for entity proposal, pixel-level localization, and evidence-based reasoning, thereby outperforming state-of-the-art models on medical VQA benchmarks.

Yuexi Du, Jinglu Wang, Shujie Liu, Nicha C. Dvornek, Yan Lu

Published 2026-03-12

Imagine you are a patient sitting in a doctor's office. You have an X-ray or a CT scan, and you ask, "What's wrong with me?"

In the world of Artificial Intelligence, most current "medical AI" models act like a brilliant but overconfident student who glances at your scan for a split second and immediately blurts out an answer. They might get lucky, but often they are just guessing based on patterns they've seen before, without actually looking at the specific spot on your image. If they get it wrong, they can't explain why, and they might even invent symptoms that aren't there (a problem called "hallucination"). This is risky because, in medicine, you need to know why a doctor made a diagnosis, not just what the diagnosis is.

The paper introduces CARE (Clinical Accountability in multi-modal medical Reasoning with an Evidence-grounded agentic framework). Think of CARE not as a single student, but as a highly organized medical team working together to solve your case.

Here is how CARE works, using a simple analogy:

The Problem: The "Black Box" Doctor

Current AI models are like a Black Box Doctor. You hand them a photo, and they give you an answer. You have no idea if they looked at the right spot, or if they just guessed based on the color of the photo. If they say, "You have pneumonia," you don't know if they actually saw the pneumonia or just saw a dark spot and assumed.

The Solution: The CARE Team

CARE breaks the job down into three specialized roles, mimicking how a real human doctor thinks:

1. The Triage Nurse (Medical Entity Proposal)

Instead of guessing the whole disease immediately, the first AI (the "Triage Nurse") looks at your question and the image and says: "Okay, the patient is asking about their lungs. I should focus on the left and right lungs, not the heart or the bones."

  • What it does: It identifies the specific body parts or features relevant to the question.
  • Why it helps: It stops the AI from wasting time looking at the wrong things.

2. The Specialist Technician (Entity Referring Segmentation)

Once the Nurse says, "Look at the left lung," the second AI (the "Technician") steps in. This is an expert at drawing precise outlines. It doesn't just guess; it draws a pixel-perfect mask around the suspicious area in the lung.

  • What it does: It creates a "highlighter" effect, isolating the exact spot of interest.
  • Why it helps: It provides hard evidence. It's like the doctor putting a magnifying glass over the specific spot and saying, "Here is the problem area."

3. The Senior Diagnostician (Evidence-Grounded VQA)

Now, the third AI (the "Senior Doctor") gets the full picture. It sees the original image, but it also sees the "highlighted" area created by the Technician. It reasons through the problem: "I see the highlighted area in the left lung. It looks dense and white. Based on my training, this looks like pneumonia."

  • What it does: It makes the final diagnosis, but it must base it on the evidence provided by the previous steps.
  • Why it helps: It prevents the AI from making up facts. If the evidence doesn't support the answer, the system is designed to catch it.
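
The three roles above can be sketched as a simple data pipeline. This is a toy illustration only: the function names, the keyword-matching "proposal" step, and the dict-of-pixel-sets "image" are all hypothetical stand-ins, not the paper's actual models or API.

```python
# Hypothetical sketch of CARE's three-stage flow. All names and logic
# are illustrative toys, not the paper's actual implementation.

def propose_entities(question: str) -> list[str]:
    """Stage 1 (Medical Entity Proposal): pick anatomy relevant to the question."""
    vocab = {"lung": ["left lung", "right lung"], "heart": ["heart"]}
    return [e for key, ents in vocab.items() if key in question.lower() for e in ents]

def segment(image: dict, entities: list[str]) -> dict[str, set]:
    """Stage 2 (Entity Referring Segmentation): one pixel mask per entity.
    Here 'image' is just a dict mapping entity name -> set of (row, col) pixels."""
    return {e: image.get(e, set()) for e in entities}

def answer(question: str, masks: dict[str, set]) -> str:
    """Stage 3 (Evidence-Grounded VQA): answer only from the masked evidence."""
    evidence = {e: m for e, m in masks.items() if m}
    if not evidence:
        return "No supporting evidence found in the highlighted regions."
    return f"Finding localized in: {', '.join(sorted(evidence))}."

# Toy "image": only the left lung contains annotated pixels.
image = {"left lung": {(3, 4), (3, 5)}, "right lung": set()}
question = "Is there an opacity in the lungs?"
entities = propose_entities(question)          # stage 1
masks = segment(image, entities)               # stage 2
print(answer(question, masks))                 # stage 3
```

The key design point survives even in this toy: stage 3 never sees the raw question alone; it can only speak about regions that stages 1 and 2 actually highlighted, which is what "evidence-grounded" means here.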

The Manager: The Coordinator

To make sure this team works perfectly, CARE has a Manager (called the "Coordinator").

  • The Job: The Manager decides which tools to use. Do we need to zoom in? Do we need to draw a mask? Or is the whole image enough?
  • The Safety Net: The Manager also acts as a Quality Control Inspector. After the Senior Doctor gives an answer, the Manager reviews the logic: "Wait, you said it's pneumonia, but your reasoning said the area is clear. That doesn't make sense. Let's re-check."
  • The Result: If the Manager catches a mistake, they fix it before giving the final answer to you.
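
The Coordinator's quality-control loop can be sketched in a few lines. Again, everything here is a hedged assumption for illustration: the toy "density" rule, the string-matching consistency check, and the retry count are invented stand-ins for the real system's tool calls and review logic.

```python
# Hypothetical sketch of the Coordinator's review loop. The diagnostician
# rule and the consistency check are illustrative toys, not the paper's code.

def diagnostician(evidence: dict) -> tuple[str, str]:
    """Returns (answer, reasoning). Toy rule: a dense region -> 'pneumonia'."""
    if evidence.get("density", 0) > 0.5:
        return "pneumonia", "the highlighted region is dense"
    return "no finding", "the highlighted region is clear"

def consistent(ans: str, reasoning: str) -> bool:
    """Coordinator's check: a positive answer must not contradict its reasoning."""
    return not (ans == "pneumonia" and "clear" in reasoning)

def coordinate(evidence: dict, max_retries: int = 2) -> str:
    """Run the diagnostician, verify answer vs. reasoning, re-check on mismatch."""
    for _ in range(max_retries + 1):
        ans, why = diagnostician(evidence)
        if consistent(ans, why):
            return f"{ans} (because {why})"
        # In the real system, the Coordinator would re-invoke tools here
        # (zoom in, re-segment) before asking the diagnostician again.
    return "inconclusive: answer and reasoning disagree"

print(coordinate({"density": 0.8}))
```

The point of the loop is the `consistent` gate: an answer only leaves the system if its stated reasoning supports it, which is the "Quality Control Inspector" role described above.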

Why is this a big deal?

  1. No More Guessing: Forcing the AI to "point" to the evidence before answering keeps it from hallucinating (making things up).
  2. Transparency: You can see the "thought process." You can see exactly which part of the image the AI looked at to make its decision. This is what doctors call "accountability."
  3. Better Performance: The paper reports that this team approach outperforms much larger, more expensive single AI models on medical VQA benchmarks, even though CARE uses smaller models and less computing power.

The Bottom Line

Imagine if your AI doctor didn't just give you a verdict, but instead walked you through the exam room, pointed to the X-ray, and said, "See this white spot here? That's what I'm looking at. Based on that, here is my diagnosis."

CARE is that kind of AI. It turns medical diagnosis from a "magic trick" into a transparent, evidence-based process, making it safer and more trustworthy for real-world healthcare.