Which Tool Response Should I Trust? Tool-Expertise-Aware Chest X-ray Agent with Multimodal Agentic Learning

This paper introduces TEA-CXA, a tool-expertise-aware chest X-ray agent that uses multimodal agentic learning to resolve conflicts between error-prone medical tools. It does this by learning, from experience, how reliable each tool is across different query types. The work also extends reinforcement learning frameworks to support complex multimodal interactions.

Zheang Huai, Honglong Yang, Xiaomeng Li

Published 2026-02-26

Imagine you are a doctor trying to diagnose a patient's chest X-ray. Instead of relying on just your own eyes, you have a team of specialist AI robots standing around you, each offering their own opinion.

  • Robot A says: "I see a mild heart enlargement."
  • Robot B says: "No, I see a severe heart enlargement."

They disagree. Who do you trust?

In the past, doctors (or AI agents) had two bad options:

  1. The "Blind Trust" approach: Just pick the robot that sounds the most confident or gives the longest, most detailed explanation. (But sometimes, the robot that talks the most is just the most confused!)
  2. The "Average" approach: Take the middle ground of all their answers. (But if one robot is right and the other is wrong, the middle ground is still wrong.)
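To make the two failure modes concrete, here is a toy Python sketch. The severity scale, the robot names and the responses are all hypothetical, chosen to mirror the heart-enlargement example. Suppose the true finding is "mild": picking the longest explanation returns "severe", and averaging the two grades returns "moderate", a grade neither robot reported.

```python
# Toy illustration of the two naive strategies. The severity scale and the
# robot responses are hypothetical, chosen to mirror the example above.
SEVERITY = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}
LABELS = {score: label for label, score in SEVERITY.items()}

responses = {
    "robot_a": {"answer": "mild",
                "explanation": "Heart slightly enlarged."},
    "robot_b": {"answer": "severe",
                "explanation": "The cardiac silhouette looks markedly enlarged "
                               "on the frontal view, with widening at both borders."},
}

def blind_trust(responses):
    """Adopt the answer that comes with the longest explanation."""
    loudest = max(responses.values(), key=lambda r: len(r["explanation"]))
    return loudest["answer"]

def average(responses):
    """Average the grades on the ordinal scale and map back to the nearest label."""
    scores = [SEVERITY[r["answer"]] for r in responses.values()]
    return LABELS[round(sum(scores) / len(scores))]

print(blind_trust(responses))  # severe
print(average(responses))      # moderate
```

Neither strategy asks which robot has historically been right on this kind of question, which is exactly the gap the rest of the article addresses.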

The Problem: The "Resume" vs. The "Track Record"

Most current AI systems look at a robot's resume (its description, such as "I am an expert in heart diseases") to decide whom to trust. But in the real world, a robot with a great resume can still make mistakes on specific types of X-rays, and these systems have no way of knowing which robot is actually reliable for this particular picture.

The Solution: TEA-CXA (The "Smart Intern")

The paper introduces a new AI agent called TEA-CXA. Think of TEA-CXA not as a doctor, but as a super-smart medical intern who learns by doing.

Here is how TEA-CXA learns, using a simple analogy:

1. The "Taste Test" Training

Imagine you are training a food critic. You give them a dish and ask them to guess the ingredients.

  • Old Way: You tell the critic, "Chef A is a French expert, so trust Chef A."
  • TEA-CXA Way: You let the critic try different chefs. Sometimes Chef A is right, sometimes Chef B is right.
    • If the critic goes with Chef A's answer and it's correct, they get a gold star (a reward).
    • If they go with Chef B's answer and it's wrong, they get a thumbs down.

Over time, the critic stops looking at the chefs' resumes. Instead, they build up a track record: "Oh, for spicy dishes, Chef A is usually right. But for desserts, Chef B is the one to trust."
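The "track record" idea can be sketched as a table of per-query-type success rates. This is a simplified Python stand-in, not the paper's training procedure: TEA-CXA learns tool reliability through reinforcement learning, whereas the hypothetical TrackRecord class below only counts rewards. The chef names, dish types and training episodes are invented to match the analogy.

```python
from collections import defaultdict

class TrackRecord:
    """Running per-(query type, tool) success rates.

    A deliberately simplified stand-in: the paper trains the agent with
    reinforcement learning, whereas this class just counts rewards
    (1 = the adopted answer was correct, 0 = it was wrong).
    """

    def __init__(self, prior_successes=1.0, prior_trials=2.0):
        # Smoothing prior: a tool with no history starts at 1/2 = 0.5 reliability.
        self.prior_successes = prior_successes
        self.prior_trials = prior_trials
        self.successes = defaultdict(float)
        self.trials = defaultdict(float)

    def update(self, query_type, tool, reward):
        key = (query_type, tool)
        self.successes[key] += reward
        self.trials[key] += 1

    def reliability(self, query_type, tool):
        key = (query_type, tool)
        return ((self.successes.get(key, 0.0) + self.prior_successes)
                / (self.trials.get(key, 0.0) + self.prior_trials))

    def best_tool(self, query_type, tools):
        return max(tools, key=lambda tool: self.reliability(query_type, tool))

# Hypothetical episodes: (query type, tool whose answer was adopted, reward).
history = [
    ("spicy", "chef_a", 1), ("spicy", "chef_a", 1), ("spicy", "chef_b", 0),
    ("dessert", "chef_b", 1), ("dessert", "chef_a", 0), ("dessert", "chef_b", 1),
]

record = TrackRecord()
for query_type, tool, reward in history:
    record.update(query_type, tool, reward)

print(record.best_tool("spicy", ["chef_a", "chef_b"]))    # chef_a
print(record.best_tool("dessert", ["chef_a", "chef_b"]))  # chef_b
```

The smoothing prior keeps a tool that was right once from immediately looking perfect. The counts are only a rough analogue of what the reward signal teaches the agent over many episodes.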

2. The "Conflict Resolution" Superpower

In the paper, when the two AI robots give different answers about an X-ray, TEA-CXA doesn't panic. It remembers its training:

  • "Hmm, this looks like a 'heart size' question. In my past training, Robot A was right 80% of the time on heart questions, even though Robot B wrote a longer explanation."
  • Decision: TEA-CXA ignores the long explanation and picks Robot A's answer.
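The decision rule in this example can be written as a few lines of Python. It is a hypothetical hand-coded rule for illustration only; the trained agent in the paper makes this judgment itself after training, rather than, as here, consulting an explicit lookup table. The function name resolve_conflict, the "heart_size" label and Robot B's reliability figure are assumptions.

```python
def resolve_conflict(query_type, responses, reliability, default=0.5):
    """Pick the answer from the tool with the best track record for this query type.

    responses:   tool name -> {"answer": ..., "explanation": ...}
    reliability: (query type, tool name) -> estimated accuracy in [0, 1]
    Tools with no recorded history fall back to `default`.
    Explanation length is deliberately ignored.
    """
    best = max(responses, key=lambda tool: reliability.get((query_type, tool), default))
    return best, responses[best]["answer"]

# Hypothetical numbers: 0.80 echoes the example above; 0.55 for Robot B is invented.
reliability = {("heart_size", "robot_a"): 0.80, ("heart_size", "robot_b"): 0.55}
responses = {
    "robot_a": {"answer": "mild heart enlargement",
                "explanation": "Ratio slightly raised."},
    "robot_b": {"answer": "severe heart enlargement",
                "explanation": "A long, detailed, confident-sounding paragraph " * 5},
}

print(resolve_conflict("heart_size", responses, reliability))
# ('robot_a', 'mild heart enlargement')
```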

3. The "Team Huddle" (Technical Magic)

The researchers also built a special "playground" (a code framework) to make this training possible.

  • Parallel Play: Usually, asking robots for help takes time. TEA-CXA asks multiple robots at the exact same time (like calling three friends at once instead of one by one).
  • Multi-Image: If the patient has two X-rays (front and side view), TEA-CXA knows exactly which robot to show which picture to, without getting confused by file names.
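Both ideas can be illustrated with Python's asyncio. This is a generic concurrency sketch, not the paper's framework: the tool functions, the image records and the convention of referring to images by list position instead of file name are all hypothetical.

```python
import asyncio
import time

# Hypothetical tool stubs; real tools would run medical models or remote APIs.
async def heart_size_tool(image):
    await asyncio.sleep(0.2)  # simulated inference latency
    return f"heart_size_tool analysed the {image['view']} view"

async def report_tool(image):
    await asyncio.sleep(0.2)
    return f"report_tool analysed the {image['view']} view"

async def call_tools(images, plan):
    """Run every (tool, image index) call in `plan` concurrently.

    Images are addressed by their position in `images`, not by file name.
    asyncio.gather returns results in the same order as `plan`.
    """
    return await asyncio.gather(*(tool(images[idx]) for tool, idx in plan))

images = [
    {"view": "frontal", "path": "study_17/img_a.png"},
    {"view": "lateral", "path": "study_17/img_b.png"},
]
plan = [(heart_size_tool, 0), (report_tool, 1)]

start = time.perf_counter()
results = asyncio.run(call_tools(images, plan))
elapsed = time.perf_counter() - start

print(results)
# ['heart_size_tool analysed the frontal view', 'report_tool analysed the lateral view']
print(f"{elapsed:.1f}s")  # roughly 0.2s rather than 0.4s, because the calls overlap
```

Addressing images by position means two files with confusingly similar names can never be swapped, as long as the agent's plan refers to the right index.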

Why This Matters

The paper's experiments show that this "learning by experience" approach works.

  • The Result: TEA-CXA diagnosed X-rays better than any single robot, better than simply averaging their answers, and better than current state-of-the-art methods.
  • The Lesson: It's not about who says they are the expert; it's about who has proven to be the expert on this specific type of problem.

In a nutshell: TEA-CXA is an AI that stops guessing based on who talks the loudest and starts trusting whoever has the best track record for the specific job at hand. It turns a chaotic group of conflicting robots into a well-coordinated medical team.
