AnatomiX: An Anatomy-Aware Grounded Multimodal Large Language Model for Chest X-Ray Interpretation

AnatomiX is a two-stage multimodal large language model that improves chest X-ray interpretation by making explicit anatomical structure identification the first stage of every downstream task, achieving over 25% gains in anatomical reasoning and grounding compared to existing approaches.

Anees Ur Rehman Hashmi, Numan Saeed, Christoph Lippert

Published 2026-03-16
📖 4 min read · ☕ Coffee break read

Imagine you are trying to teach a very smart robot assistant how to read a chest X-ray. You want this robot to not only tell you what's wrong (like "there's pneumonia") but also to point exactly where it is on the picture and understand the difference between the left lung and the right lung.

The problem is, most current AI models are like students who memorized the answers but didn't understand the map. If you show them a normal X-ray, they get the diagnosis right. But if you flip the image upside down or swap the left and right sides, they get confused. They might say, "Oh, the pneumonia is on the left," when it's actually on the right, simply because they are guessing based on patterns rather than truly "seeing" the anatomy.

This paper introduces AnatomiX, a new AI model designed to fix this by acting more like a real radiologist.

Here is how AnatomiX works, broken down into simple analogies:

1. The Old Way: The "Guessing Game"

Most AI models look at an X-ray and try to guess the answer in one giant leap. They see a dark spot and say, "That's pneumonia." But they don't really know which lung that spot is in. It's like trying to find a specific house in a city by just looking at a blurry photo of the whole neighborhood. If the photo is flipped, you might point to the wrong house.

2. The AnatomiX Way: The "Two-Step Detective"

AnatomiX changes the game by breaking the job into two distinct steps, just like a human doctor does.

Step 1: The "Map Maker" (Anatomy Perception Module)
Before trying to diagnose anything, AnatomiX has a special internal tool called the Anatomy Perception Module (APM). Think of this as a GPS system that scans the X-ray first.

  • It doesn't just look at the whole picture; it specifically hunts for 36 different body parts (like the heart, the left lung, the right lung, the collarbones, etc.).
  • It draws invisible boxes around them and says, "Okay, I found the Left Lung here, and the Right Lung is over there."
  • It creates a detailed "map" of the body parts before it even tries to answer a question (a rough code sketch of this step follows below).
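
Here is a minimal sketch of that first step in Python. The paper's actual APM is a trained detector; the stub below and the structure names are illustrative stand-ins (a small subset of the 36), not its real interface or labels.

```python
# Minimal sketch of the "Map Maker" idea: detect anatomical structures
# and return labeled boxes. A learned detector would run here; the
# values below are made up for illustration.
from dataclasses import dataclass

@dataclass
class AnatomyBox:
    name: str      # e.g. "left lung"
    box: tuple     # (x1, y1, x2, y2) in pixel coordinates
    score: float   # detector confidence

def perceive_anatomy(image) -> list[AnatomyBox]:
    """Stand-in for the Anatomy Perception Module."""
    return [
        AnatomyBox("right lung", (40, 60, 220, 400), 0.97),
        AnatomyBox("left lung",  (260, 60, 440, 400), 0.96),
        AnatomyBox("heart",      (180, 220, 330, 420), 0.93),
    ]

# The "map" that the next stage reads from.
anatomy_map = {a.name: a.box for a in perceive_anatomy(image=None)}
print(anatomy_map["left lung"])
```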

Step 2: The "Doctor" (The Large Language Model)
Once the "Map Maker" has identified and labeled all the body parts, it hands this organized information to the "Doctor" (the main AI brain).

  • The Doctor now doesn't have to guess where things are. It looks at the map and says, "Ah, I see the user asked about the Left Lung. The map tells me the Left Lung is here. Let me check what's happening in that specific box."
  • Because the Doctor has a clear map, it is far harder to fool with a flipped image: it reads "Left" off the anatomy itself rather than off pixel positions, regardless of how the picture is oriented (see the sketch after this list).
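
This summary doesn't pin down exactly how the map reaches the Doctor, so the hand-off below is a hedged sketch: one plausible way to serialize the detected regions into a prompt, with illustrative box values. The real interface between the APM and the language model may differ.

```python
# Sketch of the hand-off from "Map Maker" to "Doctor": the anatomy map
# is serialized into the prompt so the language model reasons over
# named regions instead of guessing locations. The format here is an
# assumption, not the paper's exact interface.
anatomy_map = {  # output of the perception step (illustrative values)
    "right lung": (40, 60, 220, 400),
    "left lung":  (260, 60, 440, 400),
    "heart":      (180, 220, 330, 420),
}

def build_grounded_prompt(question: str, anatomy_map: dict) -> str:
    regions = "\n".join(f"- {name}: box={box}" for name, box in anatomy_map.items())
    return (
        "You are reading a chest X-ray.\n"
        "Detected anatomical regions (pixel boxes, x1/y1/x2/y2):\n"
        f"{regions}\n\n"
        f"Question: {question}\n"
        "Answer using only the named regions above."
    )

print(build_grounded_prompt("Is there any opacity in the left lung?", anatomy_map))
```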

3. The "Flashcard" Trick (Contrastive Retrieval)

One of the coolest parts of AnatomiX is how it learns what to say about each body part.

  • Imagine the AI has a massive library of flashcards. Each card has a picture of a specific body part (like the "Right Lower Lung") on one side and a medical description on the other (like "shows signs of fluid").
  • When AnatomiX sees a new X-ray, it finds the "Right Lower Lung" on its map, grabs the matching flashcard from its library, and uses that description to help the Doctor write the report. This helps the AI attach the correct medical terms to the correct body part (a rough sketch of the retrieval follows below).
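
In code, this retrieval boils down to a nearest-neighbor search in a shared embedding space. The sketch below assumes contrastively trained encoders, where a region crop and its matching description land close together; embed_image, embed_text, and the library entries are hypothetical stand-ins for the paper's learned components.

```python
# Sketch of the "flashcard" retrieval: embed the cropped region and
# pick the closest description by cosine similarity. The random
# encoders below are stubs; real ones would be trained contrastively.
import numpy as np

rng = np.random.default_rng(0)
def embed_image(region_crop): return rng.normal(size=128)  # stub encoder
def embed_text(description):  return rng.normal(size=128)  # stub encoder

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The "flashcards": a description paired with its text embedding.
library = [
    ("right lower lung: shows signs of fluid", embed_text("fluid")),
    ("right lower lung: clear, no effusion",   embed_text("clear")),
]

def retrieve(region_crop):
    """Return the flashcard whose embedding best matches the crop."""
    q = embed_image(region_crop)
    return max(library, key=lambda card: cosine(q, card[1]))

description, _ = retrieve(region_crop=None)  # crop would come from the map
print(description)  # handed to the "Doctor" when writing the report
```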

Why Does This Matter?

The researchers tested this new model against the best existing AI models. Here is what happened:

  • The "Flipped Image" Test: When they flipped the X-rays left-to-right, the old models failed miserably, mixing up left and right. AnatomiX, however, got it right almost every time because it actually understood the anatomy, not just the visual patterns.
  • The "Pointing" Test: When asked to draw a box around a specific disease, AnatomiX was 25% more accurate than the others.
  • The "Report" Test: It wrote better medical reports that were more accurate and easier for doctors to trust.

The Bottom Line

AnatomiX is like upgrading a robot from a parrot (which repeats what it hears) to a surgeon (which understands the body's structure). By forcing the AI to first identify where the body parts are before trying to diagnose them, it tackles one of the biggest problems in medical AI: spatial confusion.

This means that in the future, AI assistants won't just be "smart" at reading text; they will be smart at understanding the human body, making them much safer and more reliable tools for doctors.
