LLM-Bootstrapped Targeted Finding Guidance for Factual MLLM-based Medical Report Generation

The paper introduces Fact-Flow, a novel framework that enhances the factual accuracy of MLLM-based medical report generation by decoupling visual fact identification from text generation and utilizing an LLM-bootstrapped pipeline to create labeled training data without manual annotation.

Cunyuan Yang, Dejuan Song, Xiaotao Pang, Qianqian Shen, Wenjie Nie, Yifan Huang, Lei Wu, Wei Han, Haishuai Wang, Jiajun Bu

Published 2026-03-03

Imagine you are trying to teach a brilliant but slightly scatterbrained artist (an AI) how to write a medical report based on an X-ray or an eye scan.

The Problem: The "Hallucinating Artist"
Currently, if you show this artist a picture of a broken bone and ask, "What do you see?", they might confidently describe the fracture in a beautiful story, but they might also accidentally invent a second injury that isn't there, or forget to mention a tiny crack that is actually critical. In the medical world, making things up (hallucinating) or missing details is dangerous.

The old way of training these artists was to just show them a picture and the final report, hoping they would learn the connection. But they often get the facts wrong because they are trying to "see" the image and "write" the story at the exact same time, which is too much mental juggling.

The Solution: Fact-Flow (The "Two-Step Detective")
The authors of this paper, "Fact-Flow," propose a smarter way to train the AI. They break the job into two distinct steps, like a detective team working together:

  1. Step 1: The "Fact Finder" (The Labeler)
    Before writing a single word of the report, a specialized AI (the Fact Finder) looks at the image and simply checks off a list of things it sees.

    • Analogy: Imagine a security guard at a museum. They don't write a novel about the painting; they just tick a box: "Yes, there is a vase," "Yes, there is a crack," "No, there is no fire."
    • This step forces the AI to be honest about what is actually there before it tries to be creative.
  2. Step 2: The "Storyteller" (The Report Writer)
    The main AI (the Storyteller) then takes the image and the checklist from Step 1. It is told: "Here is the picture, and here is the list of confirmed facts. Now, write a professional medical report based only on these facts."

    • Analogy: This is like giving a writer a strict outline. They can't invent new characters or surprise plot twists because they have to stick to the facts provided in the outline.
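The two steps above can be sketched in code. This is an illustrative stub, not the paper's actual implementation: `fact_finder` stands in for the specialized checklist model (in practice an MLLM classification head), and the checklist is simply injected into the writer's prompt to constrain generation. All names here (`FINDING_LIST`, `STUB_ANSWERS`, `build_report_prompt`) are hypothetical.

```python
# Hypothetical sketch of Fact-Flow's two-step inference.
# Step 1 (Fact Finder): check each candidate finding against the image.
# Step 2 (Storyteller): write a report constrained to the confirmed facts.

FINDING_LIST = ["fracture", "hairline crack", "soft-tissue swelling"]

# Stubbed answers standing in for a real visual classifier's output.
STUB_ANSWERS = {"fracture": True, "hairline crack": True, "soft-tissue swelling": False}

def fact_finder(image):
    """Step 1: tick a yes/no box for every finding on the checklist."""
    return {finding: STUB_ANSWERS[finding] for finding in FINDING_LIST}

def build_report_prompt(findings):
    """Step 2: inject the confirmed checklist into the report writer's prompt,
    so the generator can only elaborate on verified facts."""
    present = [f for f, seen in findings.items() if seen]
    absent = [f for f, seen in findings.items() if not seen]
    return (
        "Write a professional medical report using ONLY these confirmed facts.\n"
        f"Present: {', '.join(present) or 'none'}\n"
        f"Absent: {', '.join(absent) or 'none'}"
    )

findings = fact_finder(image=None)  # image omitted in this stub
prompt = build_report_prompt(findings)
print(prompt)
```

The key design point is the decoupling: the writer never sees the raw "decide what exists" problem, only a checklist it must respect.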

The Magic Trick: How do we get the checklist?
You might ask, "Who writes these checklists? Do doctors have to spend hours labeling every single X-ray?" That would be too expensive and slow.

The authors used a clever "bootstrapping" trick:

  • They took existing medical reports (which are just text) and asked a super-smart Large Language Model (LLM) to read them and extract the key facts automatically.
  • Analogy: It's like asking a librarian to read a thousand books and automatically create a master index of all the topics mentioned, without needing a human to read every page and write a tag. This created a massive training dataset for free.
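The bootstrapping idea can be sketched as well. In the paper an LLM reads each existing report and extracts structured finding labels; here a toy keyword matcher stands in for the LLM call so the flow is runnable. The finding list and report text are made up for illustration.

```python
# Hypothetical sketch of LLM-bootstrapped labeling: turn existing report
# text into checklist labels automatically, with no human annotation.
# A naive keyword matcher stands in for the real LLM extraction prompt.

FINDING_LIST = ["fracture", "hairline crack", "effusion"]

def extract_findings(report_text):
    """Stand-in for an LLM prompt such as:
    'Which of these findings does the report mention as present?'"""
    text = report_text.lower()
    return {finding: finding in text for finding in FINDING_LIST}

report = "There is a hairline crack in the tibia. No effusion is seen."
labels = extract_findings(report)
print(labels)
```

Note the deliberate failure: the matcher flags "effusion" as present even though the report says "No effusion is seen." Handling negation and phrasing like this is exactly why an LLM, rather than keyword matching, is used to build the labels.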

The Results
When they tested this "Two-Step Detective" system on real medical data (like chest X-rays for tuberculosis and eye scans for retinal issues):

  • Fewer Lies: The AI stopped making up fake diseases.
  • Better Memory: It stopped forgetting important details.
  • Still Good Writing: The reports were still easy to read and sounded professional, just like before.

In a Nutshell
Think of Fact-Flow as putting a "fact-checker" in the room before the "writer" starts typing. By separating the job of finding the truth from the job of telling the story, the AI becomes much more reliable, making it safer to use in real hospitals.