Representation Before Retrieval: Structured Patient Artifacts Reduce Hallucination in Clinical AI Systems

Contrary to the prevailing assumption that retrieval-augmented generation (RAG) mitigates hallucinations, this study finds that RAG significantly increases unsupported claims in clinical AI. Converting heterogeneous patient data into structured, provenance-tracked artifacts proves more effective at ensuring factual accuracy and safety.

Scanlin, J., Cuesta, A., Varsavsky, M.

Published 2026-02-16

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are a doctor trying to diagnose a patient, but instead of looking at a neat, organized medical chart, you are handed a massive, chaotic pile of papers. This pile includes handwritten notes from 10 years ago, blurry photos of X-rays, data from a smartwatch, and genetic test results, all jumbled together with no clear order.

This is the problem this paper tackles regarding Artificial Intelligence (AI) in healthcare.

The Problem: The "Over-Confident" AI

Currently, we hope AI can act like a super-smart medical assistant. However, these AI models have a nasty habit called "hallucination." This is when the AI makes up facts that sound very convincing but are completely false.

The common belief was that if we gave the AI a "search engine" (called RAG) to look up the patient's real records before answering, it would stop making things up. It's like telling a student, "Don't guess; go look in the textbook first."

The paper's shocking discovery: In the messy world of real medical data, just giving the AI the "textbook" (raw search results) actually made it hallucinate much more often. It's as if the student were handed a library full of books but no table of contents, so they started guessing wildly to connect the dots, creating more nonsense than if they had just relied on their own training.
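To make the "search engine" idea concrete, here is a minimal sketch of the RAG setup described above. It uses naive keyword overlap instead of a real embedding index, and the sample chunks, function names, and query are all illustrative; the paper's actual retrieval method is not specified here.

```python
# Hypothetical raw record fragments, mimicking the "chaotic pile of papers".
raw_chunks = [
    "2016 handwritten note: pt reports intermittent chest pain",
    "smartwatch export: avg HR 71 bpm over 30 days",
    "genetic panel: no pathogenic variants detected",
]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank chunks by how many query words they share."""
    terms = set(query.lower().replace("?", "").split())
    ranked = sorted(
        chunks,
        key=lambda c: len(terms & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:k]

# The retrieved raw text is pasted into the prompt with no structure,
# which is exactly the failure mode the paper describes.
context = retrieve("does the patient have chest pain?", raw_chunks)
prompt = "Answer using only this context:\n" + "\n".join(context)
print(context[0])
```

Note that the model still receives unlabeled free text; nothing tells it which fragments are current, trustworthy, or related, which is the gap the structured approach below addresses.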

The Solution: Organizing the Chaos

The researchers tried a different approach. Instead of dumping a pile of raw text on the AI, they first acted like a super-organized librarian. They took all the messy data (notes, images, genetics) and turned it into structured, machine-readable "artifacts."

Think of it this way:

  • Raw Text (The Old Way): A messy kitchen counter covered in flour, eggs, and broken shells. You ask the AI to make a cake, and it tries to bake the shells.
  • Structured Artifacts (The New Way): The same ingredients, but they have been pre-measured, cracked, and placed in labeled bowls. The AI just has to mix them.
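The "labeled bowls" idea can be sketched in code. The schema below is purely illustrative (the paper's actual artifact format is not given here): each extracted fact carries its value plus provenance, i.e. which source document it came from and when it was recorded.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClinicalFact:
    """Hypothetical provenance-tracked artifact; field names are illustrative."""
    field: str     # what the fact describes, e.g. "hba1c"
    value: str     # the extracted value
    source: str    # which document the fact came from
    recorded: str  # when it was recorded (ISO date)

def to_artifacts(raw_note: str, source_id: str, date: str) -> list[ClinicalFact]:
    """Toy extractor: pull 'key: value' lines out of a free-text note."""
    facts = []
    for line in raw_note.splitlines():
        if ":" in line:
            key, _, val = line.partition(":")
            facts.append(ClinicalFact(
                field=key.strip().lower().replace(" ", "_"),
                value=val.strip(),
                source=source_id,
                recorded=date,
            ))
    return facts

note = "HbA1c: 7.2%\nBlood pressure: 128/82"
artifacts = to_artifacts(note, source_id="note_2026_01.txt", date="2026-01-10")
print(artifacts[0].field, artifacts[0].value, artifacts[0].source)
```

The point of the structure is that every fact the AI later cites can be traced back to a `source`, which is what the paper means by a "clear trail of where every fact came from."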

What They Found

The team tested four different ways of asking the AI for help:

  1. The Baseline: The AI guesses on its own. (Result: It made up facts about 5% of the time).
  2. The "Search Engine" (RAG): The AI searches the raw, messy notes. (Result: Disaster! It started making up facts 43% of the time. Giving it more unorganized information confused it).
  3. The "Structured" Approach: The AI uses the pre-organized, labeled data bowls. (Result: Much better! It only made up facts 8% of the time).
  4. The "Agent Workflow": The AI uses the organized data and has a "second pair of eyes" (a verification step) to double-check its work before speaking. (Result: The Winner. This was the safest and most useful method).
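The fourth condition's "second pair of eyes" can be sketched as a grounding check: before the AI speaks, each draft claim is verified against the artifact store, and anything unsupported is flagged instead of emitted. This is a minimal stand-in under assumed data; the paper's actual agent workflow is not detailed here.

```python
# Hypothetical artifact store: field -> (value, source document).
artifact_store = {
    "hba1c": ("7.2%", "lab_report_2026_01.pdf"),
    "allergy": ("penicillin", "intake_note_2024.txt"),
}

# Draft claims the model wants to state, as (field, value) pairs.
draft_claims = [
    ("hba1c", "7.2%"),       # supported by an artifact
    ("allergy", "aspirin"),  # unsupported: the store says penicillin
]

verified, flagged = [], []
for field, value in draft_claims:
    stored = artifact_store.get(field)
    if stored and stored[0] == value:
        verified.append((field, value, stored[1]))  # keep the provenance
    else:
        flagged.append((field, value))              # block before output

print("verified:", verified)
print("flagged:", flagged)
```

Only `verified` claims reach the user, each still attached to its source, which is why this condition was both the safest and the most useful in the study.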

The Big Lesson

The paper concludes that how you present information matters more than just having more information.

If you hand a doctor (or an AI) a chaotic pile of papers, they will get overwhelmed and make mistakes. But if you organize that data into a clear, structured format with a clear trail of where every fact came from, the AI becomes much safer and more reliable.

In short: Don't just give the AI a library; give it a well-organized index card system. The quality of the representation determines how smart the AI can actually be.
