Cross-Scanner Reliability of Brain MRI Foundation Model Embeddings: A Travelling-Heads Study

This study demonstrates that the cross-scanner reliability of brain MRI foundation model embeddings varies markedly with pretraining strategy: models that incorporate biological metadata achieve scanner-robust performance comparable to traditional morphometric baselines, while purely self-supervised models exhibit substantial scanner-induced variance.

Navarro-Gonzalez, R., Aja-Fernandez, S., Planchuelo-Gomez, A., de Luis-Garcia, R.

Published 2026-03-25

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Idea: The "Traveling Head" Experiment

Imagine you have a very smart AI assistant that looks at brain scans to understand how healthy a person is, how old they are, or if they have a disease. This AI is like a super-sensor that turns a complex brain image into a single "fingerprint" (a list of numbers) representing that person's brain.
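
To make the "fingerprint" idea concrete, here is a minimal sketch in PyTorch. The encoder below is a toy stand-in invented for illustration, not any of the models in the paper; the only point is the interface: a 3D brain volume goes in, a fixed-length vector (the fingerprint) comes out.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained foundation-model encoder (illustrative
# only; real models like BrainIAC or AnatCL ship their own weights and
# loading code). Interface: 3D volume in, fingerprint vector out.
class ToyBrainEncoder(nn.Module):
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.conv = nn.Conv3d(1, 8, kernel_size=3, stride=2)
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.head = nn.Linear(8, embed_dim)

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        x = self.pool(torch.relu(self.conv(volume))).flatten(1)
        return self.head(x)  # one fingerprint vector per scan

encoder = ToyBrainEncoder()
scan = torch.randn(1, 1, 96, 96, 96)  # stand-in for a T1-weighted MRI
fingerprint = encoder(scan)           # shape: (1, 256)
```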

The researchers wanted to answer a crucial question: does this AI fingerprint stay the same if you take the picture with a different camera?

In the real world, hospitals use different MRI machines (made by Siemens, Philips, or GE). These machines are like different camera brands. A photo of a cat taken with a Canon looks slightly different than one taken with a Sony. Usually, we don't care because the cat is still a cat. But in medical AI, if the "fingerprint" changes just because the camera changed, the AI might think the patient's brain has changed, when really, it's just a different machine.

To test this, the researchers used a "Traveling Heads" dataset. They took 20 healthy volunteers and scanned their brains on eight different MRI machines across the UK. This is like taking the same 20 people to eight different photo studios and taking their portraits.

The Contenders: Five AI Models vs. The Old Standard

The team tested five different "Foundation Models" (the new, fancy AI brains) and compared them to FreeSurfer (the old, trusted standard, like a classic film camera).

They asked: When we move from Machine A to Machine B, does the AI's "fingerprint" of the person stay consistent, or does it get confused?
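
The preprint reports formal reliability statistics; as a simplified proxy for the same question, you can compare how similar one person's fingerprints are across scanners with how similar two different people's fingerprints are. A reliable model keeps a wide gap between the two. All names below are illustrative; this is a sketch of the logic, not the authors' code.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cross_scanner_consistency(embeddings: dict) -> tuple[float, float]:
    """embeddings[subject][scanner] -> 1D fingerprint vector.

    Returns (mean same-subject similarity across scanner pairs,
    mean different-subject similarity). A scanner-robust model
    keeps the first value high and clearly above the second.
    """
    subjects = list(embeddings)
    same, diff = [], []
    for s in subjects:
        scans = list(embeddings[s].values())
        same += [cosine(a, b) for i, a in enumerate(scans)
                 for b in scans[i + 1:]]
    for i, s1 in enumerate(subjects):
        for s2 in subjects[i + 1:]:
            diff += [cosine(a, b)
                     for a in embeddings[s1].values()
                     for b in embeddings[s2].values()]
    return float(np.mean(same)), float(np.mean(diff))
```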

The Results: The Good, The Bad, and The Ugly

The results were dramatic. The models fell into two distinct camps:

🏆 The Champions: "The Biology Guides"

Two models (AnatCL and y-Aware) were the stars of the show.

  • The Analogy: Imagine these models were taught by a strict biology teacher. Before they learned to look at the brain, they were told, "Ignore the lighting and the camera brand. Focus only on the shape of the nose and the color of the eyes."
  • The Result: When the volunteers moved to a new machine, these models gave almost exactly the same fingerprint. They were so reliable that they matched, and sometimes beat, the old standard (FreeSurfer). They successfully ignored the "camera noise" and focused on the "biological truth."

📉 The Strugglers: "The Self-Taught Students"

Three models (BrainIAC, BrainSegFounder, and 3D-Neuro-SimCLR) failed the test miserably.

  • The Analogy: These models were like students who were told to "Just look at the picture and guess." They weren't taught what to ignore. So, they learned to memorize the camera brand.
  • The Result: When the same person was scanned on a different machine, these models gave a completely different fingerprint. In fact, the fingerprints were so saturated with machine information that a simple classifier reading them could guess which scanner took the picture with about 90% accuracy (a probe of this kind is sketched below). They had confused the scanner with the person.
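
A standard way to expose this kind of leakage is a linear probe: train a simple classifier to predict the scanner from the fingerprint alone. The sketch below assumes you already have the embeddings and labels as arrays (all names are illustrative); the ~90% figure comes from the paper, not from this toy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

def scanner_leakage_score(X: np.ndarray, scanner_id: np.ndarray,
                          subject_id: np.ndarray) -> float:
    """Linear probe: predict the scanner from the fingerprint.

    X: (n_scans, embed_dim) fingerprints. Folds are grouped by
    subject so the probe is always tested on unseen volunteers.
    Near-chance accuracy (1/8 = 12.5% with eight scanners) means
    the fingerprints carry little scanner information; high
    accuracy means the scanner leaked into the embedding.
    """
    probe = LogisticRegression(max_iter=1000)
    scores = cross_val_score(probe, X, scanner_id,
                             groups=subject_id,
                             cv=GroupKFold(n_splits=5))
    return float(scores.mean())
```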

Why Did This Happen? (The Secret Sauce)

The researchers discovered that neither the architecture (the shape of the AI) nor the amount of data it was trained on mattered. The only thing that mattered was how it was taught.

  • The Biology-Guided Models: These were trained using "contrastive learning," where they were explicitly shown biological facts (like age or brain volume) alongside the images. This forced the AI to learn that those specific features matter, and everything else (like the scanner) is just noise. A simplified sketch of this kind of metadata-weighted loss appears after this list.
  • The Self-Supervised Models: These were trained by simply looking at millions of images and trying to guess which ones were similar. Without a teacher pointing out "biology is key," they accidentally learned that "Siemens machines look like this" and "GE machines look like that." They learned the hardware instead of the human.
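
For the curious, here is a simplified sketch of the metadata-weighted contrastive idea (in the spirit of y-Aware, but not the exact published loss): instead of only pulling together views of the same image, every pair of scans is treated as a soft positive, weighted by how similar their biological metadata (here, age) is. The model gains nothing by encoding the scanner, because scanner identity does not predict those weights.

```python
import torch
import torch.nn.functional as F

def metadata_weighted_contrastive_loss(z: torch.Tensor,
                                       age: torch.Tensor,
                                       sigma: float = 5.0,
                                       tau: float = 0.1) -> torch.Tensor:
    """Simplified y-Aware-style loss (illustrative, not the paper's).

    z:   (n, d) batch of embeddings
    age: (n,) subject ages; an RBF kernel on age decides how strongly
         each pair of scans is pulled together.
    """
    z = F.normalize(z, dim=1)
    n = z.size(0)
    mask = torch.eye(n, dtype=torch.bool)
    sim = (z @ z.T / tau).masked_fill(mask, float("-inf"))
    # Soft positive weights: similar ages -> strong attraction.
    w = torch.exp(-(age[:, None] - age[None, :]) ** 2 / (2 * sigma ** 2))
    w = w.masked_fill(mask, 0.0)
    w = w / w.sum(dim=1, keepdim=True).clamp(min=1e-8)
    log_p = sim.log_softmax(dim=1).masked_fill(mask, 0.0)
    return -(w * log_p).sum(dim=1).mean()
```

A plain SimCLR-style loss is the special case where the weight is 1 for augmented views of the same scan and 0 everywhere else; nothing in that recipe tells the model that scanner identity is noise.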

The Takeaway: Why Should You Care?

This paper is a massive warning sign for the future of medical AI.

  1. Don't trust the "Black Box" blindly: Just because an AI is "pre-trained" on millions of images doesn't mean it's ready for the real world. If you use a model that hasn't been taught to ignore scanner differences, your medical diagnosis might depend on which hospital you visit, not your actual health.
  2. The "Traveling Head" test is essential: Before we trust these AI models in hospitals, we need to run this specific test: Scan the same person on different machines. If the AI's answer changes, the model is broken for multi-hospital use.
  3. Teach the AI what matters: The solution isn't just "more data." It's teaching the AI to care about biology and ignore the equipment.
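
As a concrete, minimal version of that test (an illustrative pass/fail criterion, not the paper's exact protocol): embed the same volunteers on two scanners and check whether nearest-neighbour matching on the fingerprints recovers the same person.

```python
import numpy as np

def travelling_head_check(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cross-scanner fingerprint matching.

    emb_a, emb_b: (n_subjects, d) arrays, where row i is the same
    volunteer scanned on scanner A and scanner B. Returns the fraction
    of volunteers whose scanner-A fingerprint is closest (by cosine)
    to their own scanner-B fingerprint. A scanner-robust model should
    score near 1.0; a broken one drops toward chance (1 / n_subjects).
    """
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    nearest = (a @ b.T).argmax(axis=1)
    return float((nearest == np.arange(len(a))).mean())
```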

In short: The best AI models are the ones that know how to look past the camera and see the person. The others are just taking pictures of the camera itself.
