The Geometry of Transfer: Unlocking Medical Vision Manifolds for Training-Free Model Ranking

Imagine you are a chef trying to cook a perfect meal for a very specific guest. You have a shelf of pre-cooked ingredients (these are the "Medical Foundation Models," pre-trained on huge amounts of unlabeled data; the models in this study learned from about 114,000 3D volumes). You need to pick the one ingredient that will work best for your specific dish (e.g., detecting a tiny tumor in a kidney vs. mapping a large brain region).

The Problem:
Usually, to find the best ingredient, you would have to cook a test batch with every single ingredient on the shelf, taste each one, and see which wins. In the world of medical AI, "cooking" means fine-tuning the model, which takes days of computing power and costs a fortune. It's like trying to find the perfect spice by baking a whole cake for every spice jar in the cupboard.

The Old Way (The Flawed Map):
Previous methods tried to guess the winner by looking at the "statistics" of the ingredients. They asked, "Do the colors and shapes of the ingredients look similar to the final dish?"

The Flaw: This is like judging a map of a city by looking only at the average color of the paint. It misses the roads. In medical imaging, the most important part isn't the big, blurry background; it's the sharp, jagged edges where a tumor meets healthy tissue. Old methods got lost in the big picture and failed to see the critical boundaries.

The New Solution: "The Topology Detective"
This paper introduces a new way to pick the best model without cooking a single test batch. Instead of looking at statistics, they look at the shape and structure (the "Topology") of the data.

Think of it like this:

The Global View (GRTD): The "Tree of Life"
Imagine you have a bunch of people (data points) in a room. The old way just counted how many people are wearing red shirts. The new way builds a Minimum Spanning Tree: the shortest possible set of links that connects everyone in the room, with no loops, based on how close they stand to each other.
- If the AI model is good, the links among "Tumor" people will naturally stay separate from the links among "Healthy" people.
- If the links get tangled and mix the two groups, the model is bad. This checks whether the model understands the overall shape of the problem.

The Local View (LBTC): The "Fence Inspector"
Sometimes, the big picture looks fine, but the fence between two neighbors is broken. In medical scans, this is the boundary between a disease and healthy tissue.
- The new method zooms in on these critical edges. It checks: "If I stand right on the border, can the AI clearly tell me which side is which, or is it confused?"
- It ensures the AI doesn't just guess the general area but respects the sharp lines where lives depend on precision.

The Smart Mixer (Task-Adaptive Fusion): The "Tailored Suit"
Not all tasks are the same.
- If you are looking for a large organ (like a whole liver), you care more about the "Global View" (the big tree).
- If you are looking for a tiny, fragmented lesion (like a small stroke), you care more about the "Local View" (the fence).
- The paper's system is a smart tailor. It automatically decides how much weight to give the "Tree" vs. the "Fence" based on how complex the specific medical task is. It creates a custom score for every single job.

The Result:
The authors tested this on a massive benchmark called "OpenMind."

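Old Methods: Got confused, sometimes ranking models in nearly the reverse order (negative correlation), especially on unfamiliar (out-of-distribution) data.
New Method: Its rankings matched real fine-tuning results about 31% more closely than the best existing method (a rank-correlation score of 0.723 vs. 0.552).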
Speed: It did this in minutes (by just looking at the data structure) instead of days (by retraining the models).

In a Nutshell:
Instead of trying every key in a giant keyring to open a door (fine-tuning), this new method looks at the shape of the key's teeth (topology) to instantly know which one fits. It checks both the overall shape of the key and the tiny notches on the edge, ensuring it picks the perfect tool for the job without ever having to try it in the lock. This saves massive amounts of time and money, making it possible to deploy the best medical AI models quickly and efficiently.

1. Problem Statement

The rapid proliferation of large-scale self-supervised learning (SSL) has created a vast "zoo" of medical foundation models. However, selecting the optimal pre-trained encoder for a specific downstream medical segmentation task remains a significant bottleneck.

The Challenge: Exhaustive fine-tuning of every candidate model on a target dataset is computationally prohibitive and time-consuming.
Limitations of Existing Solutions: Current Transferability Estimation (TE) metrics (e.g., LEEP, LogME, CCFV) were primarily designed for image classification. They rely on global statistical assumptions (e.g., linear separability, Gaussian distributions) which fail to capture the topological complexity required for dense prediction tasks like segmentation. Segmentation quality depends heavily on preserving local geometric structures near high-frequency anatomical boundaries, a nuance that purely statistical metrics often miss.

2. Methodology

The authors propose a Topology-Driven Transferability Estimation (TD-TE) framework. Instead of forcing feature spaces into statistical molds, this approach uses non-parametric, graph-theoretic structures to quantify manifold alignment. The framework consists of three core components:

A. Global Representation Topology Divergence (GRTD)

Goal: Quantify the structural alignment between the feature space and the semantic label space on a global scale.
Mechanism:
- Constructs two graphs: a Native Feature Graph ( $G_{feat}$ ) based on Euclidean distances in the embedding space, and a Semantic Label-Induced Graph ( $G_{sem}$ ) where edges between same-class samples are forced to zero (perfect clustering) and inter-class edges are penalized.
- Computes the Minimum Spanning Tree (MST) for both graphs.
- Calculates the discrepancy between the total weights of the feature MST and the semantic MST.
Significance: A lower divergence (closer to 0) indicates that the encoder's native geometry naturally respects semantic boundaries, implying high transferability.
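The GRTD steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the `penalty` factor on inter-class edges, the absolute-difference discrepancy, and the normalization by the feature-MST weight are all assumptions, since the summary does not give the exact formulas. The function names and the toy feature vectors are hypothetical.

```python
import math

def mst_total_weight(weights):
    """Total weight of a minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(weights)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each node
    if n:
        best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
    return total

def grtd(features, labels, penalty=1.0):
    """Assumed form: |W(MST_feat) - W(MST_sem)| / W(MST_feat)."""
    n = len(features)
    # Native feature graph: Euclidean distances in the embedding space.
    g_feat = [[math.dist(features[i], features[j]) for j in range(n)]
              for i in range(n)]
    # Label-induced graph: same-class edges forced to zero,
    # inter-class edges scaled by a hypothetical penalty factor.
    g_sem = [[0.0 if labels[i] == labels[j] else penalty * g_feat[i][j]
              for j in range(n)] for i in range(n)]
    w_feat = mst_total_weight(g_feat)
    w_sem = mst_total_weight(g_sem)
    return abs(w_feat - w_sem) / w_feat if w_feat > 0 else 0.0

# Toy embedding: two tight clusters far apart.
features = [(0.0, 0.0), (0.0, 0.1), (10.0, 0.0), (10.0, 0.1)]
aligned = grtd(features, [0, 0, 1, 1])  # labels follow the clusters
tangled = grtd(features, [0, 1, 0, 1])  # labels cut across the clusters
print(aligned, tangled)
```

Under these assumptions, labels that follow the clusters give a divergence near 0, while labels that cut across the clusters give a divergence near 1, matching the "lower is better" reading above.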

B. Local Boundary-Aware Topological Consistency (LBTC)

Goal: Target the places where segmentation most often fails: critical anatomical boundaries surrounded by heterogeneous background tissue.
Mechanism:
- Identifies "boundary anchors" using the morphological gradient of ground truth masks.
- Extracts local patches around these anchors and constructs local MST graphs.
- Measures the Topological Leakage Rate ( $\rho$ ), defined as the proportion of edges in the local MST that erroneously connect distinct semantic classes.
Significance: A high LBTC score (approaching 1, which corresponds to a low leakage rate $\rho$ ) implies the encoder preserves strict topological separation even in ambiguous transition zones, which is crucial for precise segmentation.
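As a rough sketch of the LBTC mechanism, the code below finds boundary anchors with a 3x3 morphological gradient (maximum minus minimum label in each window) and computes the leakage rate on a local MST. Several parts are assumptions: the 3x3 window size, the use of Euclidean distance in the local graph, and the mapping LBTC = 1 - $\rho$ are not stated in the summary, and the patch feature vectors are hypothetical stand-ins for encoder features.

```python
import math

def boundary_anchors(mask):
    """Pixels whose 3x3 morphological gradient (max - min label) is nonzero."""
    h, w = len(mask), len(mask[0])
    anchors = []
    for r in range(h):
        for c in range(w):
            window = [mask[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            if max(window) != min(window):
                anchors.append((r, c))
    return anchors

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (u, v) pairs."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def leakage_rate(points, labels):
    """Fraction of local-MST edges that join two different classes (rho)."""
    edges = mst_edges(points)
    if not edges:
        return 0.0
    return sum(labels[u] != labels[v] for u, v in edges) / len(edges)

# Toy mask: left half background (0), right half foreground (1).
mask = [[0, 0, 1, 1] for _ in range(4)]
anchors = boundary_anchors(mask)

# Hypothetical feature vectors sampled from one patch around an anchor.
patch_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
rho_clean = leakage_rate(patch_points, [0, 0, 1, 1])
rho_mixed = leakage_rate(patch_points, [0, 1, 0, 1])
lbtc_clean = 1.0 - rho_clean  # assumed mapping from leakage to score
```

Note that a connected tree over two classes always contains at least one cross-class edge, so in this simplified form $\rho$ never reaches exactly 0 for a patch that straddles a boundary.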

C. Task-Adaptive Topological Fusion

Goal: Dynamically balance global and local metrics based on the complexity of the target task.
Mechanism:
- Defines a task complexity prior $\kappa = \log(|C|)$ , where $|C|$ is the number of semantic classes.
- Uses a sigmoid gating factor $\alpha$ to weight the normalized GRTD and LBTC scores.
- Logic: For complex multi-organ tasks (high $|C|$ ), the system prioritizes global structural isomorphism ( $\alpha \to 1$ ). For focal pathologies or small lesions (low $|C|$ ), it emphasizes local boundary sharpness ( $\alpha \to 0$ ).
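The gating logic above might look like the following sketch. The summary specifies only $\kappa = \log(|C|)$ and a sigmoid gate, so the `slope` and `offset` parameters, the convex-combination form of the fused score, and the convention that both inputs are already normalized to [0, 1] with higher meaning better are assumptions made for illustration.

```python
import math

def gate(num_classes, slope=2.0, offset=1.0):
    """Sigmoid gate alpha on the complexity prior kappa = log(|C|).

    slope and offset are hypothetical; the source does not give them.
    """
    kappa = math.log(num_classes)
    return 1.0 / (1.0 + math.exp(-slope * (kappa - offset)))

def fused_score(global_score, local_score, num_classes):
    """Assumed fusion: alpha * global + (1 - alpha) * local.

    Both inputs are taken to be normalized so that higher is better,
    e.g. a GRTD divergence converted to an alignment score beforehand.
    """
    alpha = gate(num_classes)
    return alpha * global_score + (1.0 - alpha) * local_score

alpha_lesion = gate(2)        # binary lesion-vs-background task
alpha_multi_organ = gate(15)  # many-class multi-organ task
print(alpha_lesion, alpha_multi_organ)
```

With these illustrative parameters, a two-class task leans on the local score and a fifteen-class task leans almost entirely on the global score, which is the behavior the summary describes.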

3. Key Contributions

Paradigm Shift: Moves TE evaluation from statistical overlap to manifold topology, better suited for dense prediction tasks.
Novel Metrics: Introduces GRTD for global structural alignment and LBTC for local boundary separability, specifically tailored for medical imaging.
Training-Free Proxy: Provides a robust method to rank models without any fine-tuning, significantly reducing computational costs.
Adaptive Fusion: Develops a mechanism to automatically adjust the importance of global vs. local features based on the semantic cardinality of the target task.

4. Experimental Results

The framework was validated on the OpenMind benchmark, covering 6 diverse anatomical segmentation tasks (In-Distribution and Out-of-Distribution) and 7 mainstream SSL foundation models pre-trained on 114,000 3D volumes.

Performance: The proposed method achieved a weighted Kendall's $\tau$ of 0.723, representing a ~31% relative improvement over state-of-the-art baselines (e.g., CCFV achieved 0.552).
Robustness:
- OOD Generalization: Outperformed baselines significantly on Out-of-Distribution tasks (e.g., cross-modality MR $\to$ CT transfer in KiTS19), where statistical methods often failed or showed negative correlation.
- Initialization Independence: The metric remained stable across different random decoder initializations (Kaiming, Xavier, Gaussian), indicating that it reflects intrinsic encoder quality rather than training noise.
Efficiency:
- Time Savings: The training-free scoring process took an average of ~7 minutes for 7 models, compared to >3000 minutes for exhaustive fine-tuning.
- Comparison: It was also faster than the CCFV metric (6.99 min vs. 609.9 min average) while offering superior accuracy.
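To make the reported figures concrete, the sketch below checks the relative-improvement and speed-up arithmetic from the numbers quoted above and shows how a rank correlation between predicted scores and fine-tuned performance can be computed. It implements plain (unweighted) Kendall's $\tau$ as a simplification; the paper reports a weighted variant whose weighting scheme is not given here. The toy score lists are hypothetical and are not results from the paper.

```python
def kendall_tau(predicted, actual):
    """Unweighted Kendall's tau-a: (concordant - discordant) / total pairs."""
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two models to compare rankings")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Arithmetic on the figures quoted in the text.
relative_gain = (0.723 - 0.552) / 0.552  # about 0.31, i.e. ~31%
speedup_vs_ccfv = 609.9 / 6.99           # about 87x

# Hypothetical example: transferability scores vs. fine-tuned Dice
# for four imaginary models (illustrative values only).
toy_scores = [0.9, 0.7, 0.8, 0.2]
toy_dice = [0.81, 0.74, 0.70, 0.55]
toy_tau = kendall_tau(toy_scores, toy_dice)
print(round(relative_gain, 3), round(speedup_vs_ccfv, 1), round(toy_tau, 3))
```

A value of 1 means the predicted ranking matches the fine-tuning ranking exactly, and a negative value means the ranking is closer to reversed, which is the failure mode noted for some statistical baselines.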

5. Significance

This work addresses a critical bottleneck in the deployment of medical foundation models. By demonstrating that topological tractability is a better predictor of segmentation performance than statistical overlap, the paper provides a scalable, resource-efficient solution for model selection. This enables clinicians and researchers to rapidly identify the best pre-trained models for specific anatomical targets without the prohibitive cost of fine-tuning, accelerating the translation of AI models into clinical practice.
