VesselFusion: Diffusion Models for Vessel Centerline Extraction from 3D CT Images

Imagine you are looking at a 3D CT scan of a human body. Inside, there is a complex, branching network of blood vessels, like a dense forest of tiny, winding rivers. Doctors need to map out the exact path of these rivers (the "centerlines") to plan surgeries or diagnose diseases.

The problem? Drawing these paths by hand is incredibly tedious, and drawing them automatically with old computer programs is messy. Old programs are like rigid robots: they follow strict rules and often get confused by the foggy, blurry edges of the vessels, resulting in broken lines, dead ends, or weird loops that don't exist in real life.

Enter VesselFusion. Think of this new method not as a rigid robot, but as a team of expert artists working together to recreate a map of a forest.

Here is how it works, broken down into simple steps:

1. The "Sketch First, Detail Later" Approach (Coarse-to-Fine)

Imagine trying to draw a complex tree. If you try to draw every single leaf and twig at full size immediately, you'll likely mess up the proportions. It's better to start with a rough sketch of the main branches, then zoom in to add the details.

VesselFusion does exactly this. Instead of trying to guess the exact millimeter-perfect coordinate of every point in the vessel all at once, it breaks the job into two parts:

The Grid (The Sketch): It first figures out which "neighborhood" or grid block the vessel is in.
The Offset (The Detail): Once it knows the neighborhood, it calculates the tiny, precise distance from the center of that block to the actual vessel.

This two-step process makes it much easier for the AI to learn the shape without getting overwhelmed by the sheer amount of data.

2. The "Dreaming" Process (Diffusion Models)

Traditional AI models are like a student taking a test: they look at the question (the CT scan) and give one single answer. If they are wrong, the answer is just wrong.

VesselFusion uses a Diffusion Model, which is more like an artist refining a sketch.

Imagine starting with a piece of paper covered in static noise (like TV snow).
The AI slowly "denoises" this picture, step-by-step, guided by the CT scan image.
With each step, the noise turns into a clearer picture of the vessel.
Because the AI has "learned" what healthy blood vessels look like from thousands of examples, it knows to avoid creating impossible shapes (like a vessel suddenly turning into a square or a loop that shouldn't be there). It captures the variability of nature, understanding that vessels can look slightly different in every person.

3. The "Council of Experts" (Voting-Based Aggregation)

Here is the catch: because the AI starts with random "noise" (like a random sketch), one single attempt might still produce a weird result, such as a tiny tear in the line or a stray loop.

To fix this, VesselFusion doesn't just ask for one answer. It asks 100 different "versions" of itself to draw the map, each starting with a slightly different random noise.

The Analogy: Imagine asking 100 different cartographers to draw the same river.
The Voting: Some might draw a loop that doesn't exist; others might miss a small branch. But the real river will appear in almost all 100 drawings.
The Result: The system looks at all 100 maps and only keeps the parts where enough of the experts agree (the "voting"). This filters out the weird mistakes and leaves a stable, natural-looking vessel map.

Why is this a big deal?

Old Methods: Like a GPS that gets stuck in a loop or drops you in a field because the signal was fuzzy.
VesselFusion: Like a team of experienced hikers who know the terrain. Even if one hiker takes a wrong turn, the group consensus ensures you end up on the right path.

The Bottom Line:
According to its authors, VesselFusion is the first method to use this "generative" and "voting" approach to map blood vessels. It produces maps that are not only more accurate (hitting the right coordinates) but also look more natural, avoiding the broken or impossible shapes that plague older technologies. This means doctors can place more trust in the computer's map, which saves time in planning and diagnosis.

Here is a detailed technical summary of the paper "VESSELFUSION: DIFFUSION MODELS FOR VESSEL CENTERLINE EXTRACTION FROM 3D CT IMAGES."

1. Problem Statement

Vessel centerline extraction from 3D CT images is critical for surgical planning, diagnosis, and hemodynamic analysis. While deep learning has improved vessel segmentation, creating dense 3D segmentation annotations is labor-intensive and prone to ambiguity due to unclear vessel boundaries.

Limitations of Existing Methods:
- Deterministic Models: Conventional approaches (e.g., iterative tracking or graph-based regression) are deterministic. They struggle to capture the complex variability of natural human anatomy, often producing broken segments, spurious connections, or unnatural structures (e.g., tears or loops).
- Annotation Burden: Segmentation-based methods require pixel-perfect 3D masks, which are difficult to annotate. Centerline extraction reduces this burden but existing centerline methods often lack robustness to noise.

2. Methodology: VesselFusion

The authors propose VesselFusion, the first conditional diffusion model designed specifically for extracting vessel centerlines from 3D CT images. The framework consists of three core components:

A. Coarse-to-Fine (C2F) Coordinate Representation

Directly regressing raw 3D coordinates in a diffusion model is unstable due to data sparsity and scale imbalances. VesselFusion introduces a structured representation:

Grid Discretization: The 3D space is divided into a coarse grid ( $G_x \times G_y \times G_z$ ).
Hybrid Encoding: Each point on the centerline is encoded as a combination of:
1. Discrete Component: Binary vectors representing the grid cell index.
2. Continuous Component: Local offsets ( $\Delta x, \Delta y, \Delta z$ ) representing the position relative to the grid cell center.
3. Validity Flag: A binary flag indicating if the entry is a valid point or padding (allowing variable-length sets to be processed as fixed-size matrices).
  This representation makes the data more tractable for the diffusion process.
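The encoding described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `encode_c2f` and `decode_c2f`, the 8 x 8 x 8 grid, the bit order of the cell index, the offset scaling (units of one cell, so values fall in [-0.5, 0.5)), and the padding length `max_points` are all assumptions made for illustration.

```python
import numpy as np

def encode_c2f(points, volume_shape, grid=(8, 8, 8), max_points=16):
    """Encode (n, 3) in-volume coordinates as a fixed-size coarse-to-fine matrix.

    Row layout (an assumption): [cell-index bits | dx, dy, dz | validity flag].
    """
    points = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    if len(points) > max_points:
        raise ValueError("more points than max_points")
    grid_arr = np.asarray(grid)
    cell_size = np.asarray(volume_shape, dtype=np.float64) / grid_arr
    n_bits = max(1, int(grid_arr.prod() - 1).bit_length())
    # Coarse part: the grid cell that contains each point.
    cell = np.minimum((points // cell_size).astype(int), grid_arr - 1)
    flat = np.ravel_multi_index(tuple(cell.T), tuple(grid))
    bits = (flat[:, None] >> np.arange(n_bits)) & 1  # least significant bit first
    # Fine part: offset from the cell center, measured in cell units.
    offset = points / cell_size - (cell + 0.5)
    rows = np.hstack([bits, offset, np.ones((len(points), 1))])
    # Padding rows stay all-zero, so their validity flag is 0.
    encoded = np.zeros((max_points, n_bits + 4))
    encoded[:len(rows)] = rows
    return encoded

def decode_c2f(encoded, volume_shape, grid=(8, 8, 8)):
    """Invert encode_c2f, dropping padding rows."""
    grid_arr = np.asarray(grid)
    cell_size = np.asarray(volume_shape, dtype=np.float64) / grid_arr
    n_bits = encoded.shape[1] - 4
    valid = encoded[:, -1] > 0.5
    bits = (encoded[valid, :n_bits] > 0.5).astype(np.int64)
    flat = (bits << np.arange(n_bits)).sum(axis=1)
    flat = np.minimum(flat, grid_arr.prod() - 1)  # guard for non-power-of-two grids
    cell = np.stack(np.unravel_index(flat, tuple(grid)), axis=1)
    offset = encoded[valid, n_bits:n_bits + 3]
    return (cell + 0.5 + offset) * cell_size
```

Thresholding the bits and the validity flag at 0.5 in `decode_c2f` is one simple way to map the continuous values a denoiser would output back to discrete choices; the paper may use a different rule.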

B. CT-Conditioned Diffusion Model

Architecture: A conditional denoising diffusion probabilistic model (DDPM) in which the CT image acts as the conditioning input.
Process:
- Forward Process: Gradually adds Gaussian noise to the C2F representation of the ground truth centerline.
- Reverse Process (Training): A denoiser (based on a Transformer encoder conditioned on 3D CNN-extracted CT features) learns to predict the clean C2F representation from noisy inputs.
- Inference: Uses DDIM (Denoising Diffusion Implicit Models) sampling to deterministically reverse the process from pure noise to a predicted centerline.
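The forward noising step and the deterministic DDIM update can be sketched in a few lines of NumPy. This is a generic illustration of the standard DDPM and DDIM formulas, not the paper's code: the linear noise schedule, the step count, and the function names are assumptions, and the learned denoiser is left out (the sketch takes its clean-sample prediction `x0_pred` as an argument).

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for a linear beta schedule (assumed schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) by mixing the clean sample with Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def ddim_step(x_t, x0_pred, t, t_prev, alpha_bar):
    """Deterministic DDIM update (eta = 0) from step t to step t_prev.

    x0_pred is the denoiser's estimate of the clean sample; t_prev = -1
    denotes the final step, which returns the estimate itself.
    """
    # Noise implied by the current sample and the clean-sample estimate.
    eps_pred = (x_t - np.sqrt(alpha_bar[t]) * x0_pred) / np.sqrt(1.0 - alpha_bar[t])
    a_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps_pred
```

Because the update adds no fresh noise, the only randomness in a sampling run is the initial noise, so a fixed seed gives a fixed output.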

C. Voting-Based Aggregation

Since diffusion models are stochastic, a single generation might produce anatomically implausible structures (e.g., loops or tears). To ensure stability:

Multiple Inferences: The model generates $K$ different centerline predictions from different initial noise seeds.
Voxel-wise Voting: These $K$ sets of coordinates are aggregated in a discrete coordinate space. A point is retained in the final output only if it appears in a sufficient number of predictions (determined by an optimal threshold $\tau$ ).
Result: Errors that appear in only a few samples fail to reach the vote threshold and are discarded, which suppresses noise and unnatural artifacts while reinforcing structures that are consistent across samples.
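A minimal sketch of voxel-wise voting follows, assuming each of the $K$ predictions is an array of 3D coordinates in voxel units and that $\tau$ is expressed as a vote count. The function name `vote_aggregate` and the rounding-to-nearest-voxel rule are illustrative choices, not details taken from the paper.

```python
import numpy as np

def vote_aggregate(predictions, volume_shape, tau):
    """Keep voxels hit by at least tau of the K predicted point sets.

    predictions: list of (n_k, 3) coordinate arrays, one per sampling run.
    Returns an (m, 3) integer array of retained voxel coordinates.
    """
    votes = np.zeros(volume_shape, dtype=np.int32)
    bounds = np.asarray(volume_shape)
    for pts in predictions:
        # Snap continuous coordinates to the nearest voxel.
        idx = np.rint(np.asarray(pts, dtype=np.float64).reshape(-1, 3)).astype(int)
        inside = np.all((idx >= 0) & (idx < bounds), axis=1)
        # Each run votes at most once per voxel.
        idx = np.unique(idx[inside], axis=0)
        votes[idx[:, 0], idx[:, 1], idx[:, 2]] += 1
    return np.argwhere(votes >= tau)
```

With three runs and `tau=2`, a voxel predicted by two or three runs is kept, and a point that shows up in a single run is dropped.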

3. Key Contributions

First Generative Approach: Introduces the first method using a generative diffusion model for vessel centerline extraction from 3D CTs.
Novel Representation: Proposes a Coarse-to-Fine (C2F) coordinate encoding that effectively bridges discrete topology and continuous geometry for diffusion models.
Stabilization Strategy: Implements a voting-based aggregation mechanism to mitigate the stochastic nature of diffusion models, ensuring anatomically consistent and stable results.
Superior Performance: Demonstrates that generative modeling can outperform deterministic regression in capturing natural vessel variability.

4. Experimental Results

The method was evaluated on the ImageCAS dataset (1,000 coronary CT volumes) and compared against a U-Net baseline (segmentation-based) and VesselFormer (graph-based).

Quantitative Accuracy:
- VesselFusion achieved the highest F1-scores across all distance thresholds ( $R=1, 2, 3$ ).
- Notably, at $R=1$ (the strictest distance threshold), VesselFusion scored 0.757, ahead of both the Baseline (0.733) and VesselFormer (0.633).
Structural Consistency (Topological Metrics):
- Betti-0 (number of connected components): VesselFusion achieved 67.66, comparable to VesselFormer (68.29) and far lower than the Baseline (190.66), indicating fewer fragmented lines.
- Betti-1 (Loops): VesselFusion achieved 0.09, effectively eliminating spurious loops compared to VesselFormer (4.51).
Qualitative Analysis: Visual results showed that while baselines produced noisy, anatomically implausible artifacts, VesselFusion generated smooth, continuous, and line-like structures that closely matched ground truth.
Ablation Study:
- Both C2F representation and Voting contributed independently to accuracy.
- C2F primarily improved Recall (finding more vessel points).
- Voting primarily improved Precision (reducing false positives) and structural consistency (lowering Betti numbers).
- Combining both yielded the best overall performance.
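The summary does not spell out how the distance-thresholded F1-score is computed. One common point-based definition, given here as an assumption rather than the paper's exact metric, counts a predicted point as correct if a ground-truth point lies within distance $R$ (precision), and a ground-truth point as found if a predicted point lies within $R$ (recall). The function name `f1_at_radius` is hypothetical.

```python
import numpy as np

def f1_at_radius(pred, gt, radius):
    """Point-based F1 between predicted and ground-truth centerline points."""
    pred = np.asarray(pred, dtype=np.float64).reshape(-1, 3)
    gt = np.asarray(gt, dtype=np.float64).reshape(-1, 3)
    if len(pred) == 0 or len(gt) == 0:
        return 0.0
    # Pairwise Euclidean distances, shape (n_pred, n_gt).
    dist = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = np.mean(dist.min(axis=1) <= radius)  # predicted points near the truth
    recall = np.mean(dist.min(axis=0) <= radius)     # true points that were found
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))
```

Under this definition, a larger $R$ can only keep or raise both precision and recall, which is why $R=1$ is the most demanding of the three thresholds.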

5. Significance and Conclusion

VesselFusion represents a paradigm shift in vessel extraction by moving from deterministic regression to probabilistic generative modeling. By learning the distribution of natural vessel shapes, it avoids the "brittleness" of traditional methods that often fail at bifurcations or noisy regions.

Clinical Impact: The ability to extract accurate, natural centerlines with reduced annotation effort (relying on centerlines rather than dense masks) facilitates better surgical planning and computational hemodynamics.
Future Work: The authors note that the computational cost of generating multiple samples for voting remains a limitation, suggesting future optimization for real-time clinical application.

In summary, VesselFusion combines a CT-conditioned diffusion model with a coarse-to-fine coordinate representation and voting-based aggregation, and in the reported experiments it achieves higher accuracy and anatomical plausibility than the deterministic methods it was compared against.
