Single-Slice-to-3D Reconstruction in Medical Imaging and Natural Objects: A Comparative Benchmark with SAM 3D

This paper benchmarks five state-of-the-art image-to-3D foundation models on medical and natural datasets. While all of them struggle with the severe depth ambiguity of single-slice reconstruction, SAM3D best preserves topological similarity to medical shapes; the results demonstrate that reliable medical 3D inference requires domain-specific adaptation beyond current zero-shot capabilities.

Yan Luo, Advaith Ravishankar, Serena Liu, Yutong Yang, Mengyu Wang

Published 2026-03-03

Imagine you are trying to build a detailed 3D model of a house, but you only have a single photograph of the front door.

That is essentially the challenge this paper tackles. In the medical world, doctors often have 2D slices (like a single slice of bread from a loaf) from CT or MRI scans. They want to turn that single flat image into a full 3D volume to see tumors, organs, or bones in their full depth.

Recently, powerful AI models (called "Foundation Models") have been trained on millions of photos of everyday objects—cats, cars, chairs—to guess what the 3D object looks like behind the camera. The big question the authors asked was: "Can these AI models, which are experts at guessing 3D shapes from natural photos, do the same job for medical scans?"

Here is the breakdown of their findings, using some everyday analogies:

1. The "Flatland" Problem

The researchers tested five of the smartest AI models available (including one called SAM3D) on medical data.

  • The Analogy: Imagine trying to guess the shape of a complex, crumpled piece of paper just by looking at its shadow on a wall.
  • The Reality: Natural photos have shadows, textures, and objects blocking each other, which give our brains (and AI) clues about depth. Medical slices are different. They are often just flat, uniform gray shapes with no shadows or depth cues.
  • The Result: Because the AI lacks these depth clues, it gets confused. Instead of building a deep, 3D "loaf of bread," the AI tends to build a very thin, flat "sheet of paper." It fails to guess how deep the object actually is.
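If you like seeing the analogy in code, here is a tiny NumPy sketch of why the problem is unsolvable from the slice alone. It uses made-up toy shapes, not the paper's data: a single circular "slice" is embedded in two volumes of wildly different depth, and nothing in the slice distinguishes them.

```python
import numpy as np

# A single 2D slice: a filled 32x32 circle, the kind of flat, uniform
# shape a CT/MRI cross-section often looks like (toy example, not real data).
yy, xx = np.mgrid[:32, :32]
slice_2d = (yy - 16) ** 2 + (xx - 16) ** 2 <= 10 ** 2

# Two very different 3D volumes that contain this exact same slice:
thin = np.zeros((32, 32, 32), dtype=bool)
thin[16] = slice_2d        # a 1-voxel-thick "sheet of paper"

deep = np.zeros((32, 32, 32), dtype=bool)
deep[4:28] = slice_2d      # a 24-voxel-deep "loaf of bread"

# Both are perfectly consistent with the observed slice...
assert np.array_equal(thin[16], slice_2d)
assert np.array_equal(deep[16], slice_2d)

# ...yet their volumes differ by a factor of 24. Nothing in the slice
# tells the model which one is right -- that is the depth ambiguity.
print(deep.sum() / thin.sum())  # 24.0
```

A model trained on natural photos leans on shading and occlusion to break this tie; a uniform gray slice offers neither, so it defaults to the thin answer.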

2. The "Good News, Bad News" Report Card

The paper graded the models on two different things:

  • The "Volume" Grade (Bad): If you ask, "How much of the 3D space did the AI fill correctly?" the score was terrible for everyone. The models were so bad at guessing depth that they barely overlapped with the real 3D shape. It's like trying to fill a swimming pool with a single drop of water.
  • The "Shape" Grade (Better): If you ask, "Does the AI at least get the general outline right?" one model, SAM3D, did the best.
    • The Analogy: Even though the AI couldn't guess the depth of a tumor, SAM3D was better at guessing the width and height. It was like a sculptor who couldn't carve the depth of a statue but managed to get the silhouette (the shadow shape) mostly correct.
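The two report-card grades can be sketched numerically. Below is a toy NumPy example (my own illustration, with invented numbers, not the paper's metrics or results): a "flat sheet" prediction of a deep cylinder scores terribly on 3D volume overlap (intersection-over-union) but perfectly on the projected silhouette.

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union between two boolean masks or volumes.
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

# Toy ground truth: a 20-voxel-deep cylinder ("the real tumor").
yy, xx = np.mgrid[:32, :32]
disk = (yy - 16) ** 2 + (xx - 16) ** 2 <= 10 ** 2
gt = np.zeros((32, 32, 32), dtype=bool)
gt[6:26] = disk

# A "flat sheet" prediction with the right outline but almost no depth,
# mimicking the failure mode described above.
pred = np.zeros((32, 32, 32), dtype=bool)
pred[15:17] = disk  # only 2 voxels deep

# The "Volume" grade: terrible, because the depth is missing.
print(f"3D volume IoU:     {iou(gt, pred):.2f}")  # 0.10

# The "Shape" grade: project both onto the image plane and compare silhouettes.
print(f"2D silhouette IoU: {iou(gt.any(axis=0), pred.any(axis=0)):.2f}")  # 1.00
```

This is the sculptor analogy in numbers: the silhouette can be spot-on while the filled volume barely overlaps at all.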

3. The "Smooth vs. Bumpy" Challenge

The researchers tested two types of medical targets:

  • Anatomy (The "Smooth" Stuff): Things like spines or airways. These are relatively smooth and predictable.
    • Result: The AI did okay here. It's like guessing the shape of a smooth, round apple.
  • Pathology (The "Bumpy" Stuff): Things like tumors or irregular lesions. These are jagged, weird, and non-convex (they have holes and bumps).
    • Result: The AI struggled immensely.
    • The Analogy: Trying to guess the shape of a tumor from one slice is like trying to guess the shape of a crumpled ball of tinfoil just by looking at one side of it. The AI's training on "smooth" natural objects (like cars or cups) didn't help it handle the messy, irregular shapes of diseases.
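The smooth-versus-bumpy distinction can be made concrete with a "solidity" score: how much of a shape's filled-in outline the shape itself occupies. The sketch below is my own crude 2D stand-in for the usual area-over-convex-hull-area definition (it only fills horizontal gaps), applied to invented shapes, not the paper's data. A smooth disk scores a perfect 1.0; a ring with a hole, standing in for an irregular lesion, scores much lower.

```python
import numpy as np

def row_solidity(mask):
    # Toy "solidity": fraction of the row-wise filled shape that the shape
    # itself occupies. A crude stand-in for area / convex-hull area;
    # 1.0 means no horizontal concavities or holes.
    filled = np.zeros_like(mask)
    for r in np.flatnonzero(mask.any(axis=1)):
        cols = np.flatnonzero(mask[r])
        filled[r, cols[0]:cols[-1] + 1] = True
    return mask.sum() / filled.sum()

yy, xx = np.mgrid[:64, :64]
r2 = (yy - 32) ** 2 + (xx - 32) ** 2

smooth = r2 <= 20 ** 2                      # "anatomy": a plain disk
bumpy = (r2 <= 20 ** 2) & (r2 >= 10 ** 2)   # "pathology": a ring with a hole

print(f"smooth solidity: {row_solidity(smooth):.2f}")  # 1.00
print(f"bumpy  solidity: {row_solidity(bumpy):.2f}")
```

Models trained on everyday objects have mostly seen high-solidity shapes, which is one plausible reason the jagged, holey geometry of pathology trips them up.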

4. The "Natural vs. Medical" Gap

When the researchers tested these same AIs on photos of natural objects (like household items or animals), the models performed much better.

  • The Takeaway: These AIs are like chefs who are masters at cooking Italian food (natural images) but have never tried to cook Thai food (medical images). They can guess the shape of a cat or a chair perfectly, but when you hand them a medical scan, they are out of their element.

The Bottom Line

The paper concludes that while these fancy AI models are impressive, we cannot just use them "out of the box" for medical 3D reconstruction.

  • Why? Because a single 2D medical slice doesn't have enough information for the AI to guess the 3D depth.
  • What's needed? We need to teach these models specifically about human anatomy. We need to give them "rules" about how organs and tumors usually look, rather than letting them guess based on photos of cats and cars.

In short: The AI is a talented artist who can draw a 3D car from a photo, but if you hand it a single slice of a brain scan, it will likely draw a flat, 2D blob. To fix this, we need to train the artist specifically on medical textbooks, not just art galleries.