OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis

OmniCT introduces a unified slice-volume Large Vision-Language Model (LVLM) that overcomes the fragmentation of existing approaches. By integrating spatial-consistency and organ-level semantic enhancements, it achieves comprehensive, high-precision CT analysis across both local (slice-level) and global (volume-level) clinical tasks.

Tianwei Lin, Zhongwei Qiu, Wenqiao Zhang, Jiang Liu, Yihan Xie, Mingjian Gao, Zhenxuan Fan, Zhaocheng Li, Sijing Li, Zhongle Xie, Peng LU, Yueting Zhuang, Ling Zhang, Beng Chin Ooi, Yingda Xia

Published 2026-03-03

Imagine you are trying to understand a complex story, but you have two different ways of looking at the book:

  1. The "Single Page" View: You look at one page at a time. You can read the words clearly and see the small details (like a typo or a specific drawing), but you can't see how the story flows from one page to the next.
  2. The "Whole Book" View: You hold the entire book in your hands. You can see the big picture, how Chapter 1 connects to Chapter 10, and the overall shape of the story. But if you try to read a single word on a specific page, it's too blurry and small to make out.

The Problem:
In the world of medical AI, specifically for CT scans (which are like 3D X-rays of the human body), doctors need both views.

  • They need to see a tiny, sub-centimeter nodule on a single slice (the "Single Page" view).
  • They also need to see how a tumor spreads through an organ or how one organ pushes against another (the "Whole Book" view).

Currently, AI models are stuck in one camp or the other. Some are great at reading single slices but lose track of the 3D structure; others handle 3D shapes well but miss the tiny, critical details. Either way, they are too unreliable for real-world clinical use.

The Solution: OmniCT
The paper introduces OmniCT, a new AI model that acts like a super-reader who can flip through pages and hold the whole book at the same time. It unifies these two perspectives into one powerful brain.

Here is how it works, using simple analogies:

1. The "Spatial Consistency" Trick (SCE)

  • The Analogy: Imagine you are trying to understand a 3D object, like a loaf of bread, but you only have a camera that takes flat photos.
  • How OmniCT does it: Instead of taking just one photo, it takes three adjacent slices of bread and stacks them together to make a tiny "mini-loaf." It then teaches the AI that these three slices belong together and have a specific order (top, middle, bottom).
  • The Magic: It also adds "GPS coordinates" to every pixel. Just like a map has North, South, East, and West, OmniCT gives every part of the image a 3D address (Up/Down, Left/Right, Front/Back). This helps the AI understand the shape of the body, not just the flat picture.
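The slice-stacking and "GPS coordinates" ideas above can be sketched in a few lines. This is a minimal illustrative sketch in NumPy, not the paper's actual implementation: the function names, the triplet grouping, and the choice of normalized [0, 1] coordinates are all assumptions made for clarity.

```python
import numpy as np

def stack_slice_triplets(volume):
    """Group a CT volume of shape (D, H, W) into overlapping triplets of
    adjacent slices, so each sample carries local 3D context (the
    'mini-loaf'). Illustrative sketch, not OmniCT's real API."""
    D, H, W = volume.shape
    triplets = np.stack([volume[i - 1:i + 2] for i in range(1, D - 1)])
    return triplets  # shape (D - 2, 3, H, W)

def add_3d_coordinates(triplet, z_index, depth):
    """Append normalized (z, y, x) coordinate channels -- the '3D
    address' of every voxel -- to a (3, H, W) slice triplet."""
    _, H, W = triplet.shape
    z = np.full((1, H, W), z_index / max(depth - 1, 1))          # front/back
    y = np.broadcast_to(np.linspace(0, 1, H)[:, None], (1, H, W))  # up/down
    x = np.broadcast_to(np.linspace(0, 1, W)[None, :], (1, H, W))  # left/right
    return np.concatenate([triplet, z, y, x])  # shape (6, H, W)
```

With the coordinate channels attached, two visually identical patches from different parts of the body get different inputs, which is exactly what lets the model reason about shape rather than flat appearance.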

2. The "Organ-Level" Focus (OSE)

  • The Analogy: Imagine a detective looking at a crime scene. If they look at the entire room equally, they might miss a tiny clue on the floor. But if they know exactly where the "important stuff" is (like the safe or the weapon), they can zoom in on those spots.
  • How OmniCT does it: The AI is taught to identify specific organs (like the liver or heart) first. It then creates a "highlight reel" of just those organs.
  • The Magic: It uses a smart compression technique. If an organ is huge (like the liver), it summarizes it efficiently. If an organ is tiny (like the pancreas), it "magnifies" the details so the AI doesn't miss anything. This ensures the AI pays attention to the right places without getting overwhelmed by too much data.
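The "summarize the big organs, magnify the small ones" behavior falls out naturally if every organ gets the same fixed token budget, regardless of its size. The sketch below illustrates that idea with a simple bounding-box crop and nearest-neighbour resampling; it is a stand-in for illustration only, and the function name and resampling scheme are assumptions, not OmniCT's actual mechanism.

```python
import numpy as np

def organ_token_pool(feature_map, mask, grid=8):
    """Crop an organ's bounding box from a (H, W) feature map and
    resample it to a fixed grid x grid patch of tokens. A large organ
    (big crop) gets compressed; a small organ (tiny crop) gets
    upsampled, i.e. 'magnified'. Illustrative sketch only."""
    ys, xs = np.nonzero(mask)  # pixels belonging to this organ
    crop = feature_map[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = crop.shape
    # Nearest-neighbour resample to the fixed token budget.
    ri = (np.arange(grid) * h / grid).astype(int)
    ci = (np.arange(grid) * w / grid).astype(int)
    return crop[np.ix_(ri, ci)]  # (grid, grid) for every organ
```

Because the output is always `grid × grid`, the downstream language model sees the liver and the pancreas at the same token cost, so no single organ can drown out the others.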

3. The "MedEval-CT" (The New Exam)

  • The Analogy: Before, if you wanted to test a student's math skills, you might give them a mix of algebra and geometry questions. For medical AI, though, the existing tests were messy: some covered only 2D slices, others only 3D volumes, and results were not comparable across models.
  • The Innovation: The authors built MedEval-CT, the world's largest and fairest "final exam" for medical CT AI.
    • It contains 1.7 million questions (like a massive question bank).
    • It tests the AI on everything: from simple "What is this?" questions to complex "What should the doctor do next?" reasoning.
    • It covers 13 different organs and various types of medical tasks.
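For concreteness, one entry in a question bank like this might look like the hypothetical record below. The field names and structure are invented for illustration and do not reflect MedEval-CT's released format.

```python
from dataclasses import dataclass

@dataclass
class MedEvalCTItem:
    """Hypothetical schema for one benchmark question; the real
    MedEval-CT format may differ."""
    scan_id: str        # which CT study the question is about
    organ: str          # one of the 13 covered organs, e.g. "liver"
    task: str           # e.g. "recognition" or "clinical reasoning"
    question: str
    choices: list[str]  # answer options for multiple-choice items
    answer: str
```

A record like this makes it easy to slice results by organ or by task type, which is what lets a benchmark report where a model is strong and where it fails.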

Why This Matters

Think of previous AI models as specialized tools: a hammer is great for nails but bad for screws. A screwdriver is great for screws but bad for nails.

OmniCT is the "Swiss Army Knife" of medical imaging.

  • It is smarter than current models at spotting tiny tumors (micro-level).
  • It is better at understanding how organs relate to each other (macro-level).
  • It is more reliable because it was tested on a massive, fair dataset that mimics real hospital scenarios.

The Bottom Line:
OmniCT bridges the gap between looking at a single photo and understanding the whole 3D body. By doing this, it moves medical AI one giant step closer to actually helping doctors diagnose diseases accurately and safely, rather than just being a cool tech demo. It's not just an upgrade; it's a new way of thinking about how computers see the human body.