Real-Time Glottis Detection Framework via Spatial-Decoupled Feature Learning for Nasotracheal Intubation

This paper proposes Mobile GlottisNet, a lightweight and efficient deep learning framework that uses spatial-decoupled feature learning and adaptive mechanisms to achieve real-time glottis detection for nasotracheal intubation on resource-constrained edge devices.

Jinyu Liu, Gaoyang Zhang, Yang Zhou, Ruoyi Hao, Yang Zhang, Hongliang Ren

Published 2026-03-10

Imagine you are trying to thread a needle, but the needle is a breathing tube, the eye of the needle is a tiny opening in your throat (the glottis), and you are doing this in the dark, while the person is moving, and your hands are shaking. This is the reality of nasotracheal intubation (NTI)—a life-saving emergency procedure where doctors must quickly find a patient's airway to help them breathe.

Currently, doctors rely heavily on their own eyes and experience. If they miss the target, it can be dangerous. Computers have tried to help by using cameras and "smart eyes" (AI) to point out the airway, but there's a big problem: existing smart eyes are too heavy.

Think of current AI systems like a giant, high-end supercomputer trying to run on a smartwatch. They are too slow, too big, and need too much power to be useful in an ambulance or a remote clinic. They take too long to "think," and in an emergency, every second counts.

The Solution: "Mobile GlottisNet"

The authors of this paper built a new AI system called Mobile GlottisNet. You can think of this as a lightweight, super-fast "smart glasses" app designed specifically to fit on small, portable medical devices.

Here is how they made it work, using some simple analogies:

1. The "Tiny Brain" (Lightweight Backbone)

Most AI models are like massive libraries with millions of books; they take forever to find the right page. Mobile GlottisNet is like a pocket-sized cheat sheet. It uses a highly efficient design (based on MobileNetV3) that strips away all the unnecessary fluff. It's so small (only 5MB—about the size of a few high-res photos) that it can run instantly on a small device without needing a massive server farm.
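Much of the efficiency of MobileNetV3-style backbones comes from replacing standard convolutions with depthwise-separable ones. The paper doesn't spell out the exact layer configuration, so the numbers below are purely illustrative, but the parameter-count arithmetic shows why the "cheat sheet" is so much smaller than the "library":

```python
# Rough parameter-count comparison between a standard 3x3 convolution
# and the depthwise-separable version used in MobileNet-style backbones.
# The channel counts are illustrative, not the paper's actual configuration.

def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def separable_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Depthwise k x k conv (one filter per input channel) followed by
    a 1x1 pointwise conv that mixes channels."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

c_in, c_out = 128, 256
std = standard_conv_params(c_in, c_out)   # 294,912 weights
sep = separable_conv_params(c_in, c_out)  # 33,920 weights
print(f"standard: {std:,}  separable: {sep:,}  savings: {std / sep:.1f}x")
```

At these sizes the separable version uses roughly 8.7x fewer weights per layer, and the savings compound across the whole network, which is how a useful detector fits in a few megabytes.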

2. The "Smart Filter" (Hierarchical Dynamic Thresholding)

When the AI looks at a throat, it sees thousands of potential spots that might be the airway. Most are wrong.

  • Old way: The AI tries to guess on everything, getting confused by noise (like blood, saliva, or shadows).
  • New way: The authors added a "Smart Filter." Imagine a bouncer at a club who only lets in the VIPs. This filter dynamically decides, "Okay, this spot looks promising, let's focus on it," while ignoring the junk. It constantly adjusts its standards based on what it sees, ensuring it only pays attention to the best candidates.
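The paper's exact thresholding rule isn't reproduced here, but one common way to build a "bouncer" that adjusts its standards per frame is to set the cutoff from the statistics of the current frame's confidence scores. A minimal sketch of that idea:

```python
import statistics

def dynamic_filter(scores, k=1.0, floor=0.05):
    """Keep only candidates whose confidence beats a threshold that
    adapts to the current frame: mean + k * stdev of all scores,
    never dropping below a fixed floor.
    (Illustrative rule, not the paper's actual formulation.)"""
    mean = statistics.fmean(scores)
    spread = statistics.pstdev(scores)
    threshold = max(mean + k * spread, floor)
    return [(i, s) for i, s in enumerate(scores) if s > threshold]

# A noisy frame: many low-confidence detections, one strong candidate.
frame = [0.02, 0.05, 0.91, 0.07, 0.03, 0.10, 0.04]
print(dynamic_filter(frame))  # only the strong candidate survives
```

Because the threshold is relative to the frame, a murky image full of weak detections still yields only the best few candidates, rather than flooding the later stages with noise.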

3. The "Stretchy Lens" (Adaptive Feature Decoupling)

Throats aren't static; they move, twist, and get covered in fluids. A normal camera lens is rigid and might miss the target if it moves slightly.

  • The Innovation: The team gave the AI a "stretchy lens" (using deformable convolutions). If the airway shifts to the left or gets blurry, the AI's "eyes" physically stretch and shift to follow the shape of the airway. It decouples the "shape" of the airway from the "mess" around it, allowing it to see clearly even when the view is foggy or blocked.
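Deformable convolutions work by letting each kernel tap shift off the rigid grid by a learned fractional offset, with the feature value read back via interpolation. Here is a stripped-down 1D sketch; in the real network the offsets are predicted by a small convolution layer, whereas here they are hard-coded for illustration:

```python
def linear_sample(row, x):
    """Read a 1D feature row at fractional position x via linear
    interpolation, clamping at the borders."""
    x = min(max(x, 0.0), len(row) - 1.0)
    lo = int(x)
    hi = min(lo + 1, len(row) - 1)
    frac = x - lo
    return row[lo] * (1 - frac) + row[hi] * frac

def deformable_tap(row, center, offsets, weights):
    """One deformable-convolution output: each kernel tap samples at
    center + tap_position + learned_offset instead of a rigid grid."""
    taps = [-1, 0, 1]  # a rigid 1x3 kernel would sample exactly here
    return sum(w * linear_sample(row, center + t + o)
               for t, o, w in zip(taps, offsets, weights))

row = [0.0, 0.0, 1.0, 0.0, 0.0]  # a feature "peak" sitting off to one side
rigid   = deformable_tap(row, 1, [0.0, 0.0, 0.0], [1/3, 1/3, 1/3])
shifted = deformable_tap(row, 1, [0.5, 1.0, 0.5], [1/3, 1/3, 1/3])
print(rigid, shifted)  # the shifted taps catch more of the off-grid peak
```

The rigid kernel centered at position 1 mostly misses the peak at position 2; the same kernel with offsets "stretches" toward it and picks up a stronger response, which is exactly the follow-the-moving-airway behavior described above.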

4. The "Team Huddle" (Cross-Layer Weighting)

Deep learning models have different "layers" that see things at different scales (some see the big picture, some see tiny details).

  • The Innovation: Usually, these layers just shout their opinions at each other. Here, the authors added a "Team Huddle" mechanism. It weighs the opinions of the "Big Picture" layer and the "Tiny Detail" layer, deciding exactly how much to listen to each one depending on the situation. This ensures the AI doesn't miss the tiny opening just because it's looking at the whole throat.
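A common way to implement this kind of weighted fusion is a softmax gate over per-layer scores. The sketch below assumes the two layers' features have already been resized to the same shape, and hard-codes the gate scores that a real network would predict from the input:

```python
import math

def fuse(coarse, fine, gate_scores):
    """Blend a 'big picture' feature vector with a 'tiny detail' one.
    Softmax turns the two gate scores into weights that sum to 1,
    so the network decides how much to listen to each layer.
    (Gate scores are hard-coded here; a real network predicts them.)"""
    exps = [math.exp(s) for s in gate_scores]
    total = sum(exps)
    w_coarse, w_fine = (e / total for e in exps)
    return [w_coarse * c + w_fine * f for c, f in zip(coarse, fine)]

coarse = [0.8, 0.2, 0.1]  # global-context features (illustrative values)
fine   = [0.1, 0.9, 0.4]  # fine-detail features (illustrative values)
print(fuse(coarse, fine, [0.0, 0.0]))  # equal gate -> plain average
print(fuse(coarse, fine, [0.0, 2.0]))  # gate favours the detail layer
```

With equal gate scores the fusion is a plain average; raising the detail layer's score pulls the output toward the fine features, which is the "decide how much to listen to each one" behavior in the huddle analogy.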

The Results: Fast, Small, and Accurate

The team tested this system in three ways:

  1. Lab Simulations: Using a fake throat (phantom).
  2. Real Patients: Using data from hospitals.
  3. Public Databases: Testing on thousands of other images.

The verdict?

  • Speed: It runs at over 62 frames per second on standard devices and 33 frames per second on tiny edge devices. That means it updates the image more than 30 times a second—fast enough to track movement in real-time without lag.
  • Size: It fits in a tiny 5MB package.
  • Accuracy: It finds the airway as well as (or better than) the giant, slow supercomputers, even when the view is messy.
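Frame rates translate directly into a per-frame "thinking budget," and a quick sanity check on the numbers above shows why 33 fps counts as real-time:

```python
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to process each frame at a given rate."""
    return 1000.0 / fps

for device, fps in [("standard device", 62), ("edge device", 33)]:
    print(f"{device}: {fps} fps -> {frame_budget_ms(fps):.1f} ms per frame")

# A typical live camera feed runs at 30 fps, i.e. one frame every
# ~33.3 ms. At 33 fps the edge device finishes each frame in ~30.3 ms,
# so it keeps up with the video stream instead of falling behind.
```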

Why This Matters

Imagine a paramedic in a remote area or a doctor in a crowded emergency room. They don't have a supercomputer on a cart; they have a small, portable device. Mobile GlottisNet is the first system that can live on that small device, acting like a reliable co-pilot that says, "Look here, that's the airway," instantly and accurately.

It bridges the gap between "cool AI research" and "life-saving tool," ensuring that even in the most resource-limited situations, the patient gets the fastest, safest help possible.