Towards an Incremental Unified Multimodal Anomaly Detection: Augmenting Multimodal Denoising From an Information Bottleneck Perspective

This paper proposes IB-IUMAD, a novel incremental unified multimodal anomaly detection framework that mitigates catastrophic forgetting by leveraging a Mamba decoder to disentangle inter-object feature coupling and an information bottleneck module to filter redundant features, thereby preserving discriminative information across evolving categories.

Kaifang Long, Lianbo Ma, Jiaqi Liu, Liming Liu, Guoyang Xie

Published 2026-03-04

Imagine you are the head of a quality control team in a massive factory. Your job is to spot defects on products coming down the assembly line.

The Old Way: The "One Expert Per Product" Problem

In the past, factories used a strategy called "One Expert Per Product."

  • If you made bagels, you hired a specialist who only knew bagels.
  • If you made cookies, you hired a different specialist who only knew cookies.
  • If you made carrots, you hired a third specialist.

The Problem: This is incredibly expensive and slow. If the factory suddenly starts making donuts, you have to hire a whole new team, train them from scratch, and buy new equipment. It's a logistical nightmare.

The New Goal: The "Super Detective"

The researchers wanted to build a Single Super Detective who could spot defects on any product (bagels, cookies, carrots, donuts) using just one brain. This is called Unified Multimodal Anomaly Detection.

To make this detective even smarter, they gave them two pairs of glasses:

  1. RGB Glasses: See the color and texture (like a normal camera).
  2. Depth Glasses: See the 3D shape and height (like a 3D scanner).

By combining these two views, the detective can catch a dent that is invisible in color but obvious in 3D, or a stain that is obvious in color but invisible in 3D.

The Big Crisis: "Catastrophic Forgetting"

Here is where the story gets tricky. Imagine you train your Super Detective on Bagels first. They become a Bagel expert. Then, you teach them about Cookies.

The Disaster: As soon as they start learning about Cookies, their brain starts to glitch. They begin to forget what a Bagel looks like! They might start thinking a bagel is a cookie, or they miss defects on the bagels they used to know perfectly.

In computer science, this is called Catastrophic Forgetting. It's like trying to learn a new language, but every time you learn a new word, you forget half the words you already knew.
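To make "forgetting" concrete, here is a minimal sketch of one common way to measure it in continual learning: for each old product, take the best score the model ever had on it and subtract the score it has now. The function name, the score table, and every number in it are hypothetical illustrations; this summary does not say exactly which metric the paper uses.

```python
def average_forgetting(scores):
    """scores[t][k] = score on product k, measured after training step t (k <= t)."""
    last = len(scores) - 1
    drops = []
    for k in range(last):  # the newest product cannot have been forgotten yet
        best_earlier = max(scores[t][k] for t in range(k, last))
        drops.append(best_earlier - scores[last][k])
    # With a single training step there is nothing to forget.
    return sum(drops) / len(drops) if drops else 0.0


# Hypothetical scores (not from the paper): rows are training steps, columns are products.
scores = [
    [0.90],              # after learning bagels
    [0.80, 0.92],        # after learning cookies
    [0.75, 0.85, 0.91],  # after learning carrots
]
print(round(average_forgetting(scores), 3))  # 0.11
```

In this made-up table the bagel score slips from 0.90 to 0.75 and the cookie score from 0.92 to 0.85, so the detective has forgotten 0.11 on average.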

The Villains: "Spurious" and "Redundant" Features

The paper identifies two specific villains causing this memory loss:

  1. Spurious Features (The "Distractors"): These are fake clues. Imagine the detective sees a "crumb" on a cookie and thinks, "Ah, crumbs mean it's a cookie!" But then they see a crumb on a bagel and get confused. The crumb isn't a real clue for the category; it's just a random coincidence. The detective gets distracted by these fake links between different objects.
  2. Redundant Features (The "Noise"): This is too much information. Imagine the detective is looking at a cookie and sees the color, the texture, the shape, the shadow, the background, and the lighting. Most of this is just "noise." The brain gets overwhelmed by all the extra data and can't focus on the real defect.

When you combine two types of glasses (RGB + Depth), the noise and distractions get even louder, making the detective forget even faster.

The Solution: The "IB-IUMAD" Framework

The authors built a new system called IB-IUMAD to fix this. The "IUMAD" part stands for Incremental Unified Multimodal Anomaly Detection, and the "IB" prefix refers to the information bottleneck. They used two clever tools:

1. The Mamba Decoder: The "Organizer"

Think of the Mamba Decoder as a super-organized librarian.

  • When the detective looks at a bagel and a cookie, the two can get mixed up in their brain because they look similar in some ways.
  • The Librarian steps in and says, "Stop! Look at the label on the box. This is a bagel. That is a cookie. Don't mix their features up."
  • It untangles the messy connections between different objects, ensuring the detective learns the true features of a cookie without accidentally "stealing" features from the bagel.
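For readers curious what sits under the hood: Mamba belongs to the family of state space sequence models, which read a sequence one step at a time while carrying a small running memory. The toy below shows only that basic recurrence, with fixed scalar coefficients `a`, `b`, and `c` that are hypothetical choices for illustration. Real Mamba layers work on vectors and make their coefficients depend on the input, and the paper's actual decoder is not reproduced here.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Toy scalar state space recurrence: h = a*h + b*x, y = c*h."""
    h = 0.0  # running memory, starts empty
    ys = []
    for x in xs:
        h = a * h + b * x  # old memory decays, new input is mixed in
        ys.append(c * h)   # output is read from the memory
    return ys


# A single input at the first step fades from memory over the following steps.
print(ssm_scan([1.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25]
```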

2. The Information Bottleneck: The "Filter"

Think of this as a high-tech coffee filter.

  • The detective receives a huge cup of "feature soup" (all the data from the RGB and Depth glasses).
  • The Filter holds back the grounds (the noise, the background, the useless shadows).
  • It only lets the pure coffee (the essential, useful information) pass through to the detective's brain.
  • By forcing the brain to focus only on the most important clues, it stops the brain from getting overwhelmed and forgetting old lessons.
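In code, a filter like this is often written as an extra penalty on the loss: the model pays a price for every bit of information its internal code keeps, so it only keeps what helps the task. The sketch below uses a widely used variational form of the information bottleneck, where the price is the KL divergence between a Gaussian code and a standard normal prior. The function names and the `beta` weight are assumptions for illustration; the paper's own module may be built differently.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ), summed over feature dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def bottleneck_loss(task_loss, mu, log_var, beta=0.01):
    """Task loss plus a beta-weighted compression penalty (beta is a hypothetical setting)."""
    return task_loss + beta * kl_to_standard_normal(mu, log_var)


# A code that matches the prior carries no extra information, so it costs nothing.
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
# A code that strays from the prior is penalised.
print(kl_to_standard_normal([1.0, -2.0], [0.0, 0.0]))  # 2.5
```

A larger `beta` squeezes harder: more noise is thrown away, at the risk of discarding useful clues as well.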

The Result

The researchers tested this new system on real factory data (MVTec 3D-AD and Eyecandies).

  • Before: The old systems forgot 12.5% of what they learned when moving to new products. They were slow and needed a lot of memory.
  • After (IB-IUMAD): The new system forgot only 6.3%, cutting the forgetting roughly in half. It was also 44 times more memory-efficient and 41 times faster than the old "One Expert Per Product" method.
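The two forgetting figures quoted above can be checked with a line of arithmetic. Only the two percentages come from the summary; the variable names are just for illustration.

```python
before, after = 12.5, 6.3  # forgetting percentages quoted above

relative_reduction = (before - after) / before
print(f"{relative_reduction:.1%}")  # 49.6%
```

So the drop from 12.5% to 6.3% is a 49.6% relative reduction, which is where "about half" comes from.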

The Takeaway

This paper is like a guide on how to teach a robot to be a master of many trades without losing its mind. By acting as a strict organizer (Mamba) and a ruthless filter (Information Bottleneck), the system learns new skills (like detecting donuts) without erasing its old skills (like detecting bagels), all while using far less memory and running far faster than training one expert per product.