Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification

This paper proposes DyMo, a novel inference-time dynamic modality selection framework that resolves the discard-imputation dilemma in incomplete multimodal classification by adaptively integrating reliable recovered modalities through a task-loss-guided selection algorithm, thereby significantly outperforming state-of-the-art methods across diverse datasets.

Siyi Du, Xinzhe Luo, Declan P. O'Regan, Chen Qin

Published 2026-02-24

Imagine you are a detective trying to solve a mystery. You have a team of specialists: a Photographer, a Voice Analyst, and a Document Reader. Usually, you get all three reports, and solving the case is easy.

But in the real world, things go wrong. Sometimes the Photographer's camera breaks (missing image), or the Voice Analyst is on vacation (missing audio). You are left with incomplete information.

For a long time, AI researchers had two bad ways to handle this:

  1. The "Ignore" Strategy: "Oh, the Photographer is missing? No problem! I'll just guess based on what the Voice Analyst says."
    • The Problem: You might miss a crucial clue that only the photo could have provided. You are throwing away valuable evidence.
  2. The "Fake It" Strategy: "The Photographer is missing? No worries! I'll use a computer program to guess what the photo looks like and pretend it's real."
    • The Problem: The computer's guess might be blurry, wrong, or completely unrelated to the crime. If you use this fake photo, you might get confused and solve the wrong case. You are introducing "noise" or lies into your investigation.

This is called the "discard-imputation dilemma": Do you throw away the missing info (and lose clues), or do you try to fake it (and risk getting lied to)?

Enter DyMo: The Smart Detective

The paper introduces a new AI framework called DyMo (Dynamic Modality Selection). Instead of committing to either ignoring or faking, DyMo acts like a smart, adaptive detective who decides, case by case at inference time, when to trust a fake clue and when to ignore it.

Here is how DyMo works, using simple analogies:

1. The "Taste Test" (The Reward Function)

Imagine you have a bowl of soup (your current data). You are about to add a new ingredient (a "recovered" or faked photo).

  • Old AI: Just dumps the ingredient in. If it's rotten, the soup tastes bad.
  • DyMo: Before adding the ingredient, it does a "taste test." It asks: "If I add this fake photo, does my soup (my prediction) get better or worse?"
    • If the fake photo makes the soup taste better (the AI becomes more confident in its answer), DyMo adds it.
    • If the fake photo makes the soup taste worse (the AI gets confused), DyMo throws it away immediately.
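
The taste test can be pictured as comparing the model's prediction before and after the recovered modality is added. The snippet below is only a minimal sketch under an assumption: it measures "confidence" as a drop in predictive entropy, which is one common proxy and not necessarily the reward the paper defines. The function names and the threshold parameter are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of class scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; lower means a more confident prediction.
    return -sum(p * math.log(p) for p in probs if p > 0)

def confidence_reward(logits_before, logits_after):
    # ASSUMPTION: reward = entropy reduction caused by adding the
    # recovered modality. Positive means the prediction got sharper.
    return entropy(softmax(logits_before)) - entropy(softmax(logits_after))

def keep_recovered(logits_before, logits_after, threshold=0.0):
    # Keep the recovered modality only if the reward beats the threshold.
    return confidence_reward(logits_before, logits_after) > threshold

# Toy usage: the prediction sharpens after adding the fake photo, so keep it.
print(keep_recovered([1.0, 0.8, 0.9], [3.0, 0.2, 0.1]))
```

A confidence-only reward like this can be fooled by a recovered input that is confidently wrong, which is the gap the next idea addresses.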

2. The "Trust Score" (Intra-Class Similarity)

Sometimes, a fake photo looks okay, but it's actually misleading.

  • The Analogy: Imagine you are looking for a specific type of dog, say a Golden Retriever. The computer generates a fake picture that looks like a Golden Retriever but is actually a wolf in a costume.
  • DyMo's Trick: It checks the "vibe" of the fake dog against a mental list of real Golden Retrievers it learned during training. If the fake dog feels "off" compared to the real ones, DyMo lowers its trust score and ignores it. This stops the AI from being tricked by "semantic misalignment" (things that look right but are wrong).
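
One way to picture the trust score is to compare the embedding of the recovered input with an average ("prototype") embedding of real training examples from the predicted class. This is a hedged illustration only: the prototype construction, the use of cosine similarity, and all names below are assumptions made for the example, not details confirmed by the paper.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def class_prototypes(embeddings, labels):
    # ASSUMPTION: each class is summarized by the mean of its
    # training embeddings.
    sums, counts = {}, {}
    for emb, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + e for s, e in zip(sums[label], emb)]
        counts[label] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def trust_score(recovered_embedding, predicted_class, prototypes):
    # High when the recovered input resembles real members of the class.
    return cosine(recovered_embedding, prototypes[predicted_class])

# Toy 2-D embeddings standing in for real feature vectors.
protos = class_prototypes(
    [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]],
    ["retriever", "retriever", "wolf"],
)
good = trust_score([1.0, 0.1], "retriever", protos)
bad = trust_score([0.1, 1.0], "retriever", protos)
print(good > bad)
```

In this toy setup the wolf in a costume lands far from the retriever prototype, so its score is low even if a classifier would label it confidently.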

3. The "Iterative Selection" (Building the Puzzle Piece by Piece)

DyMo doesn't just grab all the fake clues at once. It builds the solution step by step.

  • Step 1: Look at what you have.
  • Step 2: Ask, "Which single missing piece, if I faked it, would help me the most?"
  • Step 3: Add that piece.
  • Step 4: Ask again, "Now that I have that, which next piece helps?"
  • Step 5: Stop adding pieces as soon as a new one starts to hurt your confidence.

This ensures the AI only uses the "high-quality" fake clues and discards the junk.
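
The five steps above amount to a greedy selection loop. Here is a small sketch of that loop, assuming a caller-supplied `score` function that rates any set of modalities (higher is better). The function name `greedy_select` and the toy scoring table are hypothetical and stand in for whatever reward the real system computes.

```python
def greedy_select(available, candidates, score):
    # Start from the modalities that were actually observed.
    selected = list(available)
    remaining = list(candidates)
    current = score(selected)
    while remaining:
        # Find the single recovered modality that helps the most.
        best = max(remaining, key=lambda m: score(selected + [m]))
        best_score = score(selected + [best])
        # Stop as soon as the best remaining piece no longer helps.
        if best_score <= current:
            break
        selected.append(best)
        remaining.remove(best)
        current = best_score
    return selected

# Toy score: each modality contributes a fixed amount (illustrative values).
contribution = {"document": 0.5, "photo": 0.3, "audio": -0.2}

def toy_score(modalities):
    return sum(contribution[m] for m in modalities)

# The recovered photo helps and is kept; the recovered audio hurts and is dropped.
print(greedy_select(["document"], ["photo", "audio"], toy_score))
```

Because the loop stops at the first step with no improvement, a harmful recovered modality is never added, at the cost of re-scoring every remaining candidate at each step.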

Why is this a big deal?

In the real world, data is messy. In medical diagnosis, a patient might be missing an MRI scan.

  • Old AI: Might ignore the MRI and guess based only on blood work (missing a tumor), OR it might generate a blurry, fake MRI that looks like a healthy brain, leading to a missed diagnosis.
  • DyMo: It tries to generate the MRI. If the generated MRI looks like a healthy brain but the blood work suggests a tumor, DyMo recognizes that the fake MRI conflicts with the evidence it actually has and leaves it out, sticking to the reliable blood work. But if the generated MRI clearly shows a tumor that matches the blood work, it uses that too!

The Bottom Line

DyMo is like a smart filter for AI. It acknowledges that we can't always get perfect data. Instead of blindly trusting computer-generated guesses or blindly ignoring missing data, it dynamically tests every piece of recovered information. It only keeps the clues that make the AI smarter and throws away the ones that make it dumber.

The result? An AI that is much more reliable in the messy, imperfect real world, whether it's diagnosing diseases, recognizing faces, or analyzing marketing trends.
