BriMA: Bridged Modality Adaptation for Multi-Modal Continual Action Quality Assessment

Imagine you are a judge at a gymnastics competition. Your job is to watch a routine and give it a score based on how well the athlete performs. To do this perfectly, you usually rely on three things:

Video: Seeing the athlete move.
Audio: Hearing the music and the rhythm of their movements.
Text/Notes: Reading the official rules or commentary.

In a perfect world, you always have all three. But in the real world, things go wrong. Sometimes the camera glitches (no video), the microphone fails (no audio), or the notes get lost (no text). This is called Modality Imbalance.

Even worse, these problems aren't static. Today the camera works but the audio is bad; tomorrow the audio is fine but the camera is broken. This is Non-Stationary Imbalance.

The Problem: Why Old Judges Fail

Existing computer programs (AI) trained to judge these routines are like students who only studied for a test using a perfect textbook.

The "Forgetting" Problem: When you teach an AI a new routine (Task 2), it often forgets how to judge the old routine (Task 1). This is called "Catastrophic Forgetting."
The "Missing Data" Problem: If you feed the AI a video with no audio, it panics. It tries to guess the missing audio, but it usually guesses wrong, leading to a terrible score.
The "Drift" Problem: As the AI learns new things, its internal "rules" for scoring shift. A routine that used to get a 15 might suddenly get a 12 just because the AI's perspective changed, even if the performance was the same.

The Solution: BriMA (The Smart, Adaptable Judge)

The paper introduces BriMA (Bridged Modality Adaptation). Think of BriMA not as a rigid robot, but as a seasoned, adaptable coach who has a special toolkit to handle chaos.

BriMA uses two main tricks to stay calm and accurate:

1. The "Memory Bridge" (Filling in the Gaps)

Imagine you are trying to guess what a song sounds like, but the audio track is missing.

Old AI: Tries to invent a completely new song from scratch. It often sounds nothing like the original.
BriMA: It looks at its Memory Bank. It says, "I remember similar routines from earlier tasks where I had both the video and the audio. Based on those, I know roughly what the music should sound like for this video."
The Magic: Instead of inventing the whole song, BriMA only calculates the tiny difference (the "residual") needed to fix the missing piece. It's like a carpenter who doesn't rebuild the whole table when a leg is broken; they just carve a perfect new leg to fit the existing table. This keeps the score accurate and consistent.

2. The "Smart Replay" (Learning Without Forgetting)

When a student studies for a new exam, they shouldn't just throw away their old notes.

Old AI: Replays old data randomly, like flipping through a textbook at random pages. It might waste time on easy examples and miss the hard ones.
BriMA: It acts like a strict tutor. It looks at its memory bank and asks:
- "Which old examples did I get wrong because the data was messy?"
- "Which examples am I likely to forget?"
- "Which examples are most important to keep the scoring rules stable?"
It then prioritizes these specific examples for review. It's like a teacher saying, "We aren't reviewing the whole chapter; we are only reviewing the three problems you keep getting wrong." This ensures the AI doesn't forget the past while learning the future.

Why This Matters

The researchers tested BriMA on three datasets: RG (rhythmic gymnastics), Fis-V (figure skating), and FS1000 (large-scale figure skating). They simulated sensor failures by randomly dropping modalities at rates of 10%, 25%, and 50%.

The Results:

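Higher Accuracy: BriMA's rankings agreed with human judges more closely than those of the compared methods (roughly 6–8% higher rank correlation than the best baselines), and it stayed stable even when half the data was missing.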
Less Forgetting: It remembered how to judge old routines while learning new ones.
Real-World Ready: This isn't just a lab experiment. It means we can build AI systems for sports, rehabilitation, and skill training that won't crash just because a camera flickers or a sensor disconnects.

The Bottom Line

BriMA is the difference between a robot that breaks when the lights go out, and a human expert who can still judge the performance by feeling the rhythm and remembering past experiences. It bridges the gap between "perfect data" and "messy reality," making AI scoring reliable for the real world.

1. Problem Definition

The paper addresses a critical gap in Action Quality Assessment (AQA), which aims to score the performance quality of human actions (e.g., in sports or rehabilitation). While multi-modal AQA (using video, audio, text, etc.) has shown promise, real-world deployment faces two simultaneous challenges that existing methods fail to handle:

Non-Stationary Modality Imbalance: In practical scenarios, sensor failures, frame drops, or annotation gaps cause modalities to become missing or intermittently available. Crucially, this missingness is non-stationary, meaning the pattern of missing data changes over time as new tasks are learned.
Continual Learning Constraints: Existing continual learning (CL) methods for AQA assume complete and stable modalities. When modalities are missing, standard CL methods suffer from catastrophic forgetting and representation drift.
The Core Conflict: Standard imputation techniques (zero-filling, retrieval, or generative synthesis) often distort the "scoring manifold." In AQA, subtle temporal cues determine the score; therefore, naive reconstruction introduces bias that breaks the ranking consistency essential for accurate scoring.
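
To make the non-stationary setting concrete, the following is a minimal simulation sketch, not the authors' protocol. The modality names, the per-task drop probabilities in `schedule`, and the rule that every sample keeps at least one modality are all assumptions made for illustration.

```python
import random

MODALITIES = ("video", "audio", "text")

def sample_masks(num_samples, missing_prob, seed=0):
    """Return one availability mask per sample: 1 = observed, 0 = missing.

    missing_prob maps each modality to its drop probability for one task.
    Assumption: at least one modality is always kept so a sample stays scorable.
    """
    rng = random.Random(seed)
    masks = []
    for _ in range(num_samples):
        mask = {m: 0 if rng.random() < missing_prob[m] else 1 for m in MODALITIES}
        if sum(mask.values()) == 0:
            mask[rng.choice(MODALITIES)] = 1
        masks.append(mask)
    return masks

def missing_rate(masks, modality):
    """Fraction of samples in which the given modality is missing."""
    return 1.0 - sum(m[modality] for m in masks) / len(masks)

# Hypothetical schedule: the unreliable modality changes from task to task,
# which is what makes the imbalance non-stationary.
schedule = [
    {"video": 0.10, "audio": 0.50, "text": 0.10},  # task 1: audio is unreliable
    {"video": 0.50, "audio": 0.10, "text": 0.25},  # task 2: video is unreliable
]
task_masks = [sample_masks(1000, p, seed=t) for t, p in enumerate(schedule)]

for t, masks in enumerate(task_masks, start=1):
    rates = {m: round(missing_rate(masks, m), 2) for m in MODALITIES}
    print(f"task {t}: {rates}")
```

A model trained on task 1 learns to lean on video, and then meets task 2 where video is the modality most often absent.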

2. Methodology: BriMA

The authors propose BriMA (Bridged Modality Adaptation), a framework designed to stabilize multi-modal continual AQA under evolving modality imbalance. The core idea is to construct a stable bridging space that aligns missing modalities with shared task structures and memory, rather than attempting full feature synthesis.

BriMA consists of two primary modules:

A. Memory-Guided Bridging Imputation (MBI)

Instead of generating missing features from scratch, MBI reconstructs them by predicting a minimal residual correction based on structurally aligned exemplars from past tasks.

Candidate Selection: For a missing modality $m$, the system retrieves $K$ exemplar features from a memory buffer ($\mathcal{B}_{t-1}$) using cosine similarity with the current observed representation.
Task Indicator: A binary mask marks which modalities are missing. This mask selects a task-specific embedding that conditions the reconstruction, keeping it consistent with the task domain.
Imputation Bridge: The system computes an initial estimate by weighting the retrieved exemplars. A bridging network then predicts a residual correction ($\Delta z$) conditioned on the observed features and the task indicator.
- Formula: $\tilde{z} = \bar{z} + \Delta z$, where $\bar{z}$ is the weighted exemplar prior.
- Benefit: This avoids the noise of generative synthesis and ensures the reconstructed features remain faithful to the scoring semantics.
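
The three MBI steps can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the softmax weighting over similarities, the `temperature` parameter, and the single linear layer standing in for the bridging network are all hypothetical choices, as are the function names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def exemplar_prior(observed, memory, k=2, temperature=0.1):
    """Weighted exemplar prior z_bar for the missing modality.

    memory: list of (observed_feature, missing_modality_feature) pairs
    taken from complete-modality samples of earlier tasks.
    """
    scored = [(cosine(observed, key), value) for key, value in memory]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    # Assumed weighting: softmax over similarities (max-shifted for stability).
    best = top[0][0]
    exps = [math.exp((s - best) / temperature) for s, _ in top]
    weights = [e / sum(exps) for e in exps]
    dim = len(top[0][1])
    return [sum(w * value[j] for w, (_, value) in zip(weights, top)) for j in range(dim)]

def bridge_residual(observed, task_embedding, weight):
    """Toy stand-in for the bridging network: one linear layer applied to
    the concatenation of observed features and the task embedding."""
    x = list(observed) + list(task_embedding)
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

def impute(observed, memory, task_embedding, weight, k=2):
    """z_tilde = z_bar + delta_z."""
    z_bar = exemplar_prior(observed, memory, k)
    delta_z = bridge_residual(observed, task_embedding, weight)
    return [p + d for p, d in zip(z_bar, delta_z)]

# Tiny example: 2-D observed (video) features, 2-D missing (audio) features.
memory = [([1.0, 0.0], [1.0, 2.0]), ([0.9, 0.1], [1.2, 1.8]), ([0.0, 1.0], [-5.0, 5.0])]
observed, task_embedding = [1.0, 0.0], [1.0]
weight = [[0.1, 0.0, 0.0], [0.0, 0.0, 0.5]]  # hypothetical learned parameters
z_bar = exemplar_prior(observed, memory)
z_tilde = impute(observed, memory, task_embedding, weight)
```

Because the network only outputs $\Delta z$, a zero-initialized bridge returns the retrieval prior unchanged, so training starts from the retrieved estimate rather than from noise.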

B. Modality-Aware Replay Optimization (MRO)

To combat distribution shifts and forgetting, MRO dynamically prioritizes which past samples to replay during training.

Sample Selection: The memory buffer is curated to contain samples with complete modalities and balanced score coverage (quantile-based selection).
Prioritization: Samples are ranked by a priority score $q_i$ that combines:
- Modality Distortion ($d_i$): The error in reconstructing the missing modality.
- Score Drift ($\Delta y_i$): The change in the predicted score for a sample before and after learning a new task.
- Formula: $q_i = \alpha d_i + (1-\alpha) \Delta y_i$.
Consistency Loss: The model is regularized to maintain prediction consistency on these high-priority replay samples, preventing the model from forgetting how to score actions when modalities shift.
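
The prioritization rule above is simple enough to sketch directly. The code below is a hedged illustration: treating score drift as an absolute difference, selecting the top-`budget` samples, and using a mean squared difference as the consistency loss are assumptions, and the function names are hypothetical rather than taken from the released code.

```python
def score_drift(pred_before, pred_after):
    """|Delta y_i|: how far each stored sample's predicted score moved
    after training on the new task (assumed to be an absolute difference)."""
    return [abs(after - before) for before, after in zip(pred_before, pred_after)]

def priority_scores(distortion, drift, alpha=0.5):
    """q_i = alpha * d_i + (1 - alpha) * Delta y_i."""
    return [alpha * d + (1.0 - alpha) * dy for d, dy in zip(distortion, drift)]

def select_replay(priorities, budget):
    """Indices of the `budget` highest-priority samples, highest first."""
    order = sorted(range(len(priorities)), key=lambda i: priorities[i], reverse=True)
    return order[:budget]

def consistency_loss(pred_now, pred_stored, indices):
    """Assumed form of the consistency term: mean squared gap between current
    and stored predictions on the selected replay samples."""
    gaps = [(pred_now[i] - pred_stored[i]) ** 2 for i in indices]
    return sum(gaps) / len(gaps)

# Four buffered samples with made-up distortions and predictions.
distortion = [0.10, 0.80, 0.30, 0.05]
pred_before = [12.0, 15.0, 9.0, 14.0]
pred_after = [12.1, 13.0, 9.2, 14.0]

drift = score_drift(pred_before, pred_after)
q = priority_scores(distortion, drift, alpha=0.5)
replay_ids = select_replay(q, budget=2)
loss = consistency_loss(pred_after, pred_before, replay_ids)
```

In this toy buffer, sample 1 is replayed first because it is both poorly reconstructed and has drifted by two points. In practice $d_i$ and $\Delta y_i$ live on different scales, so some normalization before mixing them with $\alpha$ is likely needed; the source does not specify one.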

3. Key Contributions

Problem Identification: The paper formally defines Non-Stationary Modality Imbalance in the context of continual AQA, demonstrating that existing methods fail significantly when modality availability evolves over time.
Novel Framework (BriMA): Introduces a tailored solution combining Memory-Guided Bridging (for score-faithful imputation) and Modality-Aware Replay (for drift-resistant adaptation).
Comprehensive Evaluation: Extensive experiments on three diverse datasets (RG, Fis-V, FS1000) covering rhythmic gymnastics and figure skating.
State-of-the-Art Performance: BriMA establishes a new baseline, outperforming existing continual learning and multi-modal methods across various missing rate scenarios (10%, 25%, 50%).

4. Experimental Results

The authors evaluated BriMA on three datasets: RG (Rhythmic Gymnastics), Fis-V (Figure Skating Video), and FS1000 (Large-scale Figure Skating).

Performance Gains:
- Correlation: Achieved 6–8% higher Spearman's Rank Correlation Coefficient (SRCC) compared to the best baselines.
- Error Reduction: Reduced Mean Squared Error (MSE) by 12–15% and Relative L2 error (RL2) by 13–15% on average.
- Robustness: As the missing modality rate increased to 50%, BriMA maintained stable performance, while baselines (like ST-MLAVL, EWC, DER++) suffered catastrophic drops in accuracy.
Ablation Studies:
- Removing the Bridging Imputation (MBI) caused a ~10% drop in SRCC, indicating that residual correction contributes beyond simple retrieval.
- Removing Modality-Aware Replay (MRO) led to higher error rates and instability, supporting the importance of prioritizing drift-sensitive samples.
Efficiency: Despite the added complexity, BriMA only increased parameters by ~0.1M and training time by ~1 hour, demonstrating a favorable efficiency-performance trade-off.
Generalization: The method was also tested on the MOSI sentiment analysis dataset, showing similar improvements, suggesting the approach generalizes to other multi-modal regression tasks.
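
For readers who want to reproduce the three metrics reported above, here is a dependency-free sketch. SRCC and MSE follow their standard definitions. The RL2 formula, the mean of squared errors normalized by the score range and reported times 100, is the form commonly used in AQA work and is assumed here rather than confirmed by the source.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)

def srcc(pred, target):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(pred), average_ranks(target))

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def rl2(pred, target, y_min, y_max):
    """Relative L2 error (assumed AQA convention), scaled by 100."""
    span = y_max - y_min
    return 100.0 * sum((abs(p - t) / span) ** 2 for p, t in zip(pred, target)) / len(pred)

# Made-up scores for illustration only.
target = [10.0, 12.0, 14.0, 16.0]
pred = [11.0, 12.0, 15.0, 15.5]
rho = srcc(pred, target)
err = mse(pred, target)
rel = rl2(pred, target, y_min=10.0, y_max=16.0)
```

Note that SRCC only measures ranking agreement: in this example the predictions are off by up to a point yet preserve the order exactly, so SRCC is perfect while MSE and RL2 are not. This is why the error metrics are reported alongside the correlation.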

5. Significance

This work is significant because it moves AQA research from idealized, static laboratory settings to real-world deployment constraints.

Practicality: It addresses the "sensor failure" problem inherent in real-world sports analysis and rehabilitation, where data is rarely perfect or complete.
Theoretical Insight: It highlights that standard imputation techniques are insufficient for score-sensitive regression tasks. The proposed "bridging" strategy preserves the geometric structure of the scoring manifold, which is critical for ranking consistency.
Future Direction: BriMA provides a robust paradigm for building reliable multi-modal systems that can adapt to evolving data distributions without forgetting prior knowledge, a crucial step toward deploying AI in dynamic, real-world environments.

Code Availability: The authors have made their code publicly available at https://github.com/ZhouKanglei/BriMA.