SATTC: Structure-Aware Label-Free Test-Time Calibration for Cross-Subject EEG-to-Image Retrieval

Imagine you are trying to guess what picture a person is looking at, just by reading their brainwaves (EEG). This is the goal of EEG-to-Image Retrieval.

However, there's a huge problem: Everyone's brain is different.
If you train a computer to read your brainwaves, it might work great for you. But if you try to use that same computer on your friend, it fails miserably. Their brain signals are slightly shifted, like two radio stations broadcasting on slightly different frequencies.

Furthermore, the computer has a bad habit called "Hubness." Imagine a popular celebrity at a party. Even if you ask the computer, "Who is this person?" about a random stranger, the computer keeps pointing to the celebrity because they are so "loud" in the data. The computer ignores the quiet, rare, but actually correct answers.

This paper introduces SATTC, a clever "tuning knob" that fixes these problems without needing any new training data or labels. It works like a smart filter applied after the computer has already produced its list of guesses.

Here is how SATTC works, broken down into simple analogies:

1. The Problem: The "Noisy Room" and the "Loud Celebrity"

Subject Shift (The Noisy Room): Every person's brain is a different room. One room is echoey, another is muffled. If you try to understand a conversation in a new room without adjusting your ears, you get confused.
Hubness (The Loud Celebrity): In the computer's list of guesses, a few "popular" images (hubs) keep appearing at the top for almost everyone, drowning out the correct, specific images. It's like a search engine that always suggests "Google" for every query, even if you asked for "How to bake a cake."

2. The Solution: SATTC (The Smart Filter)

The authors built a system called SATTC, short for Structure-Aware Test-Time Calibration. Think of it as a "Post-Processing Chef" who takes the raw ingredients (the brain scan results) and seasons them perfectly right before serving, without needing to go back to the farm (re-training the model).

SATTC uses two "Experts" to fix the list of guesses:

Expert A: The "Local Density Detective" (Geometric Expert)

What it does: It looks at how crowded the neighborhood is around each guess.
The Analogy: Imagine you are looking for a friend in a crowd.
- If you are in a dense crowd (a popular image category), the detective says, "Okay, there are too many people here; let's be stricter about who we pick."
- If you are in a sparse crowd (a rare image category), the detective says, "There are very few people here; let's be more generous and include the few we see."
The Fix: It stops the "Loud Celebrity" from dominating by realizing that just because an image is popular doesn't mean it's the right answer for this specific person. It adjusts the volume based on how crowded the area is.

Expert B: The "Social Network Analyst" (Structural Expert)

What it does: It looks at the relationships between the guesses.
The Analogy: Imagine you are trying to find a lost item.
- Mutual Neighbors: If the computer thinks "Image A" is the best guess for your brain, AND your brain is the best guess for "Image A," that's a strong handshake. The analyst says, "Keep this one!"
- Popularity Check: If an image appears as the top guess for everyone in the room, the analyst gets suspicious. "Why is this image so popular? It's probably a fake lead." It pushes these "hub" images down the list.
The Fix: It boosts the confidence of matches that make sense in both directions and suppresses the "fake popular" ones.

3. The Magic Blend: "Product of Experts"

Finally, SATTC takes the advice from both experts and blends them together.

If the Detective says, "This area is sparse, trust the rare match," and the Analyst says, "This match is mutual and strong," SATTC boosts that guess to the top of the list.
If the Detective says, "This area is too crowded," and the Analyst says, "This image is a fake hub," SATTC pushes that guess down.

Why is this a big deal?

No Labels Needed: Usually, to fix a model for a new person, you need to show them 100 pictures and say, "This is a cat, this is a dog." SATTC works blindly. It looks at the brainwaves and the list of guesses and figures out the pattern on its own.
Works with Any Model: It doesn't matter which AI model turned the brainwaves into numbers. SATTC is a "plug-and-play" layer that works on top of almost any existing system.
Better Small Lists: In real life, you don't want a list of 100 guesses; you want the top 1 or 5. SATTC makes those small lists much more reliable, ensuring the correct image is actually in the top 5.

The Bottom Line

The authors took a system that was struggling because everyone's brains are different and the computer kept getting distracted by "popular" wrong answers. They built a smart, label-free filter that listens to the "crowd density" and the "social connections" between guesses to clean up the results.

The result? A system that can look at a stranger's brainwaves and guess what they are seeing with noticeably higher accuracy, bringing the technology closer to real-world use without needing a massive amount of new training data.

1. Problem Statement

The paper addresses the challenge of cross-subject EEG-to-image retrieval, where the goal is to decode visual perception from non-invasive EEG signals to retrieve corresponding images from a large vocabulary without subject-specific labels.

The core difficulties in this setting are:

Subject Shift: EEG feature distributions vary significantly across different individuals (mean, variance, and covariance shifts), causing models trained on one set of subjects to fail on unseen subjects.
Hubness: In high-dimensional embedding spaces, certain "hub" classes appear in the top-$k$ lists of many unrelated queries. This distorts similarity geometry, destabilizes rankings, and makes small-$k$ shortlists (crucial for decoding) unreliable.
Lack of Test-Time Calibration: Existing methods focus on encoder training or require labeled data for adaptation. There is a lack of label-free, test-time calibration mechanisms that can standardize inference and correct retrieval geometry using only the similarity matrix of frozen encoders.
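
Hubness can be made concrete by counting how often each class lands in a top-$k$ list. The sketch below is illustrative only and is not taken from the paper: the function names `k_occurrence` and `skewness` are hypothetical, and using the skewness of the count distribution as a hubness score is a common convention assumed here, not a metric the source confirms.

```python
import numpy as np

def k_occurrence(sim: np.ndarray, k: int = 5) -> np.ndarray:
    """Count how often each class (column) appears in a query's (row's) top-k list."""
    n_classes = sim.shape[1]
    # Indices of the k most similar classes per query; order inside the top-k is not needed.
    topk = np.argpartition(-sim, kth=k - 1, axis=1)[:, :k]
    return np.bincount(topk.ravel(), minlength=n_classes)

def skewness(counts: np.ndarray) -> float:
    """Skewness of the k-occurrence counts (assumed hubness score; larger means more hub-dominated)."""
    c = counts.astype(float)
    std = c.std()
    if std == 0:
        return 0.0
    return float(((c - c.mean()) ** 3).mean() / std ** 3)
```

A class whose count is far above the average of `k` times the number of queries divided by the number of classes is behaving like a hub.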

2. Methodology: SATTC Framework

The authors propose SATTC (Structure-Aware Test-Time Calibration), a label-free calibration head that operates directly on the similarity matrix generated by frozen EEG and image encoders. It does not require retraining or access to test labels.

The framework consists of four main components:

A. Geometric Normalization (Subject-Adaptive Whitening - SAW)

To mitigate subject shift, the authors introduce Subject-Adaptive Whitening (SAW):

Process: For each test subject, they estimate the mean ($\mu_s$) and covariance ($\Sigma_s$) from unlabeled EEG embeddings.
Transformation: They subtract $\mu_s$ to center the embeddings and apply a whitening transform $W_s = (\Sigma_s + \lambda I)^{-1/2}$ to normalize them, followed by $\ell_2$-normalization.
Result: This maps different subjects onto a shared hypersphere, preserving relative directions while removing subject-specific statistical shifts. A similar global whitening is optionally applied to image candidates.
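
The whitening step above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the function names, the default `lam` value, and the use of an eigendecomposition to form the inverse square root are all choices made here for illustration.

```python
import numpy as np

def whiten(z: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Center and whiten one subject's unlabeled embeddings z of shape (n_trials, d)."""
    mu = z.mean(axis=0, keepdims=True)             # mu_s
    zc = z - mu
    cov = zc.T @ zc / max(len(z) - 1, 1)           # Sigma_s
    # Symmetric inverse square root of (Sigma_s + lam * I) via eigendecomposition.
    evals, evecs = np.linalg.eigh(cov + lam * np.eye(z.shape[1]))
    w = (evecs * evals ** -0.5) @ evecs.T          # W_s
    return zc @ w

def subject_adaptive_whitening(z: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Whiten, then project every embedding onto the unit hypersphere."""
    zw = whiten(z, lam)
    norms = np.linalg.norm(zw, axis=1, keepdims=True)
    return zw / np.clip(norms, 1e-12, None)
```

The regularizer `lam` keeps the inverse square root well defined when a subject has few trials relative to the embedding dimension.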

B. Geometric Expert: Adaptive CSLS

To address hubness, the authors propose an Adaptive Cross-domain Similarity Local Scaling (CSLS) variant:

Standard CSLS Limitation: Traditional CSLS uses a fixed neighborhood size $k$, assuming uniform density. This fails in cross-subject EEG where some regions are sparse and others are dense hubs.
Adaptive Mechanism: SATTC estimates local densities for both queries (rows) and classes (columns) based on the similarity matrix.
- Denser queries/classes are assigned larger neighborhood sizes ($k$).
- Sparser queries/classes are assigned smaller neighborhood sizes.
Scoring: The similarity score is rescaled by subtracting these adaptive local averages, effectively down-weighting globally popular hubs and up-weighting rare but relevant matches without tuning a global $k$.
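
One way to realize the adaptive neighborhoods described above is sketched below. The summary does not give the exact density estimator or the density-to-$k$ mapping, so both are assumptions here: density is taken as the mean of the top-`k0` similarities, and $k$ grows linearly with the density rank between the hypothetical bounds `k_min` and `k_max`. The final score uses the standard CSLS form (twice the similarity minus the two local averages).

```python
import numpy as np

def mean_topk_rows(sim, ks):
    """Mean of the ks[i] largest entries in row i of sim."""
    desc = -np.sort(-sim, axis=1)                  # each row in descending order
    csum = np.cumsum(desc, axis=1)
    return csum[np.arange(sim.shape[0]), ks - 1] / ks

def adaptive_k(density, k_min, k_max):
    """Assumed mapping: rank-normalize density to [0, 1]; denser items get larger k."""
    ranks = density.argsort().argsort() / max(len(density) - 1, 1)
    return np.rint(k_min + ranks * (k_max - k_min)).astype(int)

def local_means(sim, k0, k_min, k_max):
    width = sim.shape[1]
    pilot = np.full(sim.shape[0], min(k0, width))
    density = mean_topk_rows(sim, pilot)           # pilot density estimate with fixed k0
    ks = np.clip(adaptive_k(density, k_min, k_max), 1, width)
    return mean_topk_rows(sim, ks), ks

def adaptive_csls(sim, k0=10, k_min=5, k_max=20):
    r_q, k_q = local_means(sim, k0, k_min, k_max)      # per-query (row) neighborhoods
    r_c, k_c = local_means(sim.T, k0, k_min, k_max)    # per-class (column) neighborhoods
    return 2.0 * sim - r_q[:, None] - r_c[None, :], k_q, k_c
```

Setting `k_min` equal to `k_max` recovers ordinary fixed-$k$ CSLS, which makes the adaptive variant easy to compare against the baseline it extends.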

C. Structural Expert: Pre-CSLS Priors

The authors introduce a Structural Expert that leverages structural cues present in the pre-CSLS similarity matrix:

Mutual Nearest Neighbors (MNN): Identifies strict pairs where a query and class are each other's top-1 match.
Bidirectional Top-$L$: Identifies consistent matches where the query and the class are both within the top-$L$ ranks of each other.
Class Popularity: Detects "hub-like" candidates that appear frequently in top-$k$ lists but lack local support for specific queries.
Scoring: This expert adds a positive bias to MNN/bidirectional pairs and a negative penalty to hub-like candidates based on their popularity.
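
The three cues above can all be read off the rank structure of the pre-CSLS similarity matrix. The sketch below is a guess at one reasonable form, not the paper's formula: the weights `w_mnn`, `w_bi`, and `w_pop`, the additive combination, and the definition of "lacking local support" as the class not ranking the query within its own top-$L$ are all assumptions.

```python
import numpy as np

def structural_prior(sim, L=5, k=5, w_mnn=1.0, w_bi=0.5, w_pop=0.5):
    """Additive structural score for every (query, class) pair; weights are hypothetical."""
    n_q = sim.shape[0]
    # rank_q[i, j]: rank of class j in query i's list (0 = best).
    rank_q = (-sim).argsort(axis=1).argsort(axis=1)
    # rank_c[i, j]: rank of query i in class j's list (0 = best).
    rank_c = (-sim).argsort(axis=0).argsort(axis=0)

    mnn = (rank_q == 0) & (rank_c == 0)            # mutual top-1 pairs
    bidir = (rank_q < L) & (rank_c < L)            # mutual top-L pairs

    # Popularity: fraction of queries whose top-k list contains the class.
    popularity = (rank_q < k).sum(axis=0) / n_q
    # Penalize a popular class only for queries it does not itself rank highly.
    penalty = popularity[None, :] * ~(rank_c < L)

    return w_mnn * mnn + w_bi * bidir - w_pop * penalty
```

In a toy matrix where one class is every query's top guess, only the query that the class also ranks first keeps a positive score; the other queries see that class penalized.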

D. Product-of-Experts (PoE) Fusion

The final calibrated score is obtained by fusing the Geometric Expert ($S_{geom}$) and the Structural Expert ($S_{struct}$) via a Product-of-Experts rule. Treating the scores as log-domain quantities, the product of the two experts becomes a weighted sum, with $\alpha$ and $\beta$ weighting the two experts:
$S_{final}(q, c) = \alpha S_{geom}(q, c) + \beta S_{struct}(q, c)$
This fusion is lightweight, interpretable, and acts as a single-shot regularizer to produce stable, calibrated top-$k$ shortlists.
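
A small sketch can show why a weighted sum of scores counts as a product of experts: if each expert's score is read as an unnormalized log-probability, multiplying the two resulting distributions and renormalizing gives the same ranking as a softmax over the summed scores. The function names and the default weights below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fuse(s_geom, s_struct, alpha=1.0, beta=0.5):
    """Weighted sum of the two expert scores (alpha and beta are hypothetical defaults)."""
    return alpha * s_geom + beta * s_struct

def poe_posterior(s_geom, s_struct, alpha=1.0, beta=0.5):
    """Explicit product of the two experts' distributions, renormalized per query."""
    prod = softmax(alpha * s_geom) * softmax(beta * s_struct)
    return prod / prod.sum(axis=1, keepdims=True)

def shortlist(s_final, k=5):
    """Indices of the k highest-scoring classes for every query, best first."""
    return np.argsort(-s_final, axis=1)[:, :k]
```

Because the softmax is monotonic within each row, ranking by `fuse` and ranking by `poe_posterior` produce the same shortlist, so the cheaper sum is all that is needed at test time.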

3. Key Contributions

Problem Formulation: The paper frames cross-subject EEG-to-image retrieval as a label-free test-time calibration problem, explicitly characterizing how subject shift and hubness jointly destabilize rankings.
Geometric Expert: Introduction of an Adaptive CSLS scheme that derives query- and class-dependent neighborhood sizes from local densities, eliminating the need for global $k$ tuning while effectively mitigating hubness.
Structural Expert: A novel module that utilizes Mutual Nearest Neighbors, bidirectional ranks, and class popularity as structural priors to refine rankings without modifying the underlying encoders.
Encoder-Agnostic Design: The method operates purely on the similarity matrix, making it a plug-and-play calibration layer compatible with any frozen EEG/image encoder.

4. Experimental Results

The method was evaluated on the THINGS-EEG dataset using a strict Leave-One-Subject-Out (LOSO) protocol.
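
For readers unfamiliar with LOSO, the protocol holds out every subject in turn and trains on the rest, so the test subject is never seen during training. The generator below is a generic sketch of that split logic; the function name and the per-trial `trial_subjects` array are assumptions for illustration and say nothing about how the THINGS-EEG data is actually stored.

```python
import numpy as np

def loso_splits(trial_subjects):
    """Yield (held_out_subject, train_indices, test_indices) for each subject in turn."""
    trial_subjects = np.asarray(trial_subjects)
    for held_out in np.unique(trial_subjects):
        test_mask = trial_subjects == held_out
        # All trials of the held-out subject form the test set; everything else is training data.
        yield held_out, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```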

Performance Gains:
- Baseline Improvement: A standardized baseline (Cosine similarity + $\ell_2$ normalization + Candidate Whitening) already outperformed the original ATM retrieval setup.
- SATTC Performance: Adding SATTC to the baseline significantly improved Top-1 accuracy (from 9.2% to 14.8%) and Top-5 accuracy (from 30.5% to 38.4%).
- Comparison: SATTC outperformed both fixed-$k$ CSLS and adaptive CSLS alone, particularly in improving Top-1 while maintaining Top-5.
Hubness Reduction: SATTC produced the most uniform class-popularity distribution, significantly reducing the dominance of hub classes and improving per-class fairness (Recall@5).
Generalization: The method was tested on four different EEG encoders (ATM, EEGNetV4, EEGConformer, ShallowFBCSPNet). SATTC consistently improved performance across all architectures (e.g., +8–16% Top-5 gain), supporting its encoder-agnostic design.
Stability: The improvements were robust across different random seeds and subject splits.
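
The metrics quoted above are simple to compute from a score matrix. The sketch below shows one plausible way to do so, assuming `scores` has one row per query and one column per class and `labels` holds the true class index of each query; the function names are hypothetical and the paper's exact evaluation code is not reproduced here.

```python
import numpy as np

def topk_hits(scores, labels, k):
    """Boolean vector: is the true class among the k highest-scoring classes?"""
    topk = np.argsort(-scores, axis=1)[:, :k]
    return (topk == labels[:, None]).any(axis=1)

def topk_accuracy(scores, labels, k):
    """Fraction of queries whose true class appears in the top-k shortlist."""
    return float(topk_hits(scores, labels, k).mean())

def per_class_recall_at_k(scores, labels, k, n_classes):
    """Recall@k per class; classes with no queries are reported as NaN."""
    hits = topk_hits(scores, labels, k).astype(float)
    total = np.bincount(labels, minlength=n_classes)
    found = np.bincount(labels, weights=hits, minlength=n_classes)
    return np.divide(found, total, out=np.full(n_classes, np.nan), where=total > 0)
```

Looking at the spread of the per-class values, and not only their mean, is what reveals whether a few hub classes are absorbing most of the correct retrievals.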

5. Significance

Practical Deployment: SATTC enables more reliable visual decoding in realistic cross-subject scenarios where no labeled data is available for new users. It addresses the "cold start" problem for neural decoding.
Efficiency: By operating on the similarity matrix of frozen encoders, it avoids the computational cost of retraining or complex domain adaptation networks.
Reliability: It specifically targets the reliability of small-$k$ shortlists, which are critical for downstream applications like brain-computer interfaces (BCI) where a user must select from a limited set of options.
Paradigm Shift: The work suggests that similarity-space calibration is a powerful, often overlooked, avenue for improving cross-subject neural decoding, complementing traditional representation learning approaches.

In summary, SATTC provides a robust, label-free, and plug-and-play solution to the critical challenges of subject shift and hubness in EEG-to-image retrieval, significantly enhancing the reliability of cross-subject visual decoding.