The Big Problem: The "Too Much Stuff" Dilemma
Imagine you are a security guard trying to spot a thief in a massive museum. You have a photo of every object in its normal state, and you compare each new photo against that entire collection.
In the world of AI, this is called Visual Anomaly Detection. The AI looks at an image, breaks it into tiny pieces (like puzzle pieces), and creates a detailed "fingerprint" (a feature vector) for every piece.
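To make the "puzzle pieces" idea concrete, here is a minimal NumPy sketch that splits a made-up image into a grid of pieces and flattens each piece into one fingerprint vector. Real systems derive the fingerprints from a pretrained neural network's features rather than raw pixels; the 224-pixel image and 28-pixel pieces here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# A hypothetical 224x224 RGB image (random pixels stand in for a real photo).
image = np.random.rand(224, 224, 3).astype(np.float32)

# Split the image into an 8x8 grid of "puzzle pieces" (28x28 pixels each)
# and flatten each piece into one long "fingerprint" vector.
patch = 28
grid = image.reshape(224 // patch, patch, 224 // patch, patch, 3)
patches = grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 3)

print(patches.shape)  # (64, 2352): 64 pieces, each a 2,352-number fingerprint
```

Even this single toy image already yields 64 vectors of 2,352 numbers each, which is why whole datasets add up so quickly.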
The Catch:
If you have 1,000 images, and each image is broken into 1,000 pieces, and each piece has a fingerprint with 1,000 numbers, you end up with a billion numbers (and real datasets can be far bigger).
- Storage: It's like trying to carry a library of encyclopedias in your backpack. Your computer runs out of memory (RAM) and crashes.
- Speed: Comparing a new photo against billions of numbers is like trying to find a specific grain of sand on a beach by checking every single grain one by one. It takes forever.
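The storage claim is easy to check with back-of-the-envelope arithmetic, assuming each number is stored as a standard 32-bit float:

```python
# Memory cost of storing every fingerprint, using the hypothetical numbers
# from the text (1,000 images x 1,000 pieces x 1,000 values per fingerprint).
images, pieces, dims = 1_000, 1_000, 1_000
floats = images * pieces * dims          # one billion numbers
bytes_needed = floats * 4                # 4 bytes per 32-bit float
print(f"{floats:,} numbers = {bytes_needed / 1e9:.0f} GB of RAM")
```

That is 4 GB for the toy numbers alone; scale any of the three factors up and an ordinary machine's memory is quickly exhausted.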
The Old Solution: The "Random Sample"
To fix this, the popular method (called PatchCore) tries to be smart. Instead of remembering every fingerprint, it picks a "representative sample" (like picking the 10 most interesting grains of sand).
- The Flaw: Even picking that sample is slow because the computer still has to look at all the billions of numbers first to decide which ones to keep. Plus, if you have a huge dataset, even the "sample" is too big to fit in memory.
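For intuition, here is a minimal sketch of the greedy farthest-point selection behind PatchCore-style coreset sampling. The function name and toy data are illustrative assumptions; notice that every step scans distances to all points, which is exactly the flaw described above: the full feature matrix must sit in memory just to pick the sample.

```python
import numpy as np

def greedy_coreset(features: np.ndarray, k: int) -> np.ndarray:
    """Greedy farthest-point selection (the idea behind coreset sampling)."""
    chosen = [0]                                   # start from an arbitrary point
    # Distance from every point to its nearest chosen point so far.
    dist = np.linalg.norm(features - features[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))                 # farthest remaining point
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return features[chosen]

rng = np.random.default_rng(0)
bank = rng.normal(size=(10_000, 64))               # toy "fingerprint" bank
sample = greedy_coreset(bank, k=100)
print(sample.shape)  # (100, 64)
```

The loop keeps whichever point is currently farthest from the sample, so the 100 kept fingerprints spread out to cover the whole bank.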
The New Solution: The "Smart Summarizer"
The author, Teng-Yok Lee, proposes a new method called Incremental Dimension Reduction. Think of it as a Smart Summarizer that works in two clever ways:
1. The "Batch Processing" Analogy
Instead of trying to read the entire library of encyclopedias at once (which is impossible), the AI reads them in batches (e.g., 10 books at a time).
2. The "Compression" Analogy (The Magic Trick)
Here is the core innovation. When the AI reads a batch of 10 books, it doesn't just memorize them. It instantly writes a summary of that batch.
- Old Way: "I read 10 books. I need to remember every sentence." (Too much space).
- New Way: "I read 10 books. I realized they all talk about 'Space Exploration.' I will just remember the word 'Space' and the main themes." (Tiny space).
In math terms, this is called Truncated Singular Value Decomposition (SVD). It finds the "main themes" (singular vectors) of the data and throws away the boring, repetitive details.
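A minimal NumPy sketch of that idea: build toy data that secretly lies near five "themes," run an SVD, and keep only the top five singular vectors. The sizes here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 500 fingerprints of length 64 that secretly live near a
# 5-dimensional subspace (5 "main themes" plus a little noise).
themes = rng.normal(size=(5, 64))
data = rng.normal(size=(500, 5)) @ themes + 0.01 * rng.normal(size=(500, 64))

# Truncated SVD: keep only the top-k right singular vectors ("main themes").
U, S, Vt = np.linalg.svd(data, full_matrices=False)
k = 5
summary = Vt[:k]            # the k themes, each a direction in feature space
coords = data @ summary.T   # each fingerprint reduced to just k numbers

# The top 5 singular values dominate; the rest are near-zero noise.
print(S[:6].round(2))
```

The 500 x 64 matrix shrinks to 500 x 5 coordinates plus the 5 theme vectors, yet it can be reconstructed almost perfectly: the "boring, repetitive details" that were thrown away are just the tiny noise.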
How It Works Step-by-Step
- Group & Compress: The AI takes a batch of images, finds their "main themes," and compresses them into a tiny, low-dimensional summary. It throws away the raw, heavy data immediately to save memory.
- Update the Master List: It takes this new summary and updates its "Master List" of themes. It doesn't go back and re-read the previous batches; it just adds the new summary to the existing list.
- The Final Re-Alignment: Once all batches are processed, the AI does one final quick calculation to make sure all the summaries from different batches speak the same "language" (mathematically aligning them).
- The Result: You now have a tiny, compressed library of "themes" that fits easily in your computer's memory, but still captures the essence of the original billions of numbers.
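The batch-then-merge loop above can be sketched in a few lines of NumPy. This is a generic incremental truncated-SVD pattern, not the author's exact algorithm: each batch is folded into a running summary of (singular values, themes), and the raw batch is discarded immediately, so the full dataset never sits in memory at once.

```python
import numpy as np

def update_summary(summary, batch, k):
    """Merge a new batch into the running rank-k summary via truncated SVD."""
    if summary is None:
        stacked = batch
    else:
        S, Vt = summary
        # Re-weight the old themes by their singular values so they carry
        # the same "energy" as the raw rows they replaced.
        stacked = np.vstack([S[:, None] * Vt, batch])
    _, S_new, Vt_new = np.linalg.svd(stacked, full_matrices=False)
    return S_new[:k], Vt_new[:k]   # keep only the k strongest themes

rng = np.random.default_rng(0)
themes = rng.normal(size=(5, 64))  # the hidden "main themes" of the toy data
summary = None
for _ in range(10):                # ten batches, never all in RAM at once
    batch = rng.normal(size=(100, 5)) @ themes
    summary = update_summary(summary, batch, k=5)
    del batch                      # raw data is thrown away after each merge

S, Vt = summary
print(Vt.shape)  # (5, 64): 1,000 fingerprints summarized by 5 theme vectors
```

Because each merge only ever stacks the small weighted summary on top of one batch, the SVD at every step stays cheap, yet the final five themes span the same subspace the full dataset would have produced.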
Why Is This a Big Deal?
The paper tested this on two huge datasets:
- MVTec AD: A standard industrial dataset.
- Result: The new method was just as accurate as the old method at finding defects (like scratches on a metal nut), but it didn't crash the computer.
- Eyecandies: A massive dataset with 6,600 images (which usually requires a super-expensive, high-end graphics card to process).
- Result: With this new method, the AI could process this huge dataset on a standard computer in just 3 hours. Without this method, it would have been impossible or would have taken days.
The Bottom Line
Imagine you are trying to learn a language.
- The Old Way: You try to memorize every single word in the dictionary at once. You get overwhelmed and give up.
- The New Way: You learn the most common 500 words first. Then you learn the next 500, merging them into your vocabulary. By the end, you can understand 99% of conversations, but you only had to carry a small notebook in your pocket instead of a heavy dictionary.
This paper gives AI a "small notebook" so it can learn from massive amounts of data without needing a supercomputer.