Improving Anomaly Detection with Foundation-Model Synthesis and Wavelet-Domain Attention

This paper proposes a framework that combines a foundation-model-based anomaly synthesis pipeline (FMAS) with a Wavelet Domain Attention Module (WDAM). FMAS generates realistic synthetic anomalies and WDAM enhances feature extraction, significantly improving industrial anomaly detection performance on benchmark datasets without requiring fine-tuning.

Wensheng Wu, Zheming Lu, Ziqian Lu, Zewei He, Xuecheng Sun, Zhao Wang, Jungong Han, Yunlong Yu

Published 2026-03-04

Imagine you are a quality control inspector at a massive factory that makes everything from toothbrushes to circuit boards. Your job is to spot the one defective item on a conveyor belt full of thousands of perfect ones.

The problem? Defects are rare. You might only see a broken screw once a month. Because you don't have enough "bad" examples to study, your brain (or your computer program) doesn't know what a broken screw looks like until it's too late.

This paper proposes a clever two-part solution to fix this:

  1. A "Magic Imagination Machine" to create fake defects for training.
  2. A "Super-Sharp Lens" to help the computer see the tiny flaws better.

Here is how it works, broken down into simple analogies:

Part 1: The "Magic Imagination Machine" (FMAS)

The Problem: Usually, to teach a computer to spot a defect, you need to show it thousands of pictures of broken things. But in a factory, broken things are rare. If you try to make fake broken things by just cutting and pasting pieces of paper (old methods), they look obvious and fake, like a bad Photoshop job.

The Solution: The authors built a pipeline using Foundation Models (the same super-smart AI brains behind tools like ChatGPT and image generators). Think of this as a team of three experts working together:

  1. The Architect (GPT-4): It reads the picture of a perfect object (like a bottle) and writes a creative story: "Imagine a crack in the glass here, or a dent in the metal there." It knows exactly what a "broken bottle" sounds like in words.
  2. The Sculptor (SAM): It looks at the picture and says, "Okay, I see the bottle. I will trace a precise outline (a mask) around just the bottle so we don't accidentally break the table behind it."
  3. The Painter (Stable Diffusion): It takes the Architect's story and the Sculptor's box, and paints a realistic crack or dent right onto the bottle.

The Result: They can generate thousands of hyper-realistic fake defects without ever needing to train the AI on real broken items. It's like practicing your driving skills in a perfect video game simulator before hitting the real road.
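The three-expert pipeline above can be sketched as plain orchestration code. The function names and stub bodies below are placeholders of my own, not the paper's API: in a real pipeline, `describe_defect` would call a language model such as GPT-4, `segment_object` a segmentation model such as SAM, and `inpaint_defect` a diffusion inpainter such as Stable Diffusion.

```python
# Hypothetical sketch of the three-stage synthesis pipeline.
# All three stage functions are stand-ins, not real model calls.

def describe_defect(object_name):
    # Stage 1, the "Architect": a language model would write this
    # defect description; here it is a fixed template.
    return f"a thin hairline crack on the {object_name}"

def segment_object(image):
    # Stage 2, the "Sculptor": a segmentation model would return a
    # foreground mask; here we simply mark every pixel as foreground.
    return [[1 for _ in row] for row in image]

def inpaint_defect(image, mask, prompt):
    # Stage 3, the "Painter": a diffusion model would inpaint the
    # described defect inside the mask; here we just bundle the inputs.
    return {"image": image, "mask": mask, "prompt": prompt}

def synthesize_anomaly(image, object_name):
    """Run the three stages in order on one 'perfect' image."""
    prompt = describe_defect(object_name)
    mask = segment_object(image)
    return inpaint_defect(image, mask, prompt)
```

The key design point is that the stages only exchange text prompts, masks, and images, so any of the three models can be swapped out independently.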

Part 2: The "Super-Sharp Lens" (WDAM)

The Problem: Even with great training data, computers sometimes miss tiny defects. They look at an image as a whole picture, like looking at a forest and missing a single broken branch. They get distracted by the overall shape or color.

The Solution: The authors realized that defects often look different depending on how you "zoom in" on the details. They used a mathematical tool called Wavelet Transform, which is like taking a photo and separating it into four different "layers" of detail:

  • The Smooth Layer (LL): The big, blurry shapes (like the overall color of the bottle).
  • The Edge Layers (LH, HL, HH): The sharp horizontal, vertical, and diagonal details, including textures and tiny cracks.
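The four-layer split described above is a single-level 2D Haar wavelet transform. Here is a minimal NumPy sketch using an unnormalized average/difference variant (sub-band naming conventions vary between libraries):

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D Haar wavelet transform (average/difference variant).

    Splits an image with even height and width into four half-resolution
    sub-bands: LL (smooth approximation) plus LH, HL, HH (detail layers).
    """
    # Rows: average and difference of adjacent pixel pairs.
    lo_r = (img[0::2, :] + img[1::2, :]) / 2.0
    hi_r = (img[0::2, :] - img[1::2, :]) / 2.0
    # Columns: repeat the average/difference split on both results.
    LL = (lo_r[:, 0::2] + lo_r[:, 1::2]) / 2.0
    LH = (lo_r[:, 0::2] - lo_r[:, 1::2]) / 2.0
    HL = (hi_r[:, 0::2] + hi_r[:, 1::2]) / 2.0
    HH = (hi_r[:, 0::2] - hi_r[:, 1::2]) / 2.0
    return LL, LH, HL, HH

# A flat gray image with one bright "scratch" pixel: the defect shows
# up almost entirely in the detail bands, while LL stays nearly flat.
img = np.full((8, 8), 0.5)
img[3, 4] = 1.0
LL, LH, HL, HH = haar_dwt2(img)
```

Each sub-band is a quarter the size of the input, and the single defective pixel leaves a clear spike in LH, HL, and HH while barely perturbing LL.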

The "Lens" (Attention Module):
Imagine you are looking at a painting. A normal computer looks at the whole canvas equally. This new module, WDAM, acts like a smart spotlight.

  • It looks at the "Smooth Layer" and says, "This looks fine, ignore it."
  • It looks at the "Edge Layers" and says, "Wait! There's a weird texture here! Turn up the brightness on this part!"

It dynamically decides which "layer" of detail is most important for finding a defect. If a defect is a sharp scratch, it focuses on the high-frequency edge layers; if it is a broad stain, it leans more on the smoother layer. It amplifies the signal of the defect and mutes the noise of the background.
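The exact WDAM architecture isn't spelled out in this summary, but the "smart spotlight" idea can be sketched as a simple gating step: score each sub-band, turn the scores into softmax weights, and rescale. The variance-based score below is an illustrative assumption of mine, not the paper's design (a learned attention module would replace it in practice).

```python
import numpy as np

def wavelet_attention(subbands):
    """Hypothetical sub-band attention sketch (not the paper's exact WDAM).

    Scores each sub-band by how much its values vary (a flat band carries
    little defect evidence), converts scores to softmax weights, and
    rescales each sub-band so informative bands are amplified.
    """
    scores = np.array([b.var() for b in subbands])
    # Softmax: weights are positive and sum to 1.
    e = np.exp(scores - scores.max())
    weights = e / e.sum()
    return [w * b for w, b in zip(weights, subbands)], weights
```

Given one flat sub-band and one sub-band with a sharp spike, the spiky band receives the largest weight, which is the "turn up the brightness on this part" behavior from the analogy.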

Putting It All Together

  1. Training: They use the Magic Imagination Machine to create a massive library of fake, realistic broken items. The computer learns what defects look like without needing real broken items.
  2. Inspection: When the computer inspects a real product, it uses the Super-Sharp Lens to ignore the boring background and zoom in specifically on the tiny, weird patterns that signal a defect.

Why This Matters

  • No Fine-Tuning: You don't need to retrain the whole AI for every new product. It just works.
  • Plug-and-Play: The "Super-Sharp Lens" (WDAM) can be added to almost any existing computer vision system, like adding a turbocharger to a car.
  • Better Results: In tests, this combination found defects much more accurately than previous methods, especially on tricky items like screws or fabric.

In short: They taught the computer to imagine its own training data so it knows what to look for, and then gave it a special pair of glasses that highlights the tiny cracks while ignoring the rest of the world.