EAGLE: Expert-Augmented Attention Guidance for Tuning-Free Industrial Anomaly Detection in Multimodal Large Language Models

The paper proposes EAGLE, a tuning-free framework that leverages expert model outputs to guide Multimodal Large Language Models toward accurate and interpretable industrial anomaly detection without requiring parameter updates, achieving performance comparable to fine-tuned methods.

Xiaomeng Peng, Xilang Huang, Seon Han Choi

Published 2026-02-25

The Big Problem: The "Smart" Inspector Who Can't See the Forest for the Trees

Imagine a factory that makes thousands of products every day. It needs to find the rare defective item (a scratch on a phone screen, a tear in a piece of fabric).

  • Old Way (Deep Learning): They used to hire a robot that was incredibly good at spotting the scratch. It would say, "Defect found!" or "No defect." But if you asked it, "Where is it?" or "What kind of scratch is it?", it would just stare blankly. It was a genius at finding problems but terrible at explaining them.
  • New Way (Multimodal Large Language Models - MLLMs): Then, they tried hiring a "Super-Intellect" (like a very smart AI chatbot that can see images). This AI is great at talking. It can say, "I see a deep scratch on the left side of the zipper, likely caused by a metal tool."
    • The Catch: This Super-Intellect is a bit of a daydreamer. It often trusts what it thinks it should see based on its reading habits, rather than what is actually in the picture. It might look at a perfect shirt and say, "I see a stain," because it's used to reading about stains. It also needs a lot of expensive training to learn the specific job, which takes too much time and money.

The Solution: EAGLE (The Expert Guide)

The authors of this paper created EAGLE. Think of EAGLE as a tactical partnership between the "Super-Intellect" (the AI chatbot) and a "Veteran Factory Inspector" (a specialized, simple AI).

The goal? To get the chatbot to give perfect answers without retraining it or teaching it new things.

Here is how EAGLE works, step-by-step:

1. The Veteran Inspector (The Expert Model)

First, they bring in a "Veteran Inspector." This is a specialized AI (based on something called PatchCore) that is only trained to spot defects. It doesn't talk; it just points.

  • The Problem with the Veteran: Sometimes, the Veteran gets a little paranoid. It might point at a normal shirt and say, "Hey, look at this tiny speck! It's suspicious!" If you show this to the Super-Intellect, the chatbot might get confused and think, "Oh, the Veteran says it's broken, so I must say it's broken," even if it's actually fine.

2. The "Smart Filter" (Distribution-Based Thresholding - DBT)

To stop the chatbot from getting confused by the Veteran's paranoia, EAGLE adds a Smart Filter.

  • How it works: The Smart Filter looks at how the Veteran behaves on perfectly good items. It learns, "Okay, the Veteran usually points at things with a score of 1 or 2. If the score is 10, that's a real problem. If it's 3, it's probably just noise."
  • The Result: The filter only lets the Veteran's "pointing" (visual hints) through to the chatbot if the problem is real and serious. If the Veteran is just being paranoid about a normal item, the filter blocks the hint. This stops the chatbot from making false alarms.
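The Smart Filter idea can be sketched as a statistical gate: fit the distribution of the expert's scores on defect-free items, then forward a hint only when a test score clearly exceeds that range. The mean-plus-k-sigma rule and the parameter `k` below are illustrative assumptions, not necessarily the paper's exact formula.

```python
import numpy as np

def fit_threshold(normal_scores, k=3.0):
    # Learn what "normal paranoia" looks like from scores the
    # expert assigns to defect-free items.
    mu, sigma = np.mean(normal_scores), np.std(normal_scores)
    return mu + k * sigma

def gate_hint(score, threshold):
    # Forward the expert's visual hint to the MLLM only when the
    # score clearly exceeds the normal range; otherwise block it.
    return score > threshold
```

For example, if normal items score around 1-2, a test score of 10 passes the gate while a score of 3 is blocked as noise, which is exactly the behavior described above.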

3. The "Confidence Boost" (Confidence-Aware Attention Sharpening - CAAS)

Sometimes, even the Veteran isn't 100% sure. Maybe the defect is very subtle.

  • The Problem: When the Veteran is unsure, it might give a confusing hint like, "I think this is normal." The Super-Intellect (chatbot) is very stubborn and loves to listen to words. If the Veteran says "Normal," the chatbot might ignore the visual evidence of the scratch and just say "Normal."
  • The Fix: EAGLE has a special switch called CAAS. When the Veteran is unsure (the score is in a "gray area"), EAGLE tells the chatbot: "Hey, don't just listen to the words! Look harder at the picture!"
  • The Analogy: Imagine you are taking a test. Your friend whispers, "I think the answer is B." But you look at the question and see the answer is clearly A. If you are confident, you ignore your friend. But if you are unsure, you might panic and listen to your friend. EAGLE forces the chatbot to squint harder at the picture (visual evidence) whenever it feels unsure, overriding the confusing text hints.
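The CAAS "switch" can be sketched as follows: when the expert's score lands in an uncertain gray zone, boost the attention the model pays to image tokens before the softmax, so visual evidence outweighs the ambiguous text hint. The gray-zone bounds, the `boost` factor, and operating on a single attention row are all simplifying assumptions for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sharpen_attention(attn_logits, is_image_token, score, lo, hi, boost=2.0):
    # When the expert's score falls in the uncertain "gray area"
    # [lo, hi], amplify attention toward image tokens so the model
    # weighs visual evidence over a possibly misleading text hint.
    logits = attn_logits.copy()
    if lo <= score <= hi:
        logits[is_image_token] += np.log(boost)
    return softmax(logits)
```

Outside the gray zone the attention is left untouched, so the mechanism only intervenes when the expert itself is unsure, matching the "squint harder at the picture" behavior described above.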

Why is this a Big Deal?

  1. No Training Required: Usually, to make a Super-Intellect good at a specific job, you have to spend weeks teaching it (Fine-tuning). EAGLE is Tuning-Free. It's like giving the chatbot a pair of glasses and a cheat sheet, rather than sending it to school for a new degree.
  2. It Actually Works: The paper tested this on real factory datasets (MVTec-AD and VisA). The results showed that EAGLE made the chatbot almost as good as the specialized "Veteran Inspector" at finding defects, but with the added superpower of being able to explain the defect in human language.
  3. It Fixes the "Daydreamer" Issue: By analyzing how the chatbot's brain works (its "attention"), the authors found that when the chatbot gets the answer right, it is actually looking at the defect. EAGLE just helps it keep its eyes on the prize.

Summary in a Nutshell

EAGLE is like hiring a Senior Expert to guide a Genius Intern.

  • The Expert spots the problem.
  • A Filter ensures the Expert only speaks up when they are sure, preventing false alarms.
  • A Focus Mechanism tells the Intern to trust their eyes over the Expert's words when things get tricky.

The result? A factory inspection system that is fast, accurate, doesn't need expensive training, and can explain exactly what went wrong in plain English.
