Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments

This paper proposes a consistency-based abductive reasoning framework that integrates predictions from multiple pre-trained models at test time to mitigate performance degradation in novel environments. By selecting a subset of predictions that maximizes coverage while minimizing logical inconsistencies, it achieves significant improvements in accuracy and F1-score over individual models and standard ensembles.

Mario Leiva, Noel Ngu, Joshua Shay Kricheli, Aditya Taparia, Ransalu Senanayake, Paulo Shakarian, Nathaniel Bastian, John Corcoran, Gerardo Simari

Published Thu, 12 Ma

Imagine you are leading a rescue mission in a brand-new, unfamiliar city after a disaster. You have a team of five expert scouts (these are your pre-trained AI models). Each scout has spent years training in a specific neighborhood, so they are experts at recognizing cars, people, and buildings in their home turf.

But now, you are in a strange new city with weird weather (heavy fog, snow, or dust) that none of them have seen before.

The Problem: The "Confused Scout" Dilemma

When you ask the scouts to identify objects:

  • Scout A (trained in sunny weather) sees a car and says, "That's a car!"
  • Scout B (trained in fog) looks at the same car and says, "That's a bush!"
  • Scout C is confident but wrong, saying, "That's a dog!"

If you just take a majority vote (like a democracy), you might get it wrong because the "noise" of the new environment confuses everyone. If you just trust the "best" scout, you might miss things because that scout is also confused by the new weather.

The Solution: The "Logic Detective"

This paper proposes a new way to handle this chaos. Instead of just voting, the team uses a Logic Detective (an abductive reasoning system) to figure out who is lying and who is telling the truth.

Here is how the system works, step-by-step:

1. The "Cheat Sheet" (Metacognitive Rules)

Before the mission, each scout was given a "Cheat Sheet" (a logic program). This sheet doesn't teach them what a car looks like; it teaches them when they are likely to be wrong.

  • Example Rule: "If the visibility is low (fog) AND the object looks blurry, Scout B is likely to mistake a car for a bush."
  • This is like a scout saying, "I know I'm bad at seeing things in the rain, so I'm going to be extra skeptical of my own eyes."
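To make the idea concrete, a metacognitive rule can be sketched as a small predicate over a prediction and the current conditions. This is a hypothetical illustration (the field names `model`, `label`, `blur_score`, and `visibility` are invented for this sketch, not taken from the paper):

```python
# Hypothetical sketch of one "cheat sheet" rule: it does not say what a
# car looks like, only when a particular model's output is suspect.

def low_visibility_rule(prediction, conditions):
    """Flag Scout B's 'bush' calls as suspect when it is foggy and blurry."""
    return (
        prediction["model"] == "scout_b"
        and prediction["label"] == "bush"
        and conditions["visibility"] == "low"
        and prediction["blur_score"] > 0.5
    )

prediction = {"model": "scout_b", "label": "bush", "blur_score": 0.8}
conditions = {"visibility": "low"}
print(low_visibility_rule(prediction, conditions))  # True: be extra skeptical
```

A real system would hold many such rules, one per known failure mode, and consult them all before deciding which reports to trust.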

2. The "Consistency Check" (The Abduction)

The Logic Detective gathers all the reports. It knows two things:

  1. The Rules of Reality: Two different things can't be in the exact same spot at the same time (e.g., an object can't be both a "car" and a "bush").
  2. The Cheat Sheets: It knows which scouts are prone to specific errors in specific weather.

The Detective's job is to pick and choose the best set of reports to believe. It asks: "If I believe Scout A and ignore Scout B, does the whole story make sense? If I believe Scout C, does it break the laws of physics?"

It tries to find the largest possible group of truths (so you don't miss any cars) while ensuring no contradictions exist (so you don't have a car that is also a bush).
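The "largest consistent subset" idea can be sketched in a few lines. This toy version (not the paper's exact formulation) treats a prediction as a `(model, location, label)` triple and calls two predictions inconsistent when they put different labels on the same spot:

```python
from itertools import combinations

def consistent(subset):
    """No two predictions may assign different labels to the same location."""
    seen = {}
    for model, loc, label in subset:
        if loc in seen and seen[loc] != label:
            return False
        seen.setdefault(loc, label)
    return True

def best_subset(predictions):
    """Exhaustively search subsets, largest consistent one wins."""
    for size in range(len(predictions), 0, -1):
        for combo in combinations(predictions, size):
            if consistent(combo):
                return list(combo)
    return []

preds = [
    ("scout_a", (3, 4), "car"),
    ("scout_b", (3, 4), "bush"),   # conflicts with scout_a's report
    ("scout_c", (7, 1), "person"),
]
print(len(best_subset(preds)))  # 2: one of the conflicting pair is dropped
```

Exhaustive search like this blows up as the number of predictions grows, which is exactly why the paper studies the two solvers described next.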

3. The Two Methods: The "Math Wizard" vs. The "Fast Runner"

The paper tests two ways to solve this puzzle:

  • The Math Wizard (Integer Programming): This method tries to find the perfect solution. It checks every possible combination of scouts to find the absolute best set of answers. It's like a chess grandmaster calculating every move to win. It's very accurate but takes a little longer to think.
  • The Fast Runner (Heuristic Search): This method is a bit more like a sprinter. It makes quick, smart guesses. It adds scouts one by one, checking if the story still makes sense. It's not guaranteed to be perfect, but it's incredibly fast and usually gets very close to the best answer.
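The "Fast Runner" flavor of search can be sketched as a greedy loop; the details here (sorting by a `conf` field, one pass, no backtracking) are assumptions for illustration, not the paper's exact heuristic:

```python
# Greedy sketch: visit predictions from most to least confident and keep
# each one only if it stays consistent with everything kept so far.

def greedy_select(predictions):
    chosen, labels = [], {}
    for p in sorted(predictions, key=lambda p: -p["conf"]):
        loc = p["loc"]
        if loc not in labels or labels[loc] == p["label"]:
            labels.setdefault(loc, p["label"])
            chosen.append(p)
    return chosen

preds = [
    {"model": "scout_a", "loc": (3, 4), "label": "car", "conf": 0.9},
    {"model": "scout_b", "loc": (3, 4), "label": "bush", "conf": 0.4},
    {"model": "scout_c", "loc": (7, 1), "label": "person", "conf": 0.7},
]
print([p["label"] for p in greedy_select(preds)])  # ['car', 'person']
```

Unlike the exhaustive "Math Wizard", this makes a single pass and never revisits a choice, which is why it is fast but not guaranteed optimal.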

4. The Tie-Breaker

Sometimes, the Logic Detective finds two equally good stories (e.g., "It's a car" vs. "It's a truck," and both fit the rules). In this case, it uses a Tie-Breaker: it simply asks, "Which scout was most confident?" and picks that answer.
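A minimal sketch of the tie-breaker, assuming each candidate "story" carries per-prediction confidences (the `conf` field is an assumed interface):

```python
# Among equally good consistent stories, keep the one whose predictions
# carry the highest total confidence.

def break_tie(candidate_sets):
    return max(candidate_sets, key=lambda s: sum(p["conf"] for p in s))

story_a = [{"label": "car", "conf": 0.8}]
story_b = [{"label": "truck", "conf": 0.6}]
print(break_tie([story_a, story_b])[0]["label"])  # car
```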

The Result: Why It Matters

The researchers tested this on a simulated aerial dataset with crazy weather shifts (rain, snow, dust, leaves).

  • Old Way (Majority Vote): Got confused easily.
  • New Way (Logic Detective): Even when the weather was terrible, the system figured out which scouts were lying and which were telling the truth.
  • The Score: The new system improved accuracy by about 16% and F1-score (a measure of how good the detection is) by 13.6% compared to the best single scout.

The Big Picture Analogy

Think of this like a jury trial.

  • The Scouts are the witnesses. Some are reliable, some are unreliable, and some are confused by the fog.
  • The Cheat Sheets are the lawyers pointing out, "Witness A has bad eyesight in the rain."
  • The Logic Detective is the Judge. The Judge doesn't just count how many witnesses say "Guilty." The Judge looks at the logic: "If Witness A says it's a car, and Witness B (who is known to be confused in rain) says it's a bush, and we know it's raining... I will trust Witness A."

The goal is to get the most accurate verdict possible without letting the confusion of the new environment break the logic of the trial.

In short: This paper teaches AI how to be a better detective by using logic to filter out errors when things get weird, rather than just blindly trusting the crowd or the "smartest" individual.