ACD-U: Asymmetric co-teaching with machine unlearning for robust learning with noisy labels

The paper proposes ACD-U, an asymmetric co-teaching framework that combines a CLIP-pretrained Vision Transformer with a CNN and incorporates machine unlearning to actively correct selection errors and achieve state-of-the-art robustness against noisy labels.

Reo Fukunaga, Soh Yoshida, Mitsuji Muneyasu

Published Tue, 10 Ma

Imagine you are training a team of two detectives to solve a mystery (classify images). However, the case files they are given contain a lot of fake clues (noisy labels). Some files say "This is a cat" when it's actually a dog.

If you let the detectives read these files blindly, they will eventually start believing the fake clues, memorizing them as facts. This ruins their ability to solve real cases later. This is the core problem the paper addresses: How do you train smart AI when the data is full of lies?

The authors propose a new system called ACD-U. Think of it as a clever training camp with two special rules: The "Different Personalities" Rule and the "Memory Eraser" Rule.

1. The "Different Personalities" Rule (Asymmetric Co-Teaching)

Most previous methods used two identical detectives (two identical AI models) to check each other's work. If both detectives agreed on a fake clue, they would both get tricked, and the error would stick forever.

ACD-U changes the team dynamic. Instead of two identical detectives, they hire two very different ones:

  • Detective V (The Veteran): This is a Vision Transformer (a type of AI) that has already read millions of books and seen the world before. It's like a seasoned expert who knows what a "cat" looks like immediately. Because it's so experienced, it's very confident and rarely gets confused by bad clues early on.
  • Detective A (The Apprentice): This is a CNN (a standard AI) starting from scratch. It's eager to learn but gets confused easily. It needs to be taught carefully.

How they work together:

  • The Veteran (V) only studies the clues it is highly confident are real (the "low-loss," likely-clean samples). It refuses to touch the messy, confusing files, acting instead as a stable anchor that teaches the Apprentice what "true" looks like.
  • The Apprentice (A) is allowed to look at everything, including the messy files, but it uses a special "semi-supervised" technique to guess the truth.
  • The Magic: Because they are so different, they rarely make the same mistake at the same time. If the Apprentice gets confused by a fake clue, the Veteran usually spots it and says, "No, that's wrong." This stops them from reinforcing each other's errors.
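The split described above rests on the classic "small-loss" heuristic: samples the confident, pretrained model fits with low loss are treated as likely clean, and only those are used for its own training. Here is a minimal, hypothetical sketch of that selection step — the function name, the clean fraction, and the toy losses are all illustrative, not the paper's exact procedure:

```python
def select_clean(losses, clean_fraction):
    """Small-loss selection: keep the fraction of samples with the
    lowest loss, treating them as likely clean. (Hypothetical helper;
    the paper's exact selection criterion may differ.)"""
    k = max(1, int(len(losses) * clean_fraction))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return ranked[:k]

# Toy example: 6 samples; indices 1 and 4 have suspiciously high loss,
# suggesting their labels are noisy ("fake clues").
losses_v = [0.10, 2.50, 0.20, 0.15, 3.00, 0.30]
clean_idx = select_clean(losses_v, clean_fraction=0.67)

# The Veteran would train only on clean_idx; the Apprentice sees every
# sample but handles the rest with semi-supervised pseudo-labels.
print(sorted(clean_idx))  # → [0, 2, 3, 5]
```

Because the Veteran and Apprentice have different architectures and different starting knowledge, their per-sample losses rank the data differently, which is what keeps them from agreeing on the same fake clue.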

2. The "Memory Eraser" Rule (Machine Unlearning)

Here is the paper's biggest breakthrough. Even with two great detectives, sometimes they still accidentally memorize a fake clue. In the past, once a detective memorized a lie, there was no way to fix it. The lie became part of their permanent memory.

ACD-U introduces a "Memory Eraser" (Machine Unlearning).

  • The Detective's Diary: The system keeps a diary of what the detectives thought at the start of the day.
  • The "Oops" Moment: Later in training, the system re-examines the clues. If a clue that used to be confusing suddenly becomes "easy" (low loss) yet contradicts what the Veteran believes (it checks facts against its pre-trained "CLIP" knowledge), the system concludes: "Wait, we just memorized a lie!"
  • The Eraser: Instead of ignoring the mistake, the system actively erases the influence of that specific clue from the detective's brain. It uses a mathematical "force" to push the detective's memory away from that fake clue, effectively saying, "Forget that you ever saw this."

This turns the process from passive (trying not to make mistakes) to active (finding mistakes and fixing them after they happen).
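The detect-then-erase loop above can be sketched in two steps: flag samples that the network now fits confidently but that disagree with the CLIP-based check, then apply a gradient-*ascent* update on those samples (a plain SGD step with the sign flipped). This is a simplified illustration under assumed names and a made-up threshold — the paper's actual unlearning objective is more involved:

```python
def flag_memorized(losses, model_preds, clip_preds, loss_threshold=0.1):
    """Flag samples the network fits confidently (low loss) whose
    predicted label contradicts CLIP's zero-shot label — the
    'memorized a lie' signal. Threshold and names are illustrative."""
    return [i for i, (l, m, c) in enumerate(zip(losses, model_preds, clip_preds))
            if l < loss_threshold and m != c]

def unlearn_step(weights, grad_on_flagged, lr=0.1):
    """Gradient *ascent* on the flagged samples: the sign is flipped
    relative to a normal SGD step, pushing the weights away from the
    memorized fake clue instead of toward it."""
    return [w + lr * g for w, g in zip(weights, grad_on_flagged)]

losses      = [0.05, 0.90, 0.04, 0.50]
model_preds = [1, 0, 2, 3]
clip_preds  = [1, 0, 0, 3]   # sample 2 is fit confidently but contradicts CLIP
flagged = flag_memorized(losses, model_preds, clip_preds)
print(flagged)  # → [2]

# The erase: nudge the weights against the gradient computed on sample 2.
updated = unlearn_step([1.0, -0.5], grad_on_flagged=[0.2, 0.2])
```

Note the only mechanical difference from ordinary training is the `+` in `unlearn_step`: the same gradient that wrote the fake clue into memory is used to write it back out.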

The Analogy: The Classroom

Imagine a classroom with a Professor (The Veteran ViT) and a Student (The Apprentice CNN).

  1. The Problem: The textbook has typos. The Student reads the typos and learns them.
  2. Old Method: Two students sit together. If they both read the typo, they convince each other it's correct.
  3. ACD-U Method:
    • The Professor has read the correct version of the book before class. He only teaches the Student the pages he knows are right.
    • The Student tries to learn from the whole book but listens to the Professor to correct himself.
    • The Twist: If the Student accidentally memorizes a typo, the Professor notices the Student is acting strangely compared to his own "perfect memory." The Professor then uses a special technique to make the Student unlearn that specific typo, wiping it from his short-term memory so he can learn the truth instead.

Why This Matters

  • It fixes the unfixable: Previous AI methods could only try to avoid mistakes. This method can find and fix mistakes even after they happen.
  • It works in chaos: It performs incredibly well even when the data is 80% or 90% wrong (high noise), which is a nightmare for other AI models.
  • It's efficient: By using a pre-trained "Professor" (CLIP) and a learning "Student," they cover each other's weaknesses.

In short: ACD-U is like a smart teacher who not only picks the best study materials but also has a magical eraser to wipe out any wrong facts the student accidentally memorized, ensuring the student learns the truth no matter how messy the source material is.