BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation

This paper introduces BusterX, an MLLM-powered framework for detecting AI-generated video forgeries. Paired with the comprehensive GenBuster-200K dataset and the multi-track GenBuster-Bench benchmark, it shifts detection from black-box classification to interpretable visual reasoning, achieving higher accuracy and better explanation quality than leading models.

Haiquan Wen, Yiwei He, Zhenglin Huang, Tianxiao Li, Zihan Yu, Xingru Huang, Lu Qi, Baoyuan Wu, Xiangtai Li, Guangliang Cheng

Published 2026-03-09

Imagine the internet is a massive, bustling marketplace. For a long time, it was easy to tell the difference between a real photo taken by a human and a fake one made by a computer. But recently, "digital magicians" (AI video generators) have become so good at their tricks that they can create videos so realistic, even experts can't tell them apart from reality.

This paper introduces a new team of "Digital Detectives" called BusterX, along with a new training ground and a rulebook to help them catch these fakes.

Here is the breakdown of their mission, explained simply:

1. The Problem: The "Magic Trick" is Getting Better

Think of old AI videos like a child's drawing of a cat. You could easily spot the ears were wrong or the tail was missing. But today's AI is like a master illusionist. It can make a video of a person talking, walking, or dancing that looks 100% real.

The old "detectives" (previous AI tools) were like security guards who only knew how to spot the child's drawings. When the master illusionists showed up, the guards got confused and let the fakes through. Also, the old tools were "black boxes"—they would just say "Fake!" without explaining why, which made it hard for humans to trust them.

2. The New Training Ground: GenBuster-200K

To train better detectives, you need better practice materials. The authors realized the old practice videos were too easy (like training a police dog on a squeaky toy).

They built GenBuster-200K, a massive library of over 200,000 videos.

  • The Mix: It has real videos and super-realistic fake videos.
  • The Fairness Rule: They made sure the library wasn't biased. Just like a real city has people of all ages, genders, and backgrounds, this dataset includes everyone. They didn't just train the AI on "young men in suits"; they trained it on everyone, everywhere.
  • The "Wild" Zone: They even included videos that had been compressed by social media (like TikTok or YouTube), because real fakes usually get squished and pixelated when people share them.
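The three design rules above can be sketched in code. This is a toy illustration, not the paper's actual pipeline: the group names, field names, and the `simulate_sharing` helper are assumptions made up for this example.

```python
import random
from collections import Counter

# Toy pool of video clips tagged with a demographic group and a real/fake label.
# Three groups x two labels x 50 clips each = 300 entries.
POOL = [
    {"id": i, "group": g, "label": lbl}
    for i, (g, lbl) in enumerate(
        [(g, lbl) for g in ("group_a", "group_b", "group_c")
                  for lbl in ("real", "fake")] * 50
    )
]

def balanced_sample(pool, per_bucket):
    """Draw the same number of clips from every (group, label) bucket,
    so no demographic or label dominates the training set."""
    rng = random.Random(0)
    buckets = {}
    for clip in pool:
        buckets.setdefault((clip["group"], clip["label"]), []).append(clip)
    sample = []
    for _, clips in sorted(buckets.items()):
        sample.extend(rng.sample(clips, per_bucket))
    return sample

def simulate_sharing(clip, quality=0.4):
    """Mimic the 'Wild Zone': tag a clip as re-encoded at social-media quality."""
    return {**clip, "compressed": True, "quality": quality}

train = [simulate_sharing(c) for c in balanced_sample(POOL, 10)]
counts = Counter((c["group"], c["label"]) for c in train)
print(sorted(counts.values()))  # every bucket contributes equally: [10, 10, 10, 10, 10, 10]
```

The point is the shape of the recipe: stratify sampling across groups and labels, then apply a compression step so the detector never trains only on pristine footage.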

3. The New Rulebook: GenBuster-Bench

Instead of just giving the detectives a final exam with one big test, the authors created a three-level challenge course:

  • Level 1 (The Classroom): Can the detective spot fakes made by the tools they've seen before? (Easy mode).
  • Level 2 (The New Villain): Can the detective spot fakes made by brand new tools they've never seen? This tests if they learned the principles of forgery or just memorized the old tricks.
  • Level 3 (The Real World): Can the detective spot a fake that has been posted on social media, compressed, and shared a hundred times? This is the hardest test.
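The three-level course above is, structurally, just an evaluation loop over three test tracks. Here is a minimal sketch under that assumption; the track names, the toy detector, and the artifact scores are invented for illustration and do not come from the paper.

```python
def toy_detector(clip):
    """Stand-in detector: flags a clip as fake when its artifact score
    clears a fixed threshold."""
    return "fake" if clip["artifact_score"] > 0.5 else "real"

# Three tracks of increasing difficulty; compression in the "wild" track
# washes out the artifact cue (0.4 on a fake clip), fooling the toy detector.
TRACKS = {
    "in_domain":    [{"artifact_score": 0.9, "label": "fake"},
                     {"artifact_score": 0.1, "label": "real"}],
    "unseen_tools": [{"artifact_score": 0.6, "label": "fake"},
                     {"artifact_score": 0.2, "label": "real"}],
    "in_the_wild":  [{"artifact_score": 0.4, "label": "fake"},
                     {"artifact_score": 0.3, "label": "real"}],
}

def evaluate(detector, tracks):
    """Report accuracy separately per track, so a detector that memorized
    in-domain tricks can't hide behind one aggregate number."""
    report = {}
    for name, clips in tracks.items():
        correct = sum(detector(c) == c["label"] for c in clips)
        report[name] = correct / len(clips)
    return report

report = evaluate(toy_detector, TRACKS)
print(report)  # perfect on easy tracks, 0.5 (coin-flip) in the wild
```

Reporting per-track rather than overall accuracy is what makes the benchmark diagnostic: the gap between Level 1 and Level 3 measures generalization, not memorization.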

They also added a "Judge" (another AI) that doesn't just check if the answer is right, but grades the explanation. If the detective says "It's fake because the eyes are weird," the Judge checks: "Did you actually look at the eyes, or did you just guess?"
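A judge like this can be sketched as a two-part check: is the verdict right, and does the cited evidence overlap with artifacts actually present in the video? The scoring scheme below is an assumption for illustration, not the paper's exact metric (which uses another AI model as the judge).

```python
def judge(prediction, ground_truth):
    """Grade a detector's answer on two axes: correctness of the verdict,
    and how much of its cited evidence is grounded in real artifacts."""
    verdict_ok = prediction["verdict"] == ground_truth["label"]
    cited = set(prediction["evidence"])
    real_cues = set(ground_truth["artifacts"])
    grounding = len(cited & real_cues) / len(cited) if cited else 0.0
    return {"correct": verdict_ok, "grounding": round(grounding, 2)}

gt = {"label": "fake", "artifacts": ["flickering hands", "plastic skin"]}
good = {"verdict": "fake", "evidence": ["flickering hands"]}
lazy = {"verdict": "fake", "evidence": ["weird eyes"]}  # right answer, ungrounded

print(judge(good, gt))  # {'correct': True, 'grounding': 1.0}
print(judge(lazy, gt))  # {'correct': True, 'grounding': 0.0}
```

The `lazy` case is exactly the failure mode the benchmark targets: a correct verdict reached by guessing scores zero on grounding.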

4. The Star Detective: BusterX

Meet BusterX. Unlike the old guards who just shouted "Fake!" or "Real!", BusterX is a Reasoning Detective.

  • How it works: Instead of guessing, BusterX puts on its thinking cap and writes a step-by-step report. It looks at the video frame by frame and asks questions like:
    • "Does the shadow move with the sun?"
    • "Do the person's clothes ripple naturally when they walk?"
    • "Is the skin texture too smooth, like plastic?"
  • The Secret Weapon (Reinforcement Learning): The authors didn't just teach BusterX the answers. They used a technique called Reinforcement Learning. Imagine a dog trainer: every time BusterX finds a real clue and explains it well, it gets a "treat" (a reward). If it guesses wrong or writes a lazy explanation, it gets a gentle "no." Over time, BusterX learns to think like a human forensic expert.
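The "treat" analogy maps onto a reward function that pays for a correct verdict and for a substantive explanation. This is a hedged sketch of the idea only: the weights and the word-count heuristic for explanation quality are illustrative assumptions, not BusterX's actual reward design.

```python
def reward(verdict, label, explanation, w_correct=1.0, w_explain=0.5):
    """RL-style reward: positive for a right verdict, negative for a wrong
    one, plus a bonus that grows with the amount of written reasoning."""
    correct = 1.0 if verdict == label else -1.0
    # Crude proxy for explanation quality: more words of reasoning, capped at 1.0.
    effort = min(len(explanation.split()) / 20.0, 1.0)
    return w_correct * correct + w_explain * effort

lazy = reward("fake", "fake", "Fake.")
careful = reward("fake", "fake",
                 "The hand geometry flickers between frames and the skin "
                 "texture is unnaturally smooth, both typical generator artifacts.")
wrong = reward("real", "fake", "Looks normal to me, lighting is consistent.")

print(wrong < lazy < careful)  # thoughtful, correct answers score highest → True
```

In practice a learned judge, not a word count, would score the explanation, but the ordering is the point: correct-and-explained beats correct-but-lazy, which beats wrong.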

5. The Results: Why This Matters

When they put BusterX to the test:

  • Old Detectors: They failed miserably on the "Real World" level, getting confused by new AI tools.
  • Big AI Models: Some giant general-purpose AI models could sometimes guess right, but they were systematically biased, leaning toward calling everything fake (or everything real).
  • BusterX: It won. It didn't just guess; it provided reasons. It could tell you exactly why a video was fake (e.g., "The person's hand flickers between frames"), and it stayed calm and accurate even when the video was messy or from a new AI generator.

The Big Picture

This paper is like upgrading from a metal detector that beeps at everything to a forensic scientist with a magnifying glass.

In a world where AI can create deepfakes that could ruin reputations or spread lies, we need tools that don't just say "That's a lie," but can prove it with clear, logical evidence. BusterX is that tool, ready to help us keep the truth safe in a world of digital magic.