DeformTrace: A Deformable State Space Model with Relay Tokens for Temporal Forgery Localization

This paper proposes DeformTrace, a hybrid architecture combining State Space Models with deformable dynamics and relay tokens to achieve state-of-the-art temporal forgery localization by addressing challenges in boundary ambiguity, sparse forgeries, and long-range modeling.

Xiaodong Zhu, Suting Wang, Yuanming Zheng, Junqi Yang, Yangxu Liao, Yuhong Yang, Weiping Tu, Zhongyuan Wang

Published 2026-03-06

Imagine you are a detective trying to find a specific lie in a 10-minute video of a person talking. The video looks and sounds real, but somewhere in the middle, a few seconds were swapped with a fake clip generated by AI. Your job is to pinpoint exactly when the lie starts and ends.

This is the challenge of Temporal Forgery Localization (TFL).

The paper introduces a new detective tool called DeformTrace. To understand how it works, let's break down the problems it solves and the clever tricks it uses, using some everyday analogies.

The Three Big Problems with Old Detectives

Previous AI models trying to solve this had three main weaknesses:

  1. The "Blurry Line" Problem: Old models were like a camera with poor focus. They could tell something was wrong, but they couldn't tell you exactly where the fake part started or stopped. The boundaries were fuzzy.
  2. The "Needle in a Haystack" Problem: Most of the video is real. The fake part is tiny (maybe just 2 seconds out of 100). Old models got so distracted by all the "real" stuff that they missed the tiny "fake" needle.
  3. The "Long Memory" Problem: If the video is long, the AI starts to forget what happened at the beginning by the time it reaches the end. It's like trying to remember a story told 10 minutes ago while listening to a new story; the details fade away.

Enter DeformTrace: The Super-Detective

The authors built DeformTrace using a newer type of AI engine called a State Space Model (SSM). Think of an SSM as a super-efficient note-taker that reads a whole book in a single pass, keeping a compact running summary as it goes. Older models (Transformers) compare every moment with every other moment, so their cost grows quadratically as the video gets longer.
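To make the "note-taker" idea concrete, here is a minimal sketch of a scalar linear state space recurrence. It is not the paper's implementation: the function name ssm_scan and the fixed parameters a, b, and c are illustrative assumptions, and real SSM layers use learned, multi-dimensional versions of them.

```python
def ssm_scan(inputs, a=0.9, b=0.1, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    State update: h[t] = a * h[t-1] + b * x[t]
    Readout:      y[t] = c * h[t]

    a, b, c are fixed illustrative values here (an assumption);
    in a trained model they would be learned.
    """
    h = 0.0  # the running "notes" carried from step to step
    outputs = []
    for x in inputs:
        h = a * h + b * x  # fold the new input into the running state
        outputs.append(c * h)
    return outputs


# A single spike at t=0 fades geometrically in the later outputs.
print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0))  # [1.0, 0.5, 0.25]
```

The loop touches each input once, so the work grows linearly with sequence length. The fading spike also hints at the "Long Memory" problem: with a plain recurrence, early information decays as the sequence goes on.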

But DeformTrace adds three special "superpowers" to this note-taker:

1. The "Shape-Shifting Lens" (Deformable Self-SSM)

The Analogy: Imagine you are looking at a map with a magnifying glass. A normal magnifying glass has a fixed size; you can only see a small circle. If the clue you are looking for is slightly outside that circle, you miss it.
How DeformTrace works: Instead of a fixed circle, DeformTrace has a shape-shifting lens. If the AI senses a suspicious area, the lens stretches and bends to focus exactly on that spot, ignoring the irrelevant parts. This allows it to find the "blurry lines" of the forgery with surgical precision.
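One common way to build a "shape-shifting lens" is deformable sampling: instead of reading the sequence at fixed positions, the model reads it at positions shifted by offsets. The sketch below assumes that approach and is not taken from the paper. The helper names sample_linear and deformable_gather are hypothetical, and the offsets are passed in by hand, whereas a real model would predict them from the input.

```python
import math


def sample_linear(seq, pos):
    """Read seq at a fractional position with linear interpolation.

    Positions outside the sequence are clamped to its ends.
    """
    pos = min(max(pos, 0.0), len(seq) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(seq) - 1)
    frac = pos - lo
    return (1.0 - frac) * seq[lo] + frac * seq[hi]


def deformable_gather(seq, offsets):
    """For each time step t, read the sequence at t + offsets[t].

    offsets are hand-written here (an assumption for illustration);
    in a trained model they would come from a small prediction network.
    """
    return [sample_linear(seq, t + off) for t, off in enumerate(offsets)]


seq = [0.0, 10.0, 20.0, 30.0]
# Step 0 looks half a frame ahead, step 2 looks one frame back,
# and step 3 tries to look past the end and is clamped.
print(deformable_gather(seq, [0.5, 0.0, -1.0, 2.0]))  # [5.0, 10.0, 10.0, 30.0]
```

Because the offsets can be fractional, the lens can land between two frames, which is what allows boundaries to be placed more finely than the frame grid.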

2. The "Relay Runners" (Relay Token Mechanism)

The Analogy: Imagine a long line of people passing a message down a chain. If the line is too long, the person at the end might hear a garbled version of the message because the signal got weak.
How DeformTrace works: To fix this, the AI inserts special "Relay Runners" (tokens) every few steps. These runners act like signal boosters. Each one grabs the message from the stretch of video before it, summarizes it, and passes it clearly to the stretch that follows. This helps the AI keep the beginning of the video in mind even when it's analyzing the end.
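As a rough illustration of the relay idea, the sketch below inserts one relay token after every few frames. The function name insert_relay_tokens, the stride parameter, and the use of a simple mean as the summary are all assumptions made for this example. The paper's relay tokens are learned, not averaged.

```python
def insert_relay_tokens(frames, stride):
    """Return a new sequence with one relay token after every `stride` frames.

    Each item is a (kind, value) pair. The relay value is the mean of the
    chunk it follows, a stand-in (assumption) for a learned summary.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    out = []
    for start in range(0, len(frames), stride):
        chunk = frames[start:start + stride]
        out.extend(("frame", v) for v in chunk)
        out.append(("relay", sum(chunk) / len(chunk)))
    return out


for token in insert_relay_tokens([1.0, 3.0, 5.0, 7.0, 9.0], stride=2):
    print(token)
```

With a stride of 2, five frames become eight tokens: the three relay tokens sit after frames 2, 4, and 5, and each carries a compact summary of the chunk just before it.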

3. The "Specialized Search Team" (Deformable Cross-SSM)

The Analogy: Imagine you are looking for a specific type of fish in a huge ocean. Instead of scanning the whole ocean blindly, you send out a team of divers, each holding a picture of the fish they are looking for. They only dive where the water looks like it might have that fish.
How DeformTrace works: The AI creates "Query Tokens" (the divers) that represent potential fake spots. These tokens ignore all the boring, real parts of the video and only "tune in" to the parts that look suspicious. This prevents the AI from getting confused by the massive amount of "real" data and keeps its focus sharp on the "fake" needle.
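The "divers" idea can be sketched as a query that reads only a handful of positions around a suspected spot and skips the full sequence. Everything named here (query_read, center, offsets, weights) is a hypothetical illustration. In the actual model, where each query looks and how it weighs what it finds would be learned.

```python
def query_read(features, center, offsets, weights):
    """Aggregate features at a few positions around `center`.

    Positions are center + offset, rounded to the nearest frame and clamped
    to the sequence. Only len(offsets) frames are read, however long the
    sequence is. offsets and weights are hand-picked here (an assumption).
    """
    if len(offsets) != len(weights):
        raise ValueError("offsets and weights must have the same length")
    last = len(features) - 1
    total = 0.0
    for off, w in zip(offsets, weights):
        idx = min(max(int(round(center + off)), 0), last)
        total += w * features[idx]
    return total


# 100 frames of "real" (0.0) with a two-frame "fake" bump at 40 and 41.
features = [0.0] * 100
features[40] = 1.0
features[41] = 1.0

print(query_read(features, 40.0, [-1.0, 0.0, 1.0], [0.25, 0.5, 0.25]))  # 0.75
print(query_read(features, 10.0, [-1.0, 0.0, 1.0], [0.25, 0.5, 0.25]))  # 0.0
```

The query centered on the suspicious region returns a strong response after reading three frames. The other 97 frames never enter the computation, which is how this kind of sparse reading keeps the "real" haystack from drowning out the needle.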

The Result: Faster, Smarter, and Stronger

By combining these three tricks, DeformTrace delivers three benefits:

  • It's Precise: It can tell you the fake clip starts at 4:12 and ends at 4:18, not just "somewhere in the middle."
  • It's Fast: Because it uses the efficient "note-taker" (SSM) engine, it processes video much faster than older, heavier models. It's like switching from a heavy truck to a sleek sports car.
  • It's Robust: Even if the video is compressed, blurry, or has bad audio (like a shaky phone call), DeformTrace still finds the lie. It's like a detective who can solve a case even if the evidence is smudged.
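To show what a "starts at 4:12, ends at 4:18" answer looks like in practice, here is a minimal sketch that turns per-frame fake scores into time segments. The function name scores_to_segments, the fixed threshold, and the assumption that the model outputs one score per frame are illustrative choices, not the paper's actual output head or post-processing.

```python
def scores_to_segments(scores, fps, threshold=0.5):
    """Group consecutive frames scoring >= threshold into (start_s, end_s) pairs.

    scores: one fake-probability per frame (an assumed output format).
    fps: frames per second, used to convert frame indices to seconds.
    """
    segments = []
    start = None  # index of the first frame of the segment being built
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            segments.append((start / fps, i / fps))
            start = None
    if start is not None:  # a segment that runs to the end of the video
        segments.append((start / fps, len(scores) / fps))
    return segments


scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.7]
print(scores_to_segments(scores, fps=1))  # [(2.0, 4.0), (5.0, 6.0)]
```

Each pair gives a start and end time in seconds, so a fuzzy "something is off" signal becomes an answer that can be checked against the video timeline.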

In a Nutshell

DeformTrace is a new AI system that acts like a highly skilled detective. It uses a shape-shifting lens to find exact boundaries, relay runners to remember the whole story, and specialized search teams to ignore the noise and find the tiny lies. The result, according to the paper, is a tool that is faster, cheaper to run, and more accurate at pinpointing forged segments than previous methods.