TP-Spikformer: Token Pruned Spiking Transformer

The paper proposes TP-Spikformer, a training-free token-pruning framework for spiking transformers. It uses a heuristic spatiotemporal criterion and block-level early stopping to cut computational and memory overhead while maintaining competitive accuracy across diverse architectures and tasks.

Wenjie Wei, Xiaolong Zhou, Malu Zhang, Ammar Belatreche, Qian Sun, Yimeng Shan, Dehao Zhang, Zijian Zhou, Zeyu Ma, Yang Yang, Haizhou Li

Published 2026-03-03

Imagine you have a very smart, energy-efficient robot brain (called a Spiking Neural Network or SNN) that is trying to learn how to recognize things in a video. Unlike a human brain that fires electricity constantly, this robot brain only "fires" a tiny spark when it sees something important. This makes it incredibly fast and low-power, perfect for running on small devices like drones or smartwatches.

However, there's a problem. To get really good at recognizing things (like spotting a cat in a crowd), we've made these robot brains huge and complicated. They are like a library with millions of books, but when the robot tries to read a story, it feels like it has to read every single page of every single book before it can tell you the ending. This takes too much time and battery power.

Enter the authors of this paper with TP-Spikformer. Think of it as a super-smart editor for the robot's brain.

The Problem: Reading Every Page

Imagine you are watching a movie. If you had to read the script for the whole movie to understand the plot, you'd be reading a lot of boring scenes where nothing happens (like a shot of an empty sky or a quiet hallway).

  • The Old Way: The robot brain reads every single "token" (a tiny piece of the image or video frame), even the boring ones. It wastes energy processing the empty sky just to get to the part where the cat jumps.
  • The Result: The robot is accurate, but slow and power-hungry.

The Solution: The "Smart Editor" (TP-Spikformer)

The authors created a method to teach the robot brain to skip the boring parts without losing the story. They call this Token Pruning.

Here is how their "Smart Editor" works, using two simple rules:

1. The "Spot the Difference" Rule (Spatial Intelligence)

Imagine you are looking at a photo of a dog in a park.

  • The grass is all the same green.
  • The sky is all the same blue.
  • But the dog has fur, ears, and a tail that look very different from the grass.
  • The Editor's Move: The editor looks at the photo and says, "Hey, this patch of grass is just like the grass next to it. It's boring. Let's skip reading it." But it says, "This patch with the dog's ear? That's unique! Let's keep reading that."
  • In the Paper: They call this the Spatial Scorer. It finds the "interesting" parts of the image that look different from their neighbors.
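The paper's exact scoring formula isn't reproduced here, but the "spot the difference" idea can be sketched in a few lines of numpy: treat each image patch as a token on a grid, and score a token by how different it is from its four grid neighbors. The function name, the 4-neighbor choice, and the Euclidean distance are illustrative assumptions, not the authors' definition.

```python
import numpy as np

def spatial_scores(tokens, grid_h, grid_w):
    """Score each token by how different it is from its 4 grid neighbors.

    tokens: (N, D) array of token features, N = grid_h * grid_w.
    Returns: (N,) array; higher = more "interesting" (less like its neighbors).
    """
    feat = tokens.reshape(grid_h, grid_w, -1)
    scores = np.zeros((grid_h, grid_w))
    # Compare each token with its up/down/left/right neighbor.
    # np.roll wraps around the edges -- fine for a sketch.
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        shifted = np.roll(feat, shift=(dy, dx), axis=(0, 1))
        scores += np.linalg.norm(feat - shifted, axis=-1)
    return (scores / 4).reshape(-1)

# A flat region (identical tokens) scores 0; a distinct token stands out.
tokens = np.ones((16, 8))        # 4x4 grid of identical "grass" tokens
tokens[5] = 5.0                  # one "dog ear" token
scores = spatial_scores(tokens, 4, 4)
keep = np.argsort(scores)[-4:]   # keep the 4 most distinctive tokens
```

Running this, token 5 gets the highest score (it differs from all four of its neighbors), while deep inside the uniform "grass" region the score is zero, so those tokens are the first to be skipped.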

2. The "What Changed?" Rule (Temporal Intelligence)

Now, imagine the video starts. The dog is sitting still, then suddenly it barks and jumps.

  • The grass didn't move. The sky didn't move.
  • But the dog's mouth moved, and its position changed.
  • The Editor's Move: The editor looks at the video frame-by-frame. It says, "The grass looked the same as the last second. Skip it." But, "The dog's mouth just opened! That's a big change! Keep reading that!"
  • In the Paper: This is the Temporal Scorer. It spots things that are moving or changing over time.
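The "what changed?" rule can be sketched the same way: for a spike tensor with a time dimension, score each token by its average frame-to-frame change. Again, the function name and the mean-absolute-difference measure are illustrative assumptions, not the paper's exact Temporal Scorer.

```python
import numpy as np

def temporal_scores(spikes):
    """Score each token by how much it changes across timesteps.

    spikes: (T, N, D) binary spike tensor (T timesteps, N tokens, D channels).
    Returns: (N,) array; higher = more temporal change ("the dog moved").
    """
    # Mean absolute frame-to-frame difference per token.
    diffs = np.abs(np.diff(spikes.astype(float), axis=0))  # (T-1, N, D)
    return diffs.mean(axis=(0, 2))

# Token 0 ("grass"): the same spike pattern at every timestep.
static = np.zeros((4, 1, 8), dtype=int)
static[:, 0, :4] = 1
# Token 1 ("dog's mouth"): flips on and off each step.
moving = np.zeros((4, 1, 8), dtype=int)
moving[::2, 0, :] = 1

spikes = np.concatenate([static, moving], axis=1)  # (T=4, N=2, D=8)
scores = temporal_scores(spikes)
# The static token scores 0; the flickering token scores 1.
```

The static token contributes nothing to the difference at any timestep, so it is "skipped"; the token whose spikes change every frame gets the maximum score and is kept.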

The Magic Trick: "Early Stopping" (Not Deleting, Just Ignoring)

Here is the clever part. Usually, when you delete boring parts of a document, you might mess up the formatting, making it hard to read later.

  • Old Methods: Some previous methods tried to physically cut out the boring words. This often broke the structure of the sentence, requiring the robot to be retrained from scratch to learn how to read the new, broken sentences.
  • TP-Spikformer's Method: Instead of cutting the words out, the editor tells the robot: "Don't waste energy reading this boring paragraph, but keep it in the book so the page numbers stay the same."
  • The Analogy: Imagine a teacher telling a student, "You don't need to solve these 50 easy math problems to get the answer, but keep the paper in front of you so the next teacher knows where to look."
  • The Benefit: The robot saves energy by skipping the work, but the "structure" of the brain stays perfect. This means we can use this editor on any existing robot brain without having to retrain it from scratch!
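The "keep the page numbers the same" trick boils down to masking instead of deleting: low-scoring tokens are zeroed out rather than removed, so the tensor shape every later block expects is unchanged, and no retraining is needed. In a spike-driven network a zeroed token emits no spikes, so downstream layers do essentially no work on it. This is a minimal sketch of that idea, with an assumed `keep_ratio` parameter; the paper's block-level early-stopping logic is more involved.

```python
import numpy as np

def prune_by_masking(tokens, scores, keep_ratio=0.5):
    """Prune by masking: zero out low-scoring tokens instead of deleting
    them, so the (N, D) shape later blocks expect stays intact.

    tokens: (N, D) token features; scores: (N,) importance scores.
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    keep_idx = np.argsort(scores)[-n_keep:]   # indices of the top tokens
    mask = np.zeros(len(tokens), dtype=bool)
    mask[keep_idx] = True
    pruned = tokens * mask[:, None]           # "boring" tokens become all-zero
    return pruned, mask

tokens = np.arange(12.0).reshape(4, 3)        # 4 tokens, 3 channels
scores = np.array([0.1, 0.9, 0.2, 0.8])
pruned, mask = prune_by_masking(tokens, scores, keep_ratio=0.5)
# pruned.shape == tokens.shape, but tokens 0 and 2 are now all zeros.
```

Because `pruned` has the same shape as `tokens`, it can be fed straight into the next pre-trained block; the zeroed rows simply carry no spikes forward.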

Why Does This Matter?

The authors tested this "Smart Editor" on many different tasks:

  • Recognizing images (Is that a cat or a dog?)
  • Finding objects (Where is the car in this traffic jam?)
  • Tracking movement (Follow that bird as it flies through the trees).

The Results:

  • Speed: The robot became 1.4x to 2x faster.
  • Battery: It used significantly less power (up to 40% less energy).
  • Accuracy: It barely lost any accuracy (sometimes even got slightly better because it focused only on the important stuff!).
  • No Re-training: You can take an existing, pre-trained robot brain and just apply this editor. No need to spend weeks teaching it again.

The Big Picture

Think of TP-Spikformer as a personal assistant for your robot brain. It looks at the massive amount of data the robot is about to process, says, "Hey, you don't need to look at all of this. Just focus on the dog, the car, and the moving bird. Ignore the sky and the grass."

This allows us to put powerful, smart AI into small, battery-powered devices (like your glasses, your phone, or a drone) without them running out of juice or getting slow. It's a simple, smart way to make AI more efficient, just like how our own brains naturally ignore the background noise to focus on what matters.