Adversarial Patch Generation for Visual-Infrared Dense Prediction Tasks via Joint Position-Color Optimization

This paper proposes AP-PCO, a joint position-color optimization framework that generates cross-spectral adversarial patches to effectively attack visual-infrared dense prediction systems by simultaneously perturbing both modalities while maintaining stealth through color adaptation.

He Li, Wenyue He, Weihang Kong, Xingchen Zhang

Published 2026-03-03

Imagine you have a super-smart security guard who can see in two ways at once: with normal eyes (seeing colors and textures) and with night-vision goggles (seeing heat and shapes in the dark). This guard is used to make important decisions, like counting people in a crowd, finding lost items, or combining two pictures into one perfect image.

This paper is about how to trick this "super-guard" using a single, cleverly designed sticker.

The Problem: The "One-Size-Fits-None" Sticker

Previously, hackers knew how to make a sticker that would confuse a guard with just normal eyes. They'd put a bright, weirdly colored patch on a person's shirt, and the guard would think, "That's not a person; that's a giant dog!"

But this new "super-guard" sees both normal vision and infrared (heat) vision.

  • The Issue: If you make a sticker that looks like a chaotic rainbow to the normal eye, it might look like a weird, glowing blob to the night-vision eye.
  • The Result: The normal eye gets confused, but the night-vision eye says, "Wait, that doesn't look right either," and the guard ignores the trick. The attack fails because the two "eyes" don't agree on what the sticker is.

The Solution: The "Chameleon Sticker" (AP-PCO)

The authors of this paper invented a new way to make these stickers. They call it AP-PCO. Think of it as a Chameleon Sticker that changes its personality depending on who is looking at it, all while being the same physical object.

Here is how they did it, using simple analogies:

1. The "Evolutionary Search" (Finding the Perfect Spot)

Instead of guessing where to put the sticker, the computer acts like a biologist watching a colony of ants.

  • It releases thousands of "virtual ants," each carrying a slightly different sticker idea (different sizes, different spots on the image).
  • It tests them all against the security guard.
  • The ones that confuse the guard the most get to "reproduce" (their ideas are mixed and mutated).
  • Over time, the colony evolves to find the perfect spot where the sticker causes the maximum confusion. It's like natural selection, but for stickers.
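The "colony of ants" idea above can be sketched as a tiny evolutionary search. This is not the paper's actual algorithm, just a minimal illustration: each candidate is a patch placement `(x, y, scale)`, and `fitness` is a hypothetical stand-in for "how confused the model gets" (in the real attack it would be the victim model's error).

```python
import random

# A minimal evolutionary search over patch placements (illustrative sketch,
# not the paper's AP-PCO algorithm). `fitness` is a hypothetical surrogate
# for the attacked model's confusion; here it peaks near the image centre
# at a medium patch size.

IMG_W, IMG_H = 640, 480

def fitness(x, y, scale):
    # Stand-in for "how badly the guard misbehaves" (higher = better attack).
    return -((x - 320) ** 2 + (y - 240) ** 2) / 1e4 - (scale - 0.3) ** 2

def random_patch():
    return (random.uniform(0, IMG_W), random.uniform(0, IMG_H),
            random.uniform(0.05, 0.6))

def mutate(p, rate=0.2):
    x, y, s = p
    return (min(max(x + random.gauss(0, IMG_W * rate), 0), IMG_W),
            min(max(y + random.gauss(0, IMG_H * rate), 0), IMG_H),
            min(max(s + random.gauss(0, 0.05), 0.05), 0.6))

def crossover(a, b):
    # Mix two parents' ideas, coordinate by coordinate.
    return tuple(random.choice(pair) for pair in zip(a, b))

def evolve(generations=30, pop_size=40, elite=8):
    pop = [random_patch() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(*p), reverse=True)
        parents = pop[:elite]  # the "ants" that confused the guard the most survive
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - elite)]
        pop = parents + children  # next generation
    return max(pop, key=lambda p: fitness(*p))

best = evolve()
```

Because the best candidates are always kept (elitism), the search never gets worse between generations; it only needs the fitness score, never gradients.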

2. The "Dual-Personality" Color (The Magic Trick)

This is the coolest part. The sticker needs to look different to the two eyes, but it's only one physical piece of paper.

  • To the Normal Eye: The sticker is bright, high-contrast, and colorful. It screams "Look at me!" to mess up the texture recognition.
  • To the Night-Vision Eye: The computer renders those same colors as a dimmed grayscale (black-and-white) version, so instead of a glowing blob the patch reads as a faint, unremarkable shape.
  • The Analogy: Imagine a sticker that looks like a neon sign to a human, but to a thermal camera, it looks like a subtle shadow that blends perfectly into the background. The computer figures out exactly which colors create this "double vision" effect.
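The "one patch, two appearances" trick can be sketched in a few lines. This is an illustrative assumption, not the paper's exact rendering model: the visible patch is a grid of loud RGB colors, and its simulated infrared appearance is a dimmed grayscale version (using the standard ITU-R BT.601 luminance weights; the 0.4 dimming factor is an arbitrary choice for illustration).

```python
# Sketch: the same RGB patch seen two ways. Visible camera: loud colors.
# Simulated infrared view: dimmed grayscale. The luminance weights are the
# standard BT.601 coefficients; the dim factor is an illustrative choice.

def to_infrared_view(rgb_patch, dim=0.4):
    """Map each (r, g, b) pixel in [0, 1] to a single dimmed intensity."""
    return [[dim * (0.299 * r + 0.587 * g + 0.114 * b)
             for (r, g, b) in row]
            for row in rgb_patch]

visible = [[(1.0, 0.1, 0.9), (0.0, 1.0, 0.2)]]  # high-contrast "neon" colors
infrared = to_infrared_view(visible)             # faint, low-intensity shadow
```

The optimizer's job is then to pick visible colors whose grayscale projection also lands where the attacker wants it in the infrared image, so one physical sticker fools both "eyes" at once.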

3. The "Black Box" Rule

The researchers didn't need to know how the security guard's brain worked inside. They didn't need the blueprints. They just threw stickers at the guard, saw if the guard got confused, and adjusted the stickers based on the result. This makes the attack very hard to stop because you don't need inside information to pull it off.
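The query-only loop above can be sketched as simple black-box hill climbing. Everything here is a hypothetical stand-in: `guard_error` plays the victim model, and the attacker only ever sees its output score, never its internals or gradients.

```python
import random

# Sketch of black-box, query-only optimisation (not the paper's method).
# We never look inside `guard_error` (a hypothetical stand-in for the
# victim model); we only observe how wrong its output is and keep whichever
# sticker made it worse.

def guard_error(sticker):
    # Hypothetical victim: returns how badly the model misbehaves for a
    # given sticker parameter vector (higher = more confused).
    target = [0.2, 0.7, 0.5]
    return -sum((s - t) ** 2 for s, t in zip(sticker, target))

def black_box_attack(queries=500):
    best = [random.random() for _ in range(3)]
    best_err = guard_error(best)
    for _ in range(queries):
        cand = [min(max(v + random.gauss(0, 0.1), 0.0), 1.0) for v in best]
        err = guard_error(cand)   # only the output is observed
        if err > best_err:        # keep it if the guard got more confused
            best, best_err = cand, err
    return best, best_err
```

Because no blueprints are needed, defenses that merely hide the model's weights do not stop this kind of attack; the attacker just needs to watch the outputs.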

Why Does This Matter?

The researchers tested this on three real-world jobs:

  1. Crowd Counting: Making the system think there are 100 people when there are only 10.
  2. Semantic Segmentation: Making the system think a person is a tree or a car.
  3. Image Fusion: Blending two pictures together poorly so the final image is useless.

The Results:

  • Their "Chameleon Sticker" worked much better than old methods.
  • It worked on different types of security guards (different AI models).
  • It was hard to spot (stealthy).
  • Even when the image was blurred or compressed (standard input-transformation defenses), the sticker still worked.
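The blur-resistance claim can be made concrete with a toy check: a defender might blur the captured image before feeding it to the model, so a robust patch must survive that transform. Below is a simple 3x3 box blur on a grayscale image (values in [0, 1]); this is an illustrative defense stand-in, not the specific filters tested in the paper.

```python
# Sketch of an input-transformation defense: a 3x3 mean (box) blur on a
# grayscale image, with edge pixels averaging only their in-bounds
# neighbours. A robust patch would be evaluated on blurred copies too.

def box_blur(img):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if 0 <= y + dy < h and 0 <= x + dx < w]
            out[y][x] = sum(vals) / len(vals)
    return out
```

Evaluating candidate stickers against blurred (and compressed) copies of the scene during the search is a common way to make a patch survive exactly these defenses.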

The Takeaway

This paper is a wake-up call. It shows that as we build smarter systems that combine different types of sensors (like cameras and heat sensors), we need to be careful. A single, cleverly designed physical object can trick these advanced systems just as easily as it tricks a human, perhaps even more so.

The authors aren't trying to break the world; they are holding up a mirror to show us where the cracks are, so we can build stronger, safer AI guards for the future.