Multi-illuminant Color Constancy via Multi-scale Illuminant Estimation and Fusion

This paper proposes a multi-scale deep learning framework for multi-illuminant color constancy. A tri-branch convolutional network and an attentional fusion module represent the illuminant map as a linear combination of multi-grained components, effectively handling the impact of image scale and achieving state-of-the-art performance.

Hang Luo, Rongwei Li, Jinxing Liang

Published 2026-03-02
📖 4 min read · ☕ Coffee break read

🌅 The Problem: The "Bad Lighting" Camera

Imagine you are taking a photo of a beautiful red apple.

  • If you take the photo under a yellow streetlamp, the apple looks orange.
  • If you take it under a blue twilight sky, the apple looks purple.

Your human brain is amazing. Even if the light changes, your brain automatically knows, "That's still a red apple," and corrects the color in your mind. This is called Color Constancy.

However, cameras are not that smart. They just record the light hitting the sensor. If the light is weird, the whole photo looks weird (too yellow, too blue, or too green).

🏙️ The Complication: A Room with Many Lights

Most previous computer programs tried to fix this by assuming the whole room has one single light source (like one big ceiling bulb). They would calculate the color of that one bulb and fix the whole image.
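To make the one-bulb assumption concrete, here is a minimal sketch of the classic single-illuminant ("von Kries"-style) correction such programs apply. The function name and numpy setup are illustrative, not from the paper; the point is that one global gain per color channel is applied identically to every pixel.

```python
import numpy as np

def correct_single_illuminant(image, illuminant):
    """Classic single-illuminant ("one big bulb") correction.

    image:      H x W x 3 float array, linear RGB in [0, 1]
    illuminant: length-3 array, the estimated RGB color of the one light
    """
    illuminant = np.asarray(illuminant, dtype=np.float64)
    # One global gain per channel (anchored to green so brightness stays
    # roughly stable), applied identically to every pixel in the photo.
    gain = illuminant[1] / illuminant
    return np.clip(image * gain, 0.0, 1.0)

# Example: undo a yellowish cast (the light has more red/green than blue).
image = np.random.rand(4, 4, 3)
corrected = correct_single_illuminant(image, illuminant=[1.0, 0.9, 0.6])
```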

But real life is messy! Imagine a room with:

  • A warm yellow lamp in the corner.
  • A cool blue window on the left.
  • A bright white spotlight in the center.

If you try to fix the whole room with just one color correction, the apple near the window will look wrong, and the apple near the lamp will look wrong. This is the Multi-Illuminant problem. The camera needs to know that different parts of the photo need different fixes.
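In code, the multi-illuminant fix swaps the single global gain for a per-pixel illuminant map: every pixel gets its own correction. A minimal sketch, again with illustrative names, assuming we somehow already know the light color at each pixel (estimating that map is exactly what the rest of the paper is about):

```python
import numpy as np

def correct_with_illuminant_map(image, illuminant_map):
    """Per-pixel correction for mixed lighting.

    image:          H x W x 3 linear RGB
    illuminant_map: H x W x 3 estimated light color at EVERY pixel
    """
    # Each pixel gets its own gain, anchored to its own green channel.
    gains = illuminant_map[..., 1:2] / illuminant_map
    return np.clip(image * gains, 0.0, 1.0)
```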

🔍 The Old Way vs. The New Way

The Old Way (Deep Learning):
Previous AI methods looked at the whole picture at a single fixed scale and tried to guess the lighting for every single pixel at once. It's like trying to paint a detailed landscape by looking at it from 100 miles away through a telescope: you might get the big shapes right, but you miss the tiny details.

The New Way (This Paper's Solution):
The authors (Hang Luo, Rongwei Li, and Jinxing Liang) realized that scale matters.

  • If you zoom out (a coarse, small-scale view), you see the big picture: "Okay, the left side is generally blue, and the right side is generally yellow."
  • If you zoom in (a fine, large-scale view), you see the tiny details: "Ah, this specific leaf is catching a tiny reflection of a red car."

They proposed that the perfect lighting map is a mixture of these different views: a linear combination of coarse, medium, and fine illuminant estimates.
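In symbols (our notation, not necessarily the paper's exact formulation), the final illuminant map at each pixel x is a per-pixel weighted mix of the scale-specific estimates:

L_fused(x) = w_coarse(x) · L_coarse(x) + w_medium(x) · L_medium(x) + w_fine(x) · L_fine(x)

where the three weights at each pixel are nonnegative and sum to 1. Who picks the weights? That's the "smart manager" coming up below.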

🛠️ The Solution: The "Three-Chef Kitchen"

The authors built a system with three parallel chefs (a "Tri-branch network") working in a kitchen to fix the photo. (A rough code sketch of the three branches follows the list.)

  1. Chef Small (The Coarse Chef): Looks at a tiny, blurry version of the photo. They are good at seeing the big trends. "The whole left side is dark and blue."
  2. Chef Medium: Looks at a medium-sized version. They see the structure. "The shadow under the table is green."
  3. Chef Large (The Fine Chef): Looks at the full, high-definition photo. They see the tiny details. "That specific pixel on the apple is reflecting a red light."
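Here is a minimal PyTorch-style sketch of the tri-branch idea. Assumptions worth flagging: the `Branch` class, the layer sizes, and the downsampling factors (0.25, 0.5, 1.0) are illustrative stand-ins, not the paper's actual architecture; the real network is deeper and more carefully designed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Branch(nn.Module):
    """One 'chef': sees the image at one scale, predicts an RGB illuminant map."""
    def __init__(self, scale):
        super().__init__()
        self.scale = scale  # 1.0 = full resolution, 0.25 = tiny blurry version
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        # Shrink the input to this chef's working scale (if not full size).
        xs = x if self.scale == 1.0 else F.interpolate(
            x, scale_factor=self.scale, mode='bilinear', align_corners=False)
        est = self.net(xs)
        # Upsample the estimate back to full resolution so branches can be fused.
        return F.interpolate(est, size=(h, w), mode='bilinear', align_corners=False)

# Chef Small (coarse), Chef Medium, and Chef Large (fine).
branches = nn.ModuleList([Branch(0.25), Branch(0.5), Branch(1.0)])
image = torch.rand(1, 3, 128, 128)
estimates = [b(image) for b in branches]  # three candidate illuminant maps
```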

🤝 The "Smart Manager" (Attentional Fusion)

If you just asked the three chefs to mix their ideas together, it might be a mess. You need a Smart Manager (called the Attentional Illuminant Fusion Module).

  • The Manager looks at every single pixel in the photo.
  • For a pixel in a blurry, big area, the Manager says, "I trust Chef Small more here."
  • For a pixel on a sharp edge, the Manager says, "I trust Chef Large more here."
  • The Manager creates a "weight map" (a voting system) to decide exactly how much to listen to each chef for every single pixel. (A rough sketch of this voting step follows the list.)
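Continuing the sketch above, here is one simple way to implement the manager as per-pixel softmax attention over the three candidate maps. This is a hedged illustration of the voting idea, not the paper's actual Attentional Illuminant Fusion Module:

```python
import torch
import torch.nn as nn

class FusionManager(nn.Module):
    """The 'smart manager': per-pixel softmax weights over the branches."""
    def __init__(self, num_branches=3):
        super().__init__()
        # Looks at the image plus all candidate maps, outputs one score per branch.
        self.attn = nn.Sequential(
            nn.Conv2d(3 + 3 * num_branches, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, num_branches, 1),
        )

    def forward(self, image, estimates):
        stacked = torch.stack(estimates, dim=1)        # (B, K, 3, H, W)
        feats = torch.cat([image] + estimates, dim=1)  # (B, 3 + 3K, H, W)
        weights = self.attn(feats).softmax(dim=1)      # (B, K, H, W), sums to 1
        # Linear combination: each pixel's final light is a weighted vote.
        fused = (weights.unsqueeze(2) * stacked).sum(dim=1)
        return fused, weights

manager = FusionManager()
# Uses `image` and `estimates` from the tri-branch sketch above.
fused_map, weight_map = manager(image, estimates)
```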

🧪 The Results

They tested this on a massive dataset of photos with mixed lighting.

  • The Score: They measured how close the corrected colors were to the "true" colors, using something called "Mean Angular Error" (sketched in code after this list).
  • The Win: Their method beat all the other top methods. It was like getting a 98% on a test where everyone else got 90%.
  • Visual Proof: When they fixed the photos, the colors looked natural and realistic, whereas other methods left some parts looking weirdly tinted.
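Angular error treats each light color as a direction in RGB space and measures the angle between the estimated and true directions, so brightness differences don't count against you. A minimal numpy sketch (the function name is ours) for the multi-illuminant case, averaged over all pixels:

```python
import numpy as np

def mean_angular_error(estimated, ground_truth):
    """Mean Angular Error in degrees between two H x W x 3 illuminant maps.

    0 degrees = perfect; each pixel's error is the angle between its
    estimated and true RGB light color, regardless of brightness.
    """
    est = estimated.reshape(-1, 3)
    gt = ground_truth.reshape(-1, 3)
    cos = np.sum(est * gt, axis=1) / (
        np.linalg.norm(est, axis=1) * np.linalg.norm(gt, axis=1) + 1e-9)
    angles = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return angles.mean()
```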

🚀 In a Nutshell

Instead of trying to guess the lighting for a whole complex scene with one giant guess, this paper says: "Let's look at the scene from three different distances, let three different AI models guess the lighting, and then let a smart manager blend their best guesses together for every single pixel."

This approach allows computers to see the world with the same color-perfect vision that humans have, even in rooms with a dozen different light bulbs.