TherA: Thermal-Aware Visual-Language Prompting for Controllable RGB-to-Thermal Infrared Translation

TherA is a framework that addresses the scarcity of thermal infrared (TIR) data by combining a thermal-aware visual-language model with a latent diffusion translator to generate diverse, physically plausible, and controllable TIR images from RGB inputs, guided by user prompts.

Dong-Guw Lee, Tai Hyoung Rhee, Hyunsoo Jang, Young-Sik Shin, Ukcheol Shin, Ayoung Kim

Published 2026-02-26

Imagine you have a magical camera that can see the world in two different ways:

  1. The "Daylight" View (RGB): This is what your eyes see. It shows colors, shapes, and textures.
  2. The "Heat" View (Thermal): This is what a thermal camera sees. It doesn't see colors; it sees temperature. In a typical display, hot things glow bright and cold things look dark.

For a long time, scientists have wanted to teach computers to translate the "Daylight" view into the "Heat" view automatically. This is super useful for things like self-driving cars (which need to see pedestrians at night) or search-and-rescue missions.

However, there's a big problem: Heat is tricky.

The Problem: The "Copy-Paste" Mistake

Imagine you have a photo of a car.

  • Scenario A: The car is driving down the highway. The engine is hot, the tires are hot, and the exhaust is steaming.
  • Scenario B: The same car is parked in a garage, turned off, and has been sitting there for hours.

If you ask a standard AI to turn the photo of the driving car into a thermal image, it might do a decent job. But if you ask it to turn the photo of the parked car into a thermal image, old AI models often make a silly mistake. They might still draw the car as "hot" because they just learned that "cars look like cars." They forget the physics: A parked car is cold.

Old AI models treat this like a simple art filter (like turning a photo black and white). But turning a photo into a thermal image isn't just about style; it's about physics. You need to know if the engine is running, if it's raining, or if the sun just set.

The Solution: TherA (The "Thermal Physicist" AI)

The authors of this paper created a new system called TherA. Think of TherA as a team of two: a painter, and a Thermal Physicist (nicknamed the "Therapist" below) who advises the painter.

Here is how it works, using a simple analogy:

1. The Translator (The Painter)

This is the part of the AI that actually draws the thermal image. It's like a talented painter who knows how to use heat colors.

2. The Therapist (TherA-VLM)

This is the new, special part. Before the painter touches the canvas, they ask the Therapist (a thermal-aware visual-language model) for instructions.

  • The Old Way: The painter just looked at the photo and guessed. "Oh, it's a car, so I'll make it hot."
  • The TherA Way: The painter asks the Therapist: "Hey, look at this car. Is it running? Is it raining? Is it night time?"

The Therapist analyzes the photo and gives the painter a secret recipe (a "thermal embedding").

  • Recipe for the driving car: "Make the engine and wheels glowing hot. Make the exhaust steamy."
  • Recipe for the parked car: "Make the whole car cool and dark. No steam."
  • Recipe for a rainy day: "Make everything look damp and cooler."
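
To make the two-step idea concrete, here is a deliberately tiny Python sketch. It is not the authors' code: the names ThermalCondition, describe_scene, and paint_thermal are hypothetical, the scene tags stand in for a real visual-language model, and the hand-picked heat levels stand in for a real diffusion model. It only shows the flow of information: the scene is described first, and that description then steers the painting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThermalCondition:
    """Hypothetical stand-in for the "recipe" (thermal embedding)."""
    engine_running: bool
    raining: bool


def describe_scene(tags):
    """Toy "Therapist": turns scene tags into a condition (assumed logic)."""
    return ThermalCondition(
        engine_running="driving" in tags,
        raining="rain" in tags,
    )


def paint_thermal(regions, cond):
    """Toy "painter": assigns a relative heat level between 0 and 1 per region."""
    heat = {}
    for name in regions:
        if name in ("engine", "wheels"):
            # Parts that warm up only when the car is in use.
            level = 0.9 if cond.engine_running else 0.3
        else:
            level = 0.4
        if cond.raining:
            # Rain cools everything by an arbitrary, made-up factor.
            level *= 0.8
        heat[name] = round(level, 2)
    return heat


regions = ["engine", "wheels", "body", "road"]
driving = paint_thermal(regions, describe_scene({"car", "driving"}))
parked = paint_thermal(regions, describe_scene({"car", "parked"}))
rainy = paint_thermal(regions, describe_scene({"car", "parked", "rain"}))
```

The same list of regions produces three different heat maps because the condition changes, which is the point of asking the Therapist before painting.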

Why is this a Big Deal?

1. It's Controllable (The "Remote Control" Feature)
With TherA, you can change the story of the image just by typing a note or showing a reference picture.

  • Text Control: You can type "Make it a rainy night," and TherA will redraw the scene so the road looks wet and the car looks cooler, even if the original photo was taken on a sunny day.
  • Reference Control: You can show the AI a picture of a "parked car" and tell it, "Make this car look like that one." The AI will then cool down the car in the new image to match the reference.
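
One way to picture the two kinds of control is as overrides on whatever the photo itself suggests. The Python sketch below is a toy under stated assumptions, not TherA's actual interface: condition_from_prompt and resolve_condition are hypothetical names, the keyword matching stands in for a real language model, and the rule that a typed prompt wins over a reference picture is an arbitrary choice made for this example.

```python
def condition_from_prompt(prompt):
    """Toy keyword parser standing in for a real language model."""
    text = prompt.lower()
    cond = {}
    if "rain" in text:
        cond["wet"] = True
    if "night" in text:
        cond["night"] = True
    if "parked" in text:
        cond["vehicle_warm"] = False
    return cond


def resolve_condition(from_photo, prompt=None, reference=None):
    """Start from what the photo suggests, then apply user overrides.

    Assumed precedence for this sketch: reference first, prompt last.
    """
    cond = dict(from_photo)  # copy so the caller's dict is not modified
    if reference is not None:
        cond.update(reference)
    if prompt is not None:
        cond.update(condition_from_prompt(prompt))
    return cond


sunny_photo = {"wet": False, "night": False, "vehicle_warm": True}

# Text control: the prompt overrides the weather and time of day.
by_text = resolve_condition(sunny_photo, prompt="Make it a rainy night")

# Reference control: a condition taken from a "parked car" example.
parked_reference = {"vehicle_warm": False}
by_reference = resolve_condition(sunny_photo, reference=parked_reference)
```

In both cases the original photo stays the same; only the condition handed to the painter changes.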

2. It's Smarter About Physics
Because the Therapist understands the rules of heat (like how metal gets hot in the sun or how engines generate heat), the results look physically plausible. It doesn't just guess; it reasons. If you show it a car with no exhaust smoke and the engine off, it knows to draw it as cold.

3. It Solves the "Data Scarcity" Problem
Real thermal cameras are expensive, and taking thousands of thermal photos is hard. In principle, TherA can take large collections of cheap, easy-to-get photos (like those from your phone) and turn them into realistic thermal photos. This gives scientists a large library of synthetic thermal data to train better self-driving cars and robots.
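
As a rough illustration of how one ordinary photo could become several training samples, the Python sketch below pairs every photo with every condition prompt and lists the samples that would be generated. The function build_manifest, the file names, and the output naming scheme are all hypothetical; no thermal image is actually produced here.

```python
from itertools import product


def build_manifest(rgb_paths, prompts):
    """Plan one synthetic thermal sample per (photo, prompt) pair."""
    manifest = []
    for path, (index, prompt) in product(rgb_paths, enumerate(prompts)):
        stem = path.rsplit(".", 1)[0]  # drop the file extension, if any
        manifest.append({
            "rgb": path,
            "prompt": prompt,
            "out": f"{stem}__{index}.tir.png",  # made-up naming scheme
        })
    return manifest


photos = ["street_001.jpg", "street_002.jpg"]
prompts = ["sunny afternoon", "rainy night", "parked for hours"]
manifest = build_manifest(photos, prompts)
```

Two photos and three prompts give six planned samples, which is how controllable translation can multiply a small photo collection into a larger synthetic set.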

The Bottom Line

Think of TherA as the difference between a photocopier and a chef.

  • A photocopier (old AI) just copies the colors and shapes, often getting the "temperature" wrong.
  • TherA is a chef who tastes the ingredients (the photo), understands the recipe (the physics of heat), and cooks up a brand new dish (the thermal image) that tastes exactly right, whether it's a hot summer day or a cold winter night.

This means realistic thermal images can be generated on demand, helping robots and cars "see" the heat of the world better than before.
