Image Compression Using Novel View Synthesis Priors

This paper proposes a model-based image compression technique for tetherless underwater remotely operated vehicles that leverages novel view synthesis priors and gradient descent optimization to achieve superior compression ratios and image quality, particularly in scenarios involving new objects within the scene.

Luyuan Peng, Mandar Chitre, Hari Vishnu, Yuen Min Too, Bharath Kalyan, Rajat Mishra, Soo Pieng Tan

Published Wed, 11 Ma

Imagine you are trying to talk to a friend who is deep underwater in a submarine, but the only way to communicate is through a very slow, crackly walkie-talkie. You want to send them a photo of a shipwreck you just found, but the "walkie-talkie" (acoustic communication) is so slow that sending a full photo would take forever. By the time the photo arrives, you've already moved on to the next spot.

This is the exact problem the researchers in this paper are trying to solve for underwater robots (ROVs).

Here is a simple breakdown of their solution, NVSPrior, using some everyday analogies.

The Problem: The "Slow Walkie-Talkie"

Underwater, you can't use Wi-Fi or radio waves (they don't travel well through water). You have to use sound waves (acoustics), which are like a very narrow pipe. You can send simple text commands easily, but trying to send a high-quality video or photo is like trying to pour a swimming pool of water through that narrow pipe. It just takes too long.

The Old Way: Sending the Whole Picture

Traditionally, if the robot wanted to send a photo, it would take the picture, squish it as small as possible (like zipping a file), and send it. But even the best "zippers" (like JPEG or WebP) still leave too much data for these slow underwater pipes. The robot would send maybe 1 or 2 pictures a second, which is too slow for a human to control the robot effectively.

The New Idea: The "Mental Map" Trick

The researchers came up with a clever trick. They realized that underwater inspection sites (like a shipwreck or an oil rig) don't change much from day to day. The rocks, the rust, and the structure are always there.

Instead of sending the whole picture every time, they decided to send a mental map and only the changes.

Here is how it works, step-by-step:

1. The "Training Camp" (Creating the Prior)

Before the robot goes on its real mission, it does a "training run." It swims around the site and takes hundreds of photos.

  • The Analogy: Imagine you are an artist who wants to draw a specific house. You spend a week studying the house from every angle. You memorize exactly where the windows, the door, and the chimney are. You create a perfect 3D mental model of the house in your head.
  • In the paper: This mental model is called a NVS (Novel View Synthesis) model. It's a digital 3D map of the underwater site stored on both the robot and the human operator's computer.
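The crucial detail in this step is that the robot and the operator must end up with *identical* copies of the scene model, so that both sides can later render the exact same prediction. In a real system that means training an NVS model and copying its weights to both ends; the toy below stands in for that with a deterministic "model" built from a shared seed (all names here are illustrative, not from the paper).

```python
import random

def build_prior(seed: int, n: int = 8):
    """Toy stand-in for a trained NVS model: a deterministic table of
    'scene' values that any party can rebuild from the same seed."""
    rng = random.Random(seed)          # deterministic given the seed
    return [rng.randrange(256) for _ in range(n)]

robot_prior = build_prior(42)          # stored on the ROV
operator_prior = build_prior(42)       # stored on the operator's computer

assert robot_prior == operator_prior   # identical "mental maps" on both ends
```

If the two copies ever drifted apart, the robot's "difference" messages would be decoded against the wrong background, so keeping the prior in sync is a precondition for everything that follows.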

2. The "Guessing Game" (During the Mission)

Now, the robot is on its real mission. It takes a new photo.

  • The Old Way: Send the whole photo.
  • The New Way: The robot looks at its 3D mental map and asks, "If I am standing here looking this way, what should the house look like?"
  • Using the NVS model, it renders (draws) a fake picture of what the scene should look like from its current position and viewing angle.
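A real NVS model maps a full 6-DoF camera pose to a rendered image. As a toy stand-in for "ask the mental map what I should see from here," the sketch below treats the shared map as one big 2D scene and a pose as an (x, y) window position, so rendering is just reading out the pixels that camera would see. Everything here (the scene pattern, window size, function names) is illustrative, not from the paper.

```python
# Toy shared scene: a 64x64 grid of "pixel" values known to both sides.
SCENE_W = SCENE_H = 64
scene = [[(x * 3 + y * 5) % 256 for x in range(SCENE_W)]
         for y in range(SCENE_H)]

def render(pose, w=16, h=16):
    """Predict the w x h view a camera at integer pose (x, y) would see."""
    px, py = pose
    return [row[px:px + w] for row in scene[py:py + h]]

view_a = render((10, 20))
view_b = render((11, 20))            # one step to the right: a shifted view
print(view_a[0][1] == view_b[0][0])  # overlapping pixels agree
```

The point of the sketch: given only a pose (a handful of numbers), both the robot and the operator can regenerate the same full-size predicted image locally, with nothing image-sized sent over the link.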

3. The "Difference" (The Secret Sauce)

The robot compares the Real Photo it just took with the Fake Photo it just drew from its mental map.

  • The Analogy: Imagine you are playing a game of "Spot the Difference."
    • If the scene hasn't changed, the Real Photo and the Fake Photo are identical. The "difference" is zero. You send nothing!
    • If there is a fish swimming by, or a new piece of trash, or a slight change in lighting, the two photos won't match perfectly.
    • The robot only calculates the tiny differences (the fish, the trash, the lighting shift).
  • In the paper: This is called the residual or difference image. Because most of the scene is already known (from the mental map), this "difference" file is tiny. It's like sending a note that says, "The house is the same, but there's a blue fish in the corner."
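The residual trick above can be shown end to end with a toy encoder. The sketch below uses two 16x16 byte "photos" that differ only where a small new object appears, and stdlib `zlib` as a stand-in for whatever entropy coder the paper actually uses; the image pattern and sizes are made up for illustration.

```python
import zlib

# Toy 16x16 grayscale "photos" as flat byte strings. The predicted view
# comes from the shared scene model; the real photo differs only where a
# new object (the "fish") appears.
W = H = 16
predicted = bytes((x * 7 + y * 13) % 256 for y in range(H) for x in range(W))

real = bytearray(predicted)
for y in range(4, 8):                    # a small new object in one corner
    for x in range(2, 6):
        real[y * W + x] = 255
real = bytes(real)

# Residual: per-pixel difference, mod 256 so each stays one byte.
residual = bytes((r - p) % 256 for r, p in zip(real, predicted))

sent_full = zlib.compress(real, 9)       # old way: compress the whole photo
sent_diff = zlib.compress(residual, 9)   # new way: compress the difference
print(len(sent_full), len(sent_diff))    # the mostly-zero residual is tiny

# Receiver side: rebuild the photo exactly from its own prior + residual.
restored = bytes((p + d) % 256
                 for p, d in zip(predicted, zlib.decompress(sent_diff)))
assert restored == real
```

Because the residual is mostly zeros, it compresses far smaller than the full photo, and the receiver still reconstructs the real image exactly.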

4. The "Refinement" (iNVS)

Sometimes, the robot isn't 100% sure of its exact location. If it guesses its location wrong by even a tiny bit, the "Fake Photo" will be slightly shifted, and the "Difference" will look like a messy blur (which is hard to compress).

  • The Solution: The paper introduces a smart algorithm called iNVS, which uses gradient descent to fine-tune the pose estimate. It's like a super-fast auto-correct.
  • The Analogy: Imagine you are trying to align two transparent sheets of paper. If they are slightly off, the image looks blurry. The iNVS algorithm nudges the "Fake Photo" sheet back and forth until it lines up perfectly with the Real Photo.
  • Once they are perfectly aligned, the "Difference" is just the actual new objects (the fish), making the file size incredibly small.
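The paper refines the pose with gradient descent so the rendered view lines up with the real photo before differencing. The toy below captures the same idea with a discrete hill-climbing search over (x, y) shifts instead of true gradients: from a slightly wrong pose estimate, it keeps nudging toward whichever neighbouring pose lowers the mismatch. The scene, the error metric, and the search are all illustrative stand-ins, not the paper's actual method.

```python
# Toy shared scene, as in the rendering sketch.
SCENE_W = SCENE_H = 64
scene = [[(x * 3 + y * 5) % 256 for x in range(SCENE_W)]
         for y in range(SCENE_H)]

def render(pose, w=16, h=16):
    px, py = pose
    return [row[px:px + w] for row in scene[py:py + h]]

def mse(a, b):
    """Mean squared pixel error between two equal-size views."""
    n = sum(len(row) for row in a)
    return sum((pa - pb) ** 2
               for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb)) / n

real_photo = render((23, 31))   # view from the camera's true position
pose = (20, 28)                 # robot's slightly wrong pose estimate

# Nudge the pose toward whichever neighbour lowers the mismatch, and stop
# when no neighbour improves on the current pose.
while True:
    best = min(((pose[0] + dx, pose[1] + dy)
                for dx in (-1, 0, 1) for dy in (-1, 0, 1)),
               key=lambda p: mse(render(p), real_photo))
    if best == pose:
        break
    pose = best

print(pose)  # recovers the true pose, so the residual would be all zeros
```

Once the poses agree, the rendered and real views match pixel for pixel in this toy, which is exactly the condition that makes the residual cheap to compress.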

Why is this a Big Deal?

The researchers tested this in a giant water tank and on real underwater datasets (like a coral reef and a sunken torpedo boat).

  • The Result: Their method produced files 2 to 4 times smaller than the best standard codecs (like WebP or JPEG).
  • The Benefit: Instead of getting 2 frames per second, the operator could get 10 frames per second. This makes the robot feel much more responsive, allowing for real-time control and inspection.
  • Robustness: Even when new things appeared in the scene (like a new metal structure or a fish), the system handled it well because it only had to send the new stuff, not the whole background.
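The frame-rate benefit follows from simple arithmetic: over a fixed-rate acoustic link, frames per second scales inversely with compressed frame size. The link rate and frame sizes below are made-up illustrative numbers, not figures from the paper.

```python
LINK_BPS = 60_000                  # hypothetical acoustic link, bits per second

def fps(frame_bytes: int) -> float:
    """Frames per second the link can sustain at a given frame size."""
    return LINK_BPS / (frame_bytes * 8)

jpeg_frame = 3_750                 # hypothetical JPEG-compressed frame, bytes
nvs_frame = jpeg_frame // 3        # residual frame ~3x smaller

print(fps(jpeg_frame), fps(nvs_frame))
```

Shrinking each frame by a factor of three triples the achievable frame rate over the same pipe, which is why smaller residuals translate directly into a more responsive robot.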

Summary

Think of it like sending a text message instead of a photo.

  • Old Way: "Here is a photo of the ocean floor." (Huge file, slow to send).
  • New Way: "The ocean floor looks exactly like the map we made yesterday, except there is a crab in the bottom left corner." (Tiny file, instant to send).

By using a shared "memory" of the underwater world, this technique allows robots to send high-quality video back to humans even through the slowest, narrowest underwater connections.