VQ-Jarvis: Retrieval-Augmented Video Restoration Agent with Sharp Vision and Fast Thought

The paper introduces VQ-Jarvis, a retrieval-augmented intelligent agent for video restoration that leverages a new large-scale comparison dataset (VSR-Compare) to enhance degradation perception and employs a hierarchical scheduling strategy to efficiently determine optimal restoration trajectories for real-world heterogeneous degradations.

Xuanyu Zhang, Weiqi Li, Qunliang Xing, Jingfen Xie, Bin Chen, Junlin Li, Li Zhang, Jian Zhang, Shijie Zhao

Published 2026-03-25

Imagine you have an old, damaged home video. Maybe it's grainy, dark, blurry, and has rain streaks on the lens. You want to fix it, but the problem is that no single tool works for everything. A tool that fixes the darkness might make the rain look worse, and a tool that removes the blur might make the colors look weird.

Traditionally, video restoration software was like a one-size-fits-all suit. It tried to fix everything at once with a single, rigid formula. If the video was complex, the suit didn't fit, and the result looked bad.

This paper introduces VQ-Jarvis, a new kind of "Video Repair Agent." Think of VQ-Jarvis not as a single tool, but as a highly skilled, super-fast video repair team leader with two superpowers: "Sharp Vision" and "Fast Thought."

Here is how it works, broken down into simple concepts:

1. The Problem: The "Blind" Repair Shop

Before VQ-Jarvis, existing AI repair tools were often "blind" to subtle differences.

  • The Old Way: Imagine a repair shop that uses a generic ruler to measure quality. It might say, "This video looks 80% good," but it can't tell which of two similar-looking videos is slightly better. It often picks the wrong fix because it doesn't truly "see" the subtle flaws.
  • The Search Problem: Trying to find the perfect fix was like searching for a needle in a haystack by checking every single straw one by one. It took forever and wasted a lot of energy.

2. The Solution: VQ-Jarvis

VQ-Jarvis changes the game by acting like a smart detective who knows exactly what to do.

Superpower A: "Sharp Vision" (The Expert Eye)

To give the agent "sharp vision," the researchers built a massive new training library called VSR-Compare.

  • The Analogy: Imagine they hired 20,000 pairs of human eyes and super-computers to watch thousands of "Before vs. After" video clips. They asked: "Which one looks better? Is it the brighter one, or the sharper one?"
  • The Result: VQ-Jarvis learned from this massive library. Now, it can spot tiny differences that other AIs miss. It can tell the difference between a video that is "just okay" and one that is "perfectly crisp," just like a professional film editor can.
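
To make the idea of learning from "which one looks better?" comparisons concrete, here is a minimal sketch of how a pairwise judge could be used to pick the best of several restored candidates. Everything in it is an assumption made for illustration: the ComparisonPair record, the pick_best helper, the file names, and the made-up sharpness scores that stand in for the learned model. It is not the paper's actual data format or model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ComparisonPair:
    """Hypothetical training record: two restored clips and which one looked better."""
    clip_a: str
    clip_b: str
    preferred: str  # "a" or "b"


def pick_best(candidates: Sequence[str],
              prefers_first: Callable[[str, str], bool]) -> str:
    """Knockout selection: keep whichever clip the pairwise judge prefers."""
    best = candidates[0]
    for challenger in candidates[1:]:
        if not prefers_first(best, challenger):
            best = challenger
    return best


# Made-up scores standing in for a learned judge (assumption, not real data).
TOY_SHARPNESS = {"denoised.mp4": 0.61, "deblurred.mp4": 0.64, "both.mp4": 0.70}


def toy_judge(a: str, b: str) -> bool:
    """Return True when clip `a` looks at least as good as clip `b`."""
    return TOY_SHARPNESS[a] >= TOY_SHARPNESS[b]


# One example record in the assumed format.
example = ComparisonPair("denoised.mp4", "both.mp4", preferred="b")

best = pick_best(list(TOY_SHARPNESS), toy_judge)
print(best)
```

The point of the sketch is that the judge only ever answers a relative question about two clips, which is the kind of fine-grained call a single absolute score tends to get wrong.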

Superpower B: "Fast Thought" (The Smart Strategist)

Once VQ-Jarvis sees the problem, it needs to decide how to fix it. This is where its "Fast Thought" comes in. It uses a two-track strategy depending on how broken the video is:

  • Track 1: The "Cheat Sheet" (For Easy Jobs)

    • Scenario: The video is just a little dark or slightly blurry.
    • Action: VQ-Jarvis looks at its "Cheat Sheet" (a database of past successful repairs). It says, "Hey, I've seen this exact type of darkness-and-blur before. I know the perfect fix! Let's just apply that."
    • Benefit: It's instant. No thinking required.
  • Track 2: The "Chess Master" (For Hard Jobs)

    • Scenario: The video is a mess: dark, rainy, blurry, and low resolution all at once.
    • Action: The "Cheat Sheet" isn't enough. VQ-Jarvis switches to a step-by-step strategy. It tries a fix, checks if it got better, then tries the next tool. It's like a chess player thinking three moves ahead: "If I fix the rain first, then the blur, the colors will pop. But if I fix the blur first, the rain will look worse."
    • Benefit: It finds a strong combination of tools without wasting time on dead ends.
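
The two tracks above can be sketched as a small scheduler. This is a toy illustration under stated assumptions, not the paper's implementation: a "video" is just a dictionary of degradation severities, each tool zeroes out one degradation, the cheat sheet is a plain lookup table, and the hard track uses a simple one-step greedy search rather than any real lookahead. All names (TOOLS, CHEAT_SHEET, schedule, and so on) are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List

Video = Dict[str, float]  # toy video: degradation name -> severity in [0, 1]
Tool = Callable[[Video], Video]


def make_tool(target: str) -> Tool:
    """Build a toy tool that fully removes one degradation."""
    def tool(video: Video) -> Video:
        fixed = dict(video)
        fixed[target] = 0.0
        return fixed
    return tool


TOOLS: Dict[str, Tool] = {
    "derain": make_tool("rain"),
    "deblur": make_tool("blur"),
    "brighten": make_tool("dark"),
    "upscale": make_tool("lowres"),
}

# Assumed "cheat sheet": detected degradations -> a plan that worked before.
CHEAT_SHEET: Dict[FrozenSet[str], List[str]] = {
    frozenset({"dark"}): ["brighten"],
    frozenset({"dark", "blur"}): ["brighten", "deblur"],
}


def quality(video: Video) -> float:
    """Toy quality score: higher is better, 0 means no degradation left."""
    return -sum(video.values())


def detect(video: Video, threshold: float = 0.1) -> FrozenSet[str]:
    """Names of degradations whose severity exceeds the threshold."""
    return frozenset(name for name, level in video.items() if level > threshold)


def schedule(video: Video, max_easy: int = 2) -> List[str]:
    found = detect(video)
    # Track 1: an easy, previously seen case is answered by lookup alone.
    if len(found) <= max_easy and found in CHEAT_SHEET:
        return list(CHEAT_SHEET[found])
    # Track 2: try each tool, keep the one that helps most, and repeat.
    plan: List[str] = []
    current = video
    while True:
        best_name, best_video = None, current
        for name, tool in TOOLS.items():
            candidate = tool(current)
            if quality(candidate) > quality(best_video):
                best_name, best_video = name, candidate
        if best_name is None:  # no tool improves the video any further
            return plan
        plan.append(best_name)
        current = best_video


easy = {"rain": 0.0, "blur": 0.3, "dark": 0.4, "lowres": 0.0}
hard = {"rain": 0.8, "blur": 0.7, "dark": 0.6, "lowres": 0.5}
easy_plan = schedule(easy)
hard_plan = schedule(hard)
print(easy_plan, hard_plan)
```

In this toy, the easy clip never triggers a search at all, while the hard clip costs one round of tool trials per step. That difference in cost is the reason for having two tracks.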

3. The "Toolbox"

VQ-Jarvis doesn't invent new tools; it's the manager of a toolbox filled with the best video repair tools available (like specialized AI for removing rain, fixing low light, or sharpening images).

  • The Magic: It knows which tool to use and in what order.
    • Example: It knows you must remove the rain before you brighten the image, otherwise, the rain streaks will look like weird shadows. It figures this out automatically.
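
Why order matters can be shown with a tiny model. In this sketch, brightening a clip that still has rain streaks is assumed to leave behind shadow-like artifacts, so the two possible orders end up with different results. The tools, the 0.5 artifact factor, and the badness score are all invented for illustration and do not come from the paper.

```python
from itertools import permutations
from typing import Dict, Tuple

Video = Dict[str, float]  # toy video: problem name -> severity


def derain(video: Video) -> Video:
    """Toy tool: removes rain streaks completely."""
    return {**video, "rain": 0.0}


def brighten(video: Video) -> Video:
    """Toy tool: fixes darkness, but any remaining rain turns into artifacts."""
    # Assumed interaction: half of the leftover rain severity becomes artifacts.
    return {**video, "dark": 0.0,
            "artifacts": video["artifacts"] + 0.5 * video["rain"]}


ORDER_TOOLS = {"derain": derain, "brighten": brighten}


def run(video: Video, order: Tuple[str, ...]) -> Video:
    """Apply the named tools one after another."""
    for name in order:
        video = ORDER_TOOLS[name](video)
    return video


def badness(video: Video) -> float:
    """Toy score: total remaining severity, lower is better."""
    return sum(video.values())


def best_order(video: Video) -> Tuple[str, ...]:
    """Try every ordering of the tools and keep the one with the least damage left."""
    return min(permutations(ORDER_TOOLS), key=lambda order: badness(run(video, order)))


clip = {"rain": 0.6, "dark": 0.8, "artifacts": 0.0}
print(best_order(clip))
```

Trying every ordering is only feasible here because there are two tools. With a large toolbox the number of orderings grows factorially, which is why a smarter scheduling strategy is needed.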

Why This Matters

In the real world, videos are rarely just "blurry." They are a messy mix of problems.

  • Old AI: Tries to fix everything with one big hammer. Often breaks things.
  • VQ-Jarvis: Looks at the mess, identifies the specific problems, grabs the right tools from its toolbox, and applies them in a sensible order, without checking every possible combination.

In short: VQ-Jarvis is the difference between a robot that blindly follows a recipe and a master chef who tastes the food, adjusts the spices, and knows exactly when to add the salt to make the dish perfect. It makes video restoration faster, smarter, and much higher quality.
