Decoupling Motion and Geometry in 4D Gaussian Splatting

This paper introduces VeGaS, a 4D Gaussian Splatting framework that decouples motion from geometry. It uses a Galilean shearing matrix to model time-varying velocity and a Geometric Deformation Network to refine shape, and the authors report state-of-the-art, high-fidelity dynamic scene reconstruction.

Yi Zhang, Yulei Kang, Jian-Fang Hu

Published 2026-03-03

Imagine you are trying to film a chaotic scene: a dancer spinning, a flame flickering, and a steak sizzling on a grill. Your goal is to create a 3D movie that you can watch from any angle, at any moment in time.

For a long time, computer scientists have used a technique called Gaussian Splatting. Think of this like building a scene out of millions of tiny, fuzzy, 3D "clouds" (Gaussians). Each cloud has a position, a color, and a shape. By layering these clouds, you can create a photorealistic image.

However, when you try to make these clouds move (like a 4D movie), the old method (called 4DGS) had a major flaw. It treated the cloud's shape and its movement as if they were glued together in a single package.

The Problem: The "Glued" Package

Imagine you are trying to describe a runner.

  • The Old Way (4DGS): You say, "The runner is a cloud that is always shaped like a sphere, and it moves in a straight line at a constant speed."
    • The Issue: If the runner starts to twist, turn, or accelerate, the system gets confused. Because the shape and movement are glued together, trying to make the runner twist distorts their shape. The runner might suddenly look like a stretched-out blob or a jagged mess. This creates visual "glitches" or artifacts in the video.
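To see why the package is "glued", here is a minimal sketch. It assumes a 4DGS-style primitive is one 4D Gaussian over (x, y, z, t) and that the 3D cloud shown at a given moment comes from slicing it at that time, which is standard conditioning of a multivariate Gaussian. The function name `slice_4d_gaussian` and all the numbers are made up for illustration; none of this is taken from the paper's code.

```python
import numpy as np

def slice_4d_gaussian(mean4, cov4, t):
    """Condition a 4D Gaussian over (x, y, z, t) on a fixed time t."""
    mu_x, mu_t = mean4[:3], mean4[3]
    A = cov4[:3, :3]   # spatial block
    b = cov4[:3, 3]    # space-time block
    c = cov4[3, 3]     # temporal variance
    velocity = b / c                      # fixed by the covariance itself
    mean3 = mu_x + velocity * (t - mu_t)  # the centre moves on a straight line
    cov3 = A - np.outer(b, b) / c         # shape of the cloud at time t
    return mean3, cov3, velocity

# Illustrative values only (hypothetical, chosen to be positive definite).
mean4 = np.array([0.0, 0.0, 0.0, 0.5])
cov4 = np.array([
    [1.0, 0.0, 0.0, 0.3],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.3, 0.0, 0.0, 0.5],
])

m0, s0, v = slice_4d_gaussian(mean4, cov4, 0.0)
m1, s1, _ = slice_4d_gaussian(mean4, cov4, 1.0)
print("velocity:", v)
print("centre moved by:", m1 - m0)
```

In this sketch the velocity is `b / c`, and the same `b` also appears in the shape term `A - outer(b, b) / c`. Changing how the cloud moves therefore also changes how it looks, and the motion can only ever be a straight line at constant speed.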

The Solution: VeGaS (Velocity-based Decoupling)

The authors of this paper propose a new framework called VeGaS. Their big idea is to decouple (unstick) the movement from the shape.

Think of it like a dance troupe:

  1. The Dancers (Geometry): These are the clouds. Their job is to keep their specific shape (a sphere, a cube, a weird blob) and just stand there or wiggle slightly.
  2. The Choreography (Motion): This is a separate script that tells the dancers where to go.

In VeGaS, they introduce two main tools to make this work:

1. The "Galilean Shearing" Matrix (The Flexible Choreography)

In physics, a "Galilean transformation" relates what two observers see when one of them moves at a constant velocity relative to the other. Mathematically, it mixes time into position, which is a shear of space-time, and that shear is the tool the authors use.

  • The Analogy: Imagine a deck of cards. If you push the top of the deck to the right, the cards slide over each other, but the shape of each individual card doesn't change. They just shift position.
  • How it helps: VeGaS uses this to tell the clouds, "Move along this crazy, curvy path (non-linear motion) at varying speeds." Crucially, while the clouds slide along this path, the shear itself leaves their shape at each instant unchanged. This allows the system to handle complex movements like a spinning dancer or a flickering flame without the motion distorting the clouds.
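The card-deck picture can be checked numerically. The sketch below assumes a cloud is a 4D Gaussian over (x, y, z, t) and applies a space-time shear with a single fixed velocity; how VeGaS actually parameterises the matrix or makes the velocity vary over time is not covered here. The helper names (`galilean_shear`, `slice_cov`, `slice_velocity`) and the numbers are hypothetical.

```python
import numpy as np

def galilean_shear(velocity):
    """4x4 shear that maps (x, t) to (x + velocity * t, t)."""
    S = np.eye(4)
    S[:3, 3] = velocity
    return S

def slice_cov(cov4):
    """Spatial covariance of the 3D cloud at any fixed time."""
    A, b, c = cov4[:3, :3], cov4[:3, 3], cov4[3, 3]
    return A - np.outer(b, b) / c

def slice_velocity(cov4):
    """Velocity of the cloud's centre implied by the 4D covariance."""
    return cov4[:3, 3] / cov4[3, 3]

# Illustrative 4D covariance (hypothetical values).
cov4 = np.array([
    [1.0, 0.0, 0.0, 0.3],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.3, 0.0, 0.0, 0.5],
])

extra_v = np.array([0.0, 2.0, -1.0])
S = galilean_shear(extra_v)
sheared = S @ cov4 @ S.T   # covariance of the sheared Gaussian

print("shape unchanged:", np.allclose(slice_cov(sheared), slice_cov(cov4)))
print("velocity before:", slice_velocity(cov4))
print("velocity after: ", slice_velocity(sheared))
```

Under these assumptions the shear adds `extra_v` to the cloud's velocity while the sliced shape stays exactly the same, which is the "cards slide, but each card keeps its shape" behaviour described above.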

2. The "Geometric Deformation Network" (The Wiggle Room)

Sometimes, the object itself actually changes shape (like a muscle flexing or a flame changing form). The old system couldn't do this well because shape and movement were tangled together in the same package, so any change to one disturbed the other.

  • The Analogy: Now that the choreography is handled separately, the dancers have a special "wiggle network." This is a small AI brain that looks at the scene and says, "Okay, the flame is changing shape right now, so let's stretch this cloud slightly."
  • How it helps: This network refines the shape of the clouds independently of where they are moving. It ensures that if a steak is sizzling and changing shape, the system captures that detail without messing up the movement.
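As a rough picture of what a "wiggle network" could look like, here is a minimal sketch: a tiny, untrained network that takes a per-cloud feature vector and a time value and outputs small offsets to each cloud's size, leaving its position alone. The architecture, the inputs, and every name here (`TinyDeformationMLP`, `deform`, the feature size of 8) are assumptions for illustration, not the paper's Geometric Deformation Network.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyDeformationMLP:
    """Hypothetical stand-in: (per-cloud feature, time) -> log-scale offsets."""

    def __init__(self, feat_dim, hidden=16):
        # Random, untrained weights; a real system would learn these.
        self.W1 = rng.normal(0.0, 0.1, (feat_dim + 1, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, 3))
        self.b2 = np.zeros(3)

    def __call__(self, feats, t):
        # Append the time value to every cloud's feature vector.
        t_col = np.full((feats.shape[0], 1), t)
        inp = np.concatenate([feats, t_col], axis=1)
        h = np.tanh(inp @ self.W1 + self.b1)
        return h @ self.W2 + self.b2   # shape (N, 3)

def deform(means, log_scales, feats, t, net):
    """Refine shape only: positions pass through untouched."""
    new_log_scales = log_scales + net(feats, t)
    return means, np.exp(new_log_scales)   # exp keeps scales positive

num_clouds = 4
means = rng.normal(size=(num_clouds, 3))
log_scales = np.zeros((num_clouds, 3))
feats = rng.normal(size=(num_clouds, 8))
net = TinyDeformationMLP(feat_dim=8)

out_means, scales = deform(means, log_scales, feats, 0.25, net)
print(scales)
```

The design point the sketch tries to capture is the separation of duties: the network only touches shape parameters, so where the clouds go is left entirely to the motion model.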

The Result: A Cleaner, Sharper Movie

By separating the "where" (motion) from the "what" (shape), VeGaS achieves two things:

  1. No More Glitches: The clouds don't get stretched into weird shapes just because they are moving fast or turning corners.
  2. Better Details: The system can capture fine details, like the individual flickers of a flame or the texture of a steak, much better than the previous methods.

Summary

If the old method was like trying to drive a car where the steering wheel and the engine were bolted together (making it hard to turn without stalling), VeGaS is like installing a modern transmission. It lets the engine (movement) and the steering (shape) work independently, resulting in a smooth, high-definition ride through time and space.

The paper supports this with experiments in which VeGaS produces clearer, more realistic 4D videos of dancing, flames, and cooking steaks than the earlier methods it is compared against.