MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting

MoE-GS introduces a Mixture-of-Experts framework for dynamic Gaussian Splatting. A Volume-aware Pixel Router adaptively blends heterogeneous deformation priors to improve novel view synthesis, while multi-expert rendering optimizations and knowledge distillation address the added efficiency cost.

In-Hwan Jin, Hyeongju Mun, Joonsoo Kim, Kugjin Yun, Kyeongbo Kong

Published 2026-03-10

The Big Problem: One Tool Doesn't Fit All

Imagine you are trying to film a chaotic scene: a chef chopping vegetables, a steak sizzling on a grill, and a flame flickering wildly.

In the world of computer graphics, we use a technique called 3D Gaussian Splatting to recreate these scenes from video. Think of it like building a 3D model out of millions of tiny, fuzzy, colored balloons (Gaussians) that float in space. When you move your camera, the computer projects these balloons onto the screen from the new viewpoint to show you a new angle.

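However, when things move (like the chef or the fire), the balloons also have to move and change shape over time, and each existing method makes its own assumptions about how that motion behaves. The paper points out a frustrating reality: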

  • Expert A is great at the smooth, slow movement of the chef's arm but terrible at the chaotic, fast flickering of the fire.
  • Expert B is amazing at the fire but makes the chef's arm look like a blurry smear.
  • Expert C is good at the steak but fails at the vegetables.

No single "Expert" (algorithm) can handle every part of the scene perfectly. It's like trying to use a single pair of scissors to cut paper, thread, and metal wire. You might get the job done, but the results will be messy.

The Solution: The "All-Star Team" (Mixture of Experts)

The authors, In-Hwan Jin and colleagues, decided to stop relying on just one expert. Instead, they built a Mixture of Experts (MoE) system.

Imagine a high-end restaurant kitchen. Instead of one chef trying to do everything, you have a team:

  1. The Slicer: Specializes in smooth, precise cuts (good for the chef's arm).
  2. The Fire Starter: Specializes in wild, unpredictable flames (good for the grill).
  3. The Searer: Specializes in browning meat perfectly (good for the steak).

In MoE-GS, the Router plays the role of Head Chef. Its job isn't to cook the food; it's to decide who cooks what part of the dish.

How It Works: The Smart "Traffic Cop"

The magic of this paper lies in how the Head Chef makes decisions.

  1. The Volume-Aware Pixel Router:
    In older systems, the Head Chef might just look at the final picture and guess who should work. But that's like trying to fix a car engine by looking at the paint job.

    MoE-GS uses a Volume-Aware Pixel Router. Imagine this router as a super-smart traffic cop standing inside the 3D world, not just looking at the 2D photo. It sees the "fuzzy balloons" (Gaussians) floating in 3D space.

    • It sees a balloon near the fire and thinks, "Ah, this needs the Fire Starter!"
    • It sees a balloon near the chef's hand and thinks, "This needs the Slicer!"

    It then "splats" (projects) these decisions onto the final image. This ensures that the fire looks fiery and the hand looks smooth, all blended together seamlessly.

  2. The "One-Pass" Trick (Efficiency):
    Usually, if you have four experts, the computer has to render the scene four times (once for each expert) and then mix them. That's slow and heavy, like asking four painters to paint the same wall and then trying to blend their work.

    The authors introduce a Single-Pass Multi-Expert Rendering trick. They put all the "fuzzy balloons" from all four experts into one giant bucket and paint the wall once. The computer keeps track of which balloon belongs to which expert during that single pass, so no extra renders are needed. This makes the process much faster.

  3. The "Pruning" Trick (Cleaning Up):
    Sometimes, an expert tries to paint a part of the scene where they are useless (like the Fire Starter trying to paint the chef's hand). This creates clutter.

    The system uses Gate-Aware Pruning. It's like a bouncer at a club. If a balloon (Gaussian) contributes almost nothing to the final image once the router's weights are taken into account, the bouncer kicks it out. This keeps the scene clean and the computer running fast.
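
The three steps above can be sketched in a few lines of Python with NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: the names `splat_weights`, `gaussian_logits`, and `owner`, the contribution score, and the threshold are all hypothetical, and a real renderer would compute the blending weights inside the rasterizer instead of storing a dense pixel-by-Gaussian matrix. The sketch stacks every expert's Gaussians into one array (the "one bucket"), splats per-Gaussian routing scores to pixels, blends the expert images, and then prunes Gaussians whose gated contribution is negligible.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def route_and_blend(splat_weights, gaussian_logits, expert_images):
    """Blend K expert renderings with per-pixel gates.

    splat_weights   : (P, N) assumed blending weight of Gaussian n at pixel p
    gaussian_logits : (N, K) assumed per-Gaussian routing scores, one per expert
    expert_images   : (K, P, 3) colour each expert renders at each pixel
    """
    # "Splat" the 3D routing decisions onto the 2D image.
    pixel_logits = splat_weights @ gaussian_logits            # (P, K)
    gates = softmax(pixel_logits, axis=1)                     # each row sums to 1
    blended = np.einsum("pk,kpc->pc", gates, expert_images)   # (P, 3)
    return gates, blended

def gate_aware_prune(splat_weights, gates, owner, threshold):
    """Return a boolean keep-mask over Gaussians (hypothetical scoring rule).

    owner : (N,) index of the expert each Gaussian belongs to
    """
    # gates[:, owner] has shape (P, N): the gate that each Gaussian's
    # expert receives at every pixel.
    contribution = (splat_weights * gates[:, owner]).sum(axis=0)  # (N,)
    return contribution > threshold

# Toy scene: 2 pixels, 2 experts, 4 Gaussians stacked into one array.
owner = np.array([0, 0, 1, 1])                 # Gaussians 0-1: expert 0, 2-3: expert 1
splat_weights = np.array([[0.5, 0.0, 0.5, 0.0],    # pixel 0 sees Gaussians 0 and 2
                          [0.0, 0.5, 0.0, 0.5]])   # pixel 1 sees Gaussians 1 and 3
gaussian_logits = np.array([[ 5.0, -5.0],      # Gaussians at pixel 0 vote for expert 0
                            [-5.0,  5.0],      # Gaussians at pixel 1 vote for expert 1
                            [ 5.0, -5.0],
                            [-5.0,  5.0]])
expert_images = np.array([[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],   # expert 0 paints red
                          [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])  # expert 1 paints blue

gates, blended = route_and_blend(splat_weights, gaussian_logits, expert_images)
keep = gate_aware_prune(splat_weights, gates, owner, threshold=0.01)
print(np.round(blended, 3))   # pixel 0 is almost pure red, pixel 1 almost pure blue
print(keep)                   # [ True False False  True]
```

In this toy scene, Gaussian 1 belongs to expert 0 but sits at a pixel where the router favors expert 1, so its gated contribution is close to zero and it is pruned; Gaussian 2 is pruned for the mirror-image reason.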

The "Teacher-Student" Strategy (Distillation)

There's one catch: running a team of four experts is still heavier than running just one. To fix this, the authors use a Knowledge Distillation strategy.

Think of the MoE-GS system as a Master Teacher. It reaches its high quality by combining all the experts.

  • The Master Teacher then takes a Student (a single, lightweight expert model).
  • The Teacher says, "Look at this part of the image. I used the Fire Starter here. You try to learn how to do that part yourself."
  • The Student learns by imitating the Teacher's results, which serve as extra training targets.

The goal is for the Student to come close to the quality of the whole team while only taking up the space of a single expert. This means you get high-quality video without needing a supercomputer to run it.
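
As a rough sketch of the teacher-student idea, the Student's training loss can mix two terms: one that compares its rendering with the real video frame, and one that compares it with the Teacher's blended rendering. The function below is an assumption made for illustration, not the loss from the paper: the name `distillation_loss`, the L1 distance, and the mixing weight `lam` are all hypothetical.

```python
import numpy as np

def distillation_loss(student_img, teacher_img, target_img, lam=0.5):
    """Weighted sum of a ground-truth term and a teacher-imitation term.

    All images are arrays of identical shape, e.g. (H, W, 3).
    lam is an assumed mixing weight in [0, 1]; lam=0 ignores the teacher.
    """
    loss_target = np.abs(student_img - target_img).mean()    # match the real frame
    loss_teacher = np.abs(student_img - teacher_img).mean()  # match the teacher's output
    return (1.0 - lam) * loss_target + lam * loss_teacher

# Toy 4x4 images: the teacher is close to the real frame.
target = np.ones((4, 4, 3))
teacher = np.full((4, 4, 3), 0.9)

untrained_student = np.zeros((4, 4, 3))
trained_student = teacher.copy()   # a student that reproduces the teacher exactly

loss_before = distillation_loss(untrained_student, teacher, target)
loss_after = distillation_loss(trained_student, teacher, target)
print(loss_before, loss_after)     # the loss drops as the student matches the teacher
```

A training loop would minimize this value with respect to the Student's own Gaussians and deformation parameters; only the Student is kept afterwards, so the rendering cost is that of one expert.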

Why This Matters

  • Better Quality: It produces more realistic videos, even when things are moving fast or chaotically.
  • Adaptability: It doesn't force one style of movement on the whole scene; it adapts to every tiny part of the video.
  • Future-Proof: By teaching the "Students" (distillation), this technology could eventually run on regular phones or laptops, not just massive servers.

In short: MoE-GS is like hiring a dream team of specialists, using a smart 3D traffic cop to assign the right work to the right person, and then teaching a single apprentice to do the whole job so well that you don't need the whole team anymore.