Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting

The paper proposes DropAnSH-GS, a sparse-view 3D Gaussian Splatting method that mitigates overfitting in two ways. First, it drops anchor Gaussians together with their spatial neighbors, disrupting the local redundancy that makes ordinary dropout ineffective. Second, it randomly truncates high-degree spherical harmonic coefficients during training, concentrating appearance information in the low degrees and enabling model compression afterward.

Shuangkang Fang, I-Chao Shen, Xuanyang Zhang, Zesheng Wang, Yufeng Wang, Wenrui Ding, Gang Yu, Takeo Igarashi

Published 2026-02-25

Imagine you are trying to paint a beautiful landscape, but you only have three blurry photos to guide you. This is the challenge of "Sparse-View 3D Gaussian Splatting." The computer tries to build a 3D world from very few pictures.

The problem? The computer gets too confident. It starts "memorizing" the few photos it has instead of learning the actual shape of the world. This is called overfitting. It's like a student who memorizes the answers to three practice tests but fails the real exam because they didn't understand the concepts.

To fix this, previous methods tried a technique called Dropout. Imagine the computer is a choir of thousands of singers (called "Gaussians"). To stop them from memorizing, the conductor randomly tells some singers to go silent. The idea is that the remaining singers must work harder to fill the gaps, learning the song better.

But here's the catch: In this 3D world, the singers stand right next to each other and sing the exact same note. If you silence one, the neighbor immediately sings louder to cover the gap. The silence is never really felt, so the singers don't learn anything new. They just keep memorizing.

The paper "DropAnSH-GS" introduces a smarter way to silence the choir. Here is how they did it, using simple analogies:

1. The "Anchor and Neighborhood" Strategy (Dropping Anchors)

Instead of silencing one random singer, the new method picks a "Leader" (an Anchor) and silences the Leader plus their entire neighborhood.

  • The Analogy: Imagine a crowded room where everyone is whispering the same secret. If you tell one person to stop talking, the person next to them just whispers it louder. But if you tell a whole group of friends standing in a circle to stop talking, you create a big silence.
  • The Result: The remaining singers (Gaussians) can't just lean on their neighbors to fill the gap. They are forced to listen to people far away and learn the whole song structure, not just the local whisper. This forces the computer to build a more robust, accurate 3D model.
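The "silence a whole group" idea above can be sketched in a few lines. This is only an illustrative sketch of neighborhood dropout, not the paper's actual implementation: the function name, the fixed anchor count, and the k-nearest-neighbor definition of "neighborhood" are my assumptions.

```python
import numpy as np

def anchor_neighborhood_dropout(positions, num_anchors, k, rng=None):
    """Pick random anchor Gaussians and silence each anchor plus its k
    nearest spatial neighbors, returning a boolean keep-mask.

    positions: (N, 3) array of Gaussian centers.
    Illustrative sketch only; the paper's sampling scheme may differ."""
    rng = np.random.default_rng(rng)
    n = positions.shape[0]
    keep = np.ones(n, dtype=bool)
    anchors = rng.choice(n, size=num_anchors, replace=False)
    for a in anchors:
        # Squared distances from this anchor to every Gaussian.
        d2 = np.sum((positions - positions[a]) ** 2, axis=1)
        # The anchor itself (distance 0) plus its k nearest neighbors.
        neighborhood = np.argsort(d2)[: k + 1]
        keep[neighborhood] = False
    return keep

# Usage: on each training step, render only the surviving Gaussians.
pts = np.random.default_rng(0).standard_normal((1000, 3))
mask = anchor_neighborhood_dropout(pts, num_anchors=10, k=20)
```

Because a whole spatial clump goes silent at once, nearby Gaussians cannot "sing louder" to cover the gap, which is exactly the failure mode of per-Gaussian dropout described above.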

2. The "Color Palette" Strategy (Dropping Spherical Harmonics)

The 3D models also use "Spherical Harmonics" (SH) to describe colors. Think of SH as a set of paintbrushes:

  • Low-degree brushes: Big, broad strokes for basic colors (e.g., "sky is blue").
  • High-degree brushes: Tiny, fine-detail brushes for complex patterns (e.g., "the exact texture of a leaf").

In sparse-view training, the computer gets obsessed with the tiny detail brushes. It tries to memorize every tiny speck of dust in the few photos, which ruins the overall picture.

  • The Solution: The new method randomly throws away the fine-detail brushes during training.
  • The Result: The computer is forced to learn the scene using only the big, broad strokes first. It learns the "skeleton" of the color before worrying about the "flesh."
  • The Bonus: Because the computer learned to rely on the big strokes, you can later throw away the fine-detail brushes permanently without losing much quality. This makes the final 3D model much smaller and faster to load, like compressing a high-res photo into a smaller file without it looking blurry.
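The "throw away the fine-detail brushes" step can also be sketched concretely. In a degree-D spherical harmonic expansion, degree d occupies coefficient indices d² through (d+1)²−1, so capping at degree c means keeping the first (c+1)² coefficients. The function below is my illustrative sketch (uniform sampling of the cap is an assumption, not the paper's stated schedule):

```python
import numpy as np

def drop_high_sh(sh, rng=None):
    """Sample a random degree cap and zero all higher-degree SH coefficients.

    sh: (N, (D+1)**2, 3) per-Gaussian spherical-harmonic color coefficients.
    Illustrative sketch only; the paper's sampling scheme may differ."""
    rng = np.random.default_rng(rng)
    num_coeffs = sh.shape[1]
    max_degree = int(np.sqrt(num_coeffs)) - 1   # e.g. 16 coeffs -> degree 3
    cap = int(rng.integers(0, max_degree + 1))  # random cap in [0, D]
    out = sh.copy()
    out[:, (cap + 1) ** 2 :, :] = 0.0           # silence fine-detail degrees
    return out, cap

# Usage: apply per training step, then render with the truncated coefficients.
sh = np.random.default_rng(1).standard_normal((5, 16, 3))  # degree-3 SH
dropped, cap = drop_high_sh(sh, rng=2)
```

Since the cap sometimes lands at degree 0, the model must learn to produce a plausible image from the broad strokes alone, which is what makes the later permanent truncation nearly free.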

Why This Matters

  • Better Quality: The 3D scenes look sharper and have fewer weird artifacts (glitches) when viewed from new angles.
  • Smaller Files: You can shrink the file size significantly (sometimes by 75%!) just by cutting off the "fine detail" math after training, and the image still looks great.
  • Fast & Easy: It doesn't slow down the computer much; it just changes how the computer learns.
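The ~75% figure is plausible from a back-of-envelope parameter count. The layout below is the common 3DGS per-Gaussian parameterization, assumed for illustration; the paper's exact accounting may differ:

```python
# Back-of-envelope per-Gaussian parameter count for standard 3DGS
# (assumed layout: position, scale, rotation quaternion, opacity, SH color).
position, scale, rotation, opacity = 3, 3, 4, 1
sh_degree = 3
sh_full = (sh_degree + 1) ** 2 * 3      # 16 coefficients x RGB = 48 values
sh_degree0 = 1 * 3                      # keep only the broad-stroke DC term

full = position + scale + rotation + opacity + sh_full        # 59 values
compact = position + scale + rotation + opacity + sh_degree0  # 14 values
savings = 1 - compact / full
print(f"{savings:.0%} of per-Gaussian parameters removed")
```

Most of a Gaussian's parameters are color coefficients, so cutting the SH expansion down to degree 0 removes roughly three quarters of the model, matching the compression the authors report.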

In summary: The old way was like telling one person in a crowd to be quiet (and they got covered up). The new way is like clearing out a whole block of the city, forcing the remaining people to communicate across the whole town. Plus, it teaches the computer to focus on the "big picture" colors first, making the final result both higher quality and smaller in size.
