Imagine you are a security guard at a museum, and your job is to protect a priceless painting (the AI model's decision) from vandals (adversarial attacks). The vandals try to make tiny, almost invisible changes to the painting to trick you into thinking it's something else.
Randomized Smoothing (RS) is a technique where, before you look at the painting, you put on a pair of foggy glasses. You look at the painting through the fog many times. If the painting looks like a "Cat" 90% of the time through the fog, you confidently say, "It's a Cat!" The thicker the fog (the higher the noise variance), the harder it is for a vandal to sneak a change past your eyes, but also the harder it is for you to see the painting's details clearly (lower accuracy).
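The foggy-glasses idea maps directly onto how randomized smoothing works in practice: add Gaussian noise to the input many times and take a majority vote over the classifier's answers. (In the standard analysis by Cohen et al., 2019, the certified radius is proportional to the noise level σ, which is why thicker fog tolerates larger attacks.) Here is a minimal sketch; the toy classifier, noise level, and sample count are illustrative placeholders, not the paper's actual models:

```python
import numpy as np

def smoothed_predict(classifier, x, sigma, n_samples=100, rng=None):
    """Classify x by majority vote over noisy copies (randomized smoothing).

    classifier: maps an input array to an integer class label.
    sigma: standard deviation of the Gaussian "fog".
    """
    rng = rng or np.random.default_rng(0)
    votes = {}
    for _ in range(n_samples):
        # Look at the painting through the fog once.
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        label = classifier(noisy)
        votes[label] = votes.get(label, 0) + 1
    # The most frequent label is the smoothed prediction.
    return max(votes, key=votes.get)

# Toy classifier: "Cat" (0) if the mean pixel value is below 0.5, else "Dog" (1).
toy = lambda z: 0 if z.mean() < 0.5 else 1
x = np.full(4, 0.2)  # clearly on the "Cat" side of the boundary
print(smoothed_predict(toy, x, sigma=0.25))  # 0 (the vote is nearly unanimous)
```

Note that the vote is over hard labels, so a vandal's tiny perturbation has to flip a majority of the noisy views at once to change the outcome.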
The Old Problem: One Size Does Not Fit All
For years, security guards had to choose one single level of fog for the entire museum.
- Thin Fog: You see details perfectly (high accuracy), but a clever vandal can easily slip a tiny sticker on the painting to change your mind (low robustness).
- Thick Fog: You can't be tricked by stickers (high robustness), but the fog is so thick you can't tell if the cat is sleeping or playing (low accuracy).
The big problem? You can't have both. If you pick thin fog, you fail at large attacks. If you pick thick fog, you fail at small details. It's like trying to wear one pair of shoes that is perfect for running a marathon but also perfect for dancing ballet.
The New Solution: Dual Randomized Smoothing
The authors of this paper (Sun, Mao, and Vechev) came up with a brilliant new system called Dual Randomized Smoothing. Instead of wearing one pair of foggy glasses for everyone, they created a two-step process that adapts to each specific painting.
Think of it like a Smart Security Team:
Step 1: The Scout (The Variance Estimator)
First, you send out a quick scout to look at the painting. The scout doesn't decide what the painting is; they just answer one question: "How much fog does this specific painting need to be safe?"
- If the painting is simple and easy to recognize, the scout says, "Hey, this is easy! We only need a light fog to see it clearly."
- If the painting is complex or looks like it's being targeted, the scout says, "This one is tricky! We need heavy fog to be sure."
Crucially, the scout is also trained to be "locally consistent." This means if you move the painting just a tiny bit, the scout doesn't suddenly panic and change their mind about the fog level. They stay steady in their neighborhood.
Step 2: The Guard (The Classifier)
Once the scout gives the recommendation (e.g., "Use light fog"), the main guard puts on that specific level of fog and makes the final decision.
- Because the guard used the perfect amount of fog for that specific painting, they get the best of both worlds: high accuracy for easy paintings and high security for hard ones.
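The scout-then-guard pipeline above can be sketched as a two-step function: first a variance estimator picks a per-input noise level, then ordinary randomized smoothing runs at that level. The `scout` heuristic and the specific sigma values below are hypothetical stand-ins for the paper's trained components:

```python
import numpy as np

def dual_smoothed_predict(variance_estimator, classifier, x,
                          n_samples=100, rng=None):
    """Two-step prediction: estimate a per-input noise level (the scout),
    then run randomized smoothing at that level (the guard)."""
    rng = rng or np.random.default_rng(0)
    # Step 1 (the scout): how much fog does this specific input need?
    sigma = variance_estimator(x)
    # Step 2 (the guard): majority vote under that amount of noise.
    votes = {}
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        label = classifier(noisy)
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get), sigma

# Hypothetical scout: inputs far from the decision boundary (at 0.5)
# get light fog; borderline inputs get heavy fog. A real estimator
# would be a trained, locally consistent network.
scout = lambda z: 0.1 if abs(z.mean() - 0.5) > 0.2 else 0.5
toy = lambda z: 0 if z.mean() < 0.5 else 1

easy = np.full(4, 0.1)  # an "easy painting"
label, sigma = dual_smoothed_predict(scout, toy, easy)
print(label, sigma)  # easy input gets the light fog: sigma = 0.1
```

The local-consistency property from Step 1 matters here: because the scout's sigma barely changes under small perturbations of `x`, an attacker cannot dodge the certificate by nudging the input into a different fog level.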
Why is this a big deal?
- No More Compromises: In the old system, you had to pick a "middle ground" fog that was okay for everyone but great for no one. With this new system, every painting gets its own custom-tailored security level.
- The "Router" Idea: The paper also suggests a cool twist. Imagine you have a team of expert guards. One is amazing at spotting cats in low light, another is great at spotting dogs in bright sun. The "Scout" doesn't just pick the fog level; it acts as a traffic router, sending the painting to the specific expert guard best suited for that job.
- Efficiency: You might expect this two-step process to be slow. It does add overhead, but only about 60%. Compared to the gain in security and accuracy, that's a small price to pay.
The Results
When they tested this on famous image datasets (like CIFAR-10 and ImageNet), the results were impressive:
- At small attack sizes (where old methods were already good), they improved accuracy by 15-20%.
- At large attack sizes (where old methods usually fail completely), they still held strong.
- They beat all previous "smart" methods that tried to adapt to inputs, proving that their "Scout + Guard" team is the most effective security detail yet.
In a Nutshell
The paper solves the "Goldilocks" problem of AI security. Instead of forcing every input to fit a single, rigid security standard, they built a system that measures the threat level of each input individually and applies the exact amount of protection needed. It's like having a security system that knows exactly how much armor to wear for every single battle, rather than wearing the same heavy suit for a handshake and a sword fight.