Wasserstein normalized autoencoder for anomaly detection

The Big Picture: Finding a Needle in a Haystack (Without Knowing What the Needle Looks Like)

Imagine you are a security guard at a massive airport. Every day, thousands of people walk through your checkpoint. You know exactly what a "normal" traveler looks like: they carry a backpack, wear a coat, maybe have a coffee. These are your Standard Model particles (the background).

But occasionally, someone walks through who is carrying something strange—maybe a glowing box or a suit made of invisible fabric. This is New Physics (the signal). The problem is, you don't know exactly what this "glowing box" looks like. It could be anything. If you try to teach your security system to spot a specific type of glowing box, you might miss a different kind.

So, you decide to teach your system only what "normal" looks like. If something doesn't fit the "normal" pattern, you flag it as an anomaly. This is called Anomaly Detection.

The Problem: The "Too Helpful" Robot

The paper discusses a specific type of AI called an Autoencoder. Think of an Autoencoder as a robot that tries to memorize a photo of a normal traveler, compress it into a tiny note, and then redraw the photo from that note.

The Goal: If the robot sees a normal traveler, it should redraw them perfectly (low error). If it sees a weird alien, it should struggle to redraw them (high error), and you flag the alien.
The Glitch: Sometimes, the robot is too good. If the alien is actually simpler than the normal travelers (maybe the alien is just a plain gray blob, while normal travelers have complex patterns), the robot might accidentally learn to redraw the alien perfectly, too.
The Result: The robot thinks the alien is normal because it can redraw it easily. The security system fails. In the paper, they call this "Outlier Reconstruction." It's like a forger so skilled at copying that they reproduce a fake just as faithfully as a real masterpiece, so the quality of the copy no longer tells the museum which is which.

The First Attempt: The "Normalized" Robot (NAE)

To fix this, the scientists tried a smarter robot called a Normalized Autoencoder (NAE).

Instead of just trying to redraw the picture, this robot tries to learn the probability of what a normal traveler looks like. It uses a mathematical trick involving a "Markov Chain" (think of it as a random walk) to generate fake "negative" examples. It asks itself: "If I make up a random traveler, does it look like the real ones I've seen?"

The Goal: It tries to make sure that anything that looks "weird" (low probability) gets a high "error score."
The New Glitch: This robot is unstable. Sometimes, it gets confused and starts "diverging." It might decide that the best way to win the game is to make everything look terrible to redraw, or it might collapse into a state where it redraws everything perfectly, including the weird aliens, just to minimize its own math score. It's like a student who, instead of studying, decides to cheat by memorizing the answer key in a way that breaks the test.

The Solution: The "Wasserstein" Robot (WNAE)

This is the main contribution of the paper. The scientists introduced the Wasserstein Normalized Autoencoder (WNAE).

To understand this, imagine you have two piles of sand:

Pile A: Real travelers (your training data).
Pile B: The robot's current guess of what travelers look like (its learned distribution).

In the old methods, the robot just tried to make the shapes of the piles match. But sometimes, the robot would cheat by making a pile that looked similar but was actually in the wrong place.

The Wasserstein distance is a way of measuring the "cost" to move the sand from Pile B to Pile A. Imagine you have to carry grains of sand from one pile to the other. The Wasserstein distance asks: "What is the minimum amount of effort (distance × weight) required to turn my fake pile into the real pile?"
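To make the sand-pile picture concrete, here is a minimal sketch for the simplest possible case: two one-dimensional piles with the same number of equal-weight grains. In that case the cheapest plan is to pair the grains in sorted order. The function name and the toy numbers are made up for illustration and are not taken from the paper.

```python
def wasserstein_1d(pile_a, pile_b):
    """1-Wasserstein distance between two equal-size 1D samples.

    With equal weights, the cheapest transport plan pairs the sorted
    values in order, so the cost is the mean absolute difference
    between the two sorted samples.
    """
    if len(pile_a) != len(pile_b):
        raise ValueError("this sketch assumes equal-size samples")
    a, b = sorted(pile_a), sorted(pile_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Toy piles (illustrative numbers only).
real = [0.0, 1.0, 2.0, 3.0]
same_shape_wrong_place = [x + 5.0 for x in real]

print(wasserstein_1d(real, real))                    # 0.0
print(wasserstein_1d(real, same_shape_wrong_place))  # 5.0
```

The second pile has exactly the same shape as the first, but every grain has to travel 5 units, so the distance is 5. A pile in the wrong place is penalized even when its shape matches.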

How the WNAE works:

It doesn't just try to redraw the image; it tries to minimize the "effort" needed to make its fake data look exactly like the real data.
If the robot tries to cheat and redraw a weird alien perfectly, the "effort" (Wasserstein distance) to move that alien's data back to the "normal" pile becomes huge.
The robot is forced to stop cheating. It learns that the only way to minimize the effort is to strictly learn the shape of the "normal" pile and leave the "weird" stuff alone.

Why This Matters for the Paper

The scientists tested this with simulated collisions for CMS, a giant particle detector at CERN's Large Hadron Collider. They were looking for Semivisible Jets (SVJs).

The Scenario: Imagine a jet of particles (like a spray from a hose) that is half visible (standard particles) and half invisible (Dark Matter).
The Challenge: These jets look very similar to normal jets from top quarks (a common background). Standard robots failed to tell them apart because they kept "reconstructing" the weird jets as if they were normal.
The Result: The WNAE learned the "normal" jet distribution without ever seeing a single "weird" jet during training, and it still flagged the semivisible jets as anomalies.

The Takeaway

The paper claims that by using the Wasserstein distance as the teacher, they built a robot that:

Doesn't cheat: It can't just learn to redraw weird things perfectly to lower its score.
Is stable: It doesn't crash or get confused like the previous "Normalized" version.
Is signal-agnostic: It doesn't need to know what the "weird" thing looks like. It just knows what "normal" looks like, and anything that doesn't fit that mold gets flagged.

In short, they fixed a broken security system by giving it a better way to measure how "far away" a suspicious person is from the crowd, so that a cleverly disguised intruder is far less likely to slip through.

Technical Summary: Wasserstein Normalized Autoencoder for Anomaly Detection

Problem Statement
Unsupervised machine learning, particularly Autoencoders (AEs), is a powerful tool for identifying new physics at the Large Hadron Collider (LHC) by separating Standard Model (SM) background events from potential Beyond-the-Standard-Model (BSM) signals without relying on specific signal hypotheses. However, standard AEs suffer from a critical failure mode known as "outlier reconstruction." In this scenario, the network learns to reconstruct anomalous data points (outliers) with low error, often because these outliers are less complex than the training data (a phenomenon termed "complexity bias") or simply because the network is free to minimize reconstruction error in regions of phase space outside the training distribution. This results in a loss of discrimination power, where the reconstruction error fails to distinguish between background and signal.
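Outlier reconstruction can be illustrated with a deliberately tiny stand-in for an autoencoder. The sketch below assumes a linear model with a one-dimensional bottleneck whose single direction `U` is set by hand, as if it had been learned from training points lying near the line y = x. The paper's networks are not linear; `U`, `encode`, `decode`, and the three test points are hypothetical and serve only to show the failure mode.

```python
import math

# Hypothetical "trained" linear autoencoder: it keeps only the component
# of the input along the unit direction U (assumed, not fitted here).
U = (1 / math.sqrt(2), 1 / math.sqrt(2))


def encode(x):
    # Project the 2D input onto the 1D latent space.
    return x[0] * U[0] + x[1] * U[1]


def decode(z):
    # Map the latent value back to 2D.
    return (z * U[0], z * U[1])


def reconstruction_error(x):
    # Squared Euclidean distance between the input and its reconstruction.
    r = decode(encode(x))
    return (x[0] - r[0]) ** 2 + (x[1] - r[1]) ** 2


typical = (0.5, 0.52)       # resembles the assumed training data
far_outlier = (50.0, 50.0)  # far from the training data, but on the same line
off_axis = (1.0, -1.0)      # perpendicular to the learned direction

for name, point in [("typical", typical),
                    ("far_outlier", far_outlier),
                    ("off_axis", off_axis)]:
    print(name, reconstruction_error(point))
```

The far outlier is reconstructed essentially perfectly even though nothing like it appeared in training, so its reconstruction error gives no hint that it is anomalous. Only the off-axis point receives a large error.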

Previous attempts to address this using Normalized Autoencoders (NAEs), which frame the AE reconstruction error as an energy function within a Boltzmann distribution, have also faced challenges. NAE training often exhibits numerical instability, including the divergence of the loss function and "mode collapse," where the network learns a probability distribution that overlaps significantly with the signal, again leading to poor anomaly detection performance. Furthermore, existing NAE training lacks a robust, signal-agnostic stopping condition to prevent overtraining and outlier reconstruction.

Methodology
The authors introduce the Wasserstein Normalized Autoencoder (WNAE), a novel probabilistic model designed to overcome the limitations of both standard AEs and NAEs. The methodology proceeds as follows:

Probabilistic Framework: Like the NAE, the WNAE treats the AE reconstruction error $l_\theta(x)$ as an energy function $E_\theta(x)$. The model defines a normalized probability distribution $p_\theta(x)$ using the Boltzmann distribution: $p_\theta(x) = \frac{1}{\Omega_\theta} \exp(-E_\theta(x))$.
Markov Chain Monte Carlo (MCMC): To learn the distribution $p_\theta$, the model employs a Langevin Monte Carlo algorithm to sample "negative" examples from $p_\theta$. These samples are generated iteratively using the gradient of the energy function with respect to the input features.
The Wasserstein Distance Objective: The core innovation is the use of the 1-Wasserstein distance (Earth Mover's Distance) as the direct training objective. Instead of minimizing the negative log-likelihood (which involves an intractable partition function and leads to instability), the WNAE minimizes the Wasserstein distance $W(p_{data}, p_\theta)$ between the training data distribution $p_{data}$ and the model distribution $p_\theta$.
- The loss function is defined as the Wasserstein distance between the positive samples (from $p_{data}$) and the negative samples (from $p_\theta$).
- This approach leverages the Kantorovich-Rubinstein duality, allowing for a stable, differentiable loss function that does not suffer from the vanishing gradients or mode collapse issues common in other generative models.
Training Dynamics: The training involves two phases: a coarse adjustment where the Wasserstein distance decreases sharply as the model adapts to the physical data, followed by a fine-tuning phase. A learning rate scheduler is employed to ensure stability. Crucially, the Wasserstein distance serves as a signal-agnostic stopping condition; training is halted when the distance begins to increase, indicating the onset of mode collapse or outlier reconstruction.
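The sampling and loss steps above can be sketched in one dimension. This is a toy under stated assumptions, not the authors' implementation: the energy is a hand-written quadratic $E(x) = (x - \mu)^2 / 2$ standing in for the reconstruction error, so its Boltzmann distribution is a unit Gaussian centred at $\mu$; the step size, chain count, and starting range are arbitrary; and the sorted-pairs formula for the Wasserstein distance is exact only for equal-size one-dimensional samples (the 8 input features used in the paper would require a general optimal transport solver).

```python
import math
import random


def energy_grad(x, mu):
    # Gradient of the toy energy E(x) = (x - mu)^2 / 2 with respect to the input.
    return x - mu


def langevin_sample(mu, rng, n_chains=500, n_steps=200, step=0.05):
    """Draw 'negative' samples from p(x) ~ exp(-E(x)) with Langevin dynamics."""
    samples = []
    for _ in range(n_chains):
        x = rng.uniform(-3.0, 3.0)  # arbitrary starting point for the chain
        for _ in range(n_steps):
            noise = rng.gauss(0.0, 1.0)
            # Drift down the energy gradient, plus Gaussian noise.
            x = x - step * energy_grad(x, mu) + math.sqrt(2 * step) * noise
        samples.append(x)
    return samples


def wasserstein_1d(a, b):
    # Exact 1-Wasserstein distance for equal-size, equal-weight 1D samples.
    a, b = sorted(a), sorted(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


rng = random.Random(42)
positives = [rng.gauss(0.0, 1.0) for _ in range(500)]  # stand-in training data

# Loss for a model whose low-energy region sits on the data ...
w_matched = wasserstein_1d(positives, langevin_sample(mu=0.0, rng=rng))
# ... and for a model whose low-energy region sits somewhere else.
w_shifted = wasserstein_1d(positives, langevin_sample(mu=4.0, rng=rng))

print(f"matched: {w_matched:.3f}  shifted: {w_shifted:.3f}")
```

The loss is small when the model's low-energy region coincides with the data and large when the model assigns low energy to a region the data does not populate, which is the behaviour the Wasserstein objective is meant to penalize.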

Case Study and Data
The algorithm is applied to the search for Semivisible Jets (SVJs), a signature of hidden valley models where dark sector particles produce jets containing both visible Standard Model particles and invisible dark matter states.

Background: Simulated top-antitop ($t\bar{t}$) production with additional jets.
Signal: SVJ events generated via a bifundamental scalar mediator, with varying invisible fractions ($r_{inv}$) and mediator masses ($m_\Phi$).
Features: The input consists of 8 jet substructure variables (e.g., major/minor axes, energy flow polynomials, $N$-subjettiness, softdrop mass) derived from particle-flow reconstruction.

Key Results

Failure of Standard AE: When trained on $t\bar{t}$ background, a standard AE fails to discriminate SVJs from background, yielding an Area Under the Curve (AUC) score close to 0.5 (random guessing) due to outlier reconstruction.
NAE Instability: While the NAE initially improves discrimination, it suffers from loss divergence and mode collapse. The AUC degrades over time as the energy of the negative samples diverges, and the model fails to distinguish signal from background without a signal-dependent stopping condition.
WNAE Performance: The WNAE demonstrates stable, convergent training.
- It achieves strong classification performance across a wide range of SVJ signal hypotheses, with AUC scores significantly higher than the standard AE and comparable to or better than the NAE at its optimal point.
- The Wasserstein distance effectively correlates with the AUC score, providing a reliable stopping condition that prevents the model from learning the signal distribution.
- The WNAE mitigates complexity bias. Unlike standard AEs, which struggle when the signal is less complex than the background, the WNAE successfully identifies top quark jets as anomalies even when trained on SVJ signals, demonstrating its ability to learn the true probability density of the training data rather than just minimizing reconstruction error.
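For readers less familiar with the AUC figures quoted above, the sketch below computes it directly from anomaly scores, using the fact that the AUC equals the probability that a randomly chosen signal event scores higher than a randomly chosen background event (ties count one half). The function name and the score lists are invented for illustration and are not results from the paper.

```python
def auc(background_scores, signal_scores):
    """Area under the ROC curve via pairwise comparison of anomaly scores."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0   # signal ranked above background
            elif s == b:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(signal_scores) * len(background_scores))


# Made-up reconstruction errors used as anomaly scores.
background = [0.1, 0.2, 0.3, 0.4]
separated_signal = [0.5, 0.6, 0.7]         # every signal score is higher
overlapping_signal = [0.1, 0.2, 0.3, 0.4]  # same scores as the background

print(auc(background, separated_signal))    # 1.0
print(auc(background, overlapping_signal))  # 0.5
```

A model suffering from outlier reconstruction gives signal and background similar scores, which pushes the AUC toward 0.5, the value expected from random guessing.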

Significance and Claims
The paper claims that the WNAE directly addresses the fundamental failure mode of outlier reconstruction in autoencoder-based anomaly detection. By minimizing the Wasserstein distance between the training data distribution and the model's learned distribution, the algorithm ensures that regions of phase space distinct from the training data are assigned high reconstruction errors.

The authors emphasize that the WNAE remains fully unsupervised and signal-agnostic. It does not require knowledge of the signal hypothesis during training, nor does it rely on ad-hoc regularization to stabilize the NAE loss. The method provides a robust, stable, and effective tool for anomaly detection in high-energy physics, capable of identifying new physics signatures like semivisible jets against complex Standard Model backgrounds. The paper concludes that the WNAE is stable for the studied task but may still be subject to generic limitations of anomaly detection models, such as overlap between the signal and background distributions or contamination of the training data with anomalies. For such cases, it offers a pathway for self-supervised refinement.