Applying Self-organizing Maps to the Inverse Problem

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are a detective trying to solve a mystery in a crowded room. You know there are usually just regular people there (the "Standard Model" of physics), but suddenly, you spot a small group acting strangely. Your job is to figure out: Who are these strangers, and what is their secret?

This is the "Inverse Problem" in particle physics. Scientists see an unexpected blip in their data (the strange group) and need to work backward to identify exactly what new particle or theory caused it.

This paper presents two different detective tools to solve this mystery: a high-tech AI Brain (a Deep Neural Network) and a clever Organizing Map (Self-Organizing Maps, or SOMs).

Here is the breakdown of their investigation, explained simply.

The Setting: The Particle Physics Party

Imagine the Large Hadron Collider (LHC) is a massive party where particles crash into each other.

The Regulars (Background): Most of the time, you see standard particles like electrons and muons doing predictable things. This is the "noise."
The Suspects (Signal): The scientists are looking for "Vector-Like Leptons" (VLLs). Think of these as VIP guests who might be hiding in the crowd. They have different "weights" (masses), like 500, 1000, or 1500 units.
The Mystery: If the scientists see a bunch of extra particles, they need to know: Is it a 500-weight VIP? A 1000-weight VIP? Or just a coincidence of regular guests?

Tool 1: The AI Brain (Deep Neural Network or DNN)

The first method is like hiring a super-smart, trained detective.

How it works: You show the AI thousands of photos of the 500-weight VIPs, the 1000-weight VIPs, and the regular guests. You say, "Memorize what they look like."
The Test: When a new mystery group appears, the AI looks at them and says, "I'm 90% sure these are the 1000-weight VIPs."
The Flaw: The AI is very rigid. It only knows what you taught it. If the mystery group is actually a 2500-weight VIP (a type it never saw), the AI will squint and say, "Well, they look most like the 1500-weight VIPs I know," and guess wrong. It's like a child who only knows cats and dogs; if you show them a hamster, they might call it a "small dog."

Tool 2: The Organizing Map (Self-Organizing Maps or SOM)

The second method is the paper's "novel" approach. Imagine a giant, empty grid on the floor (like a chessboard).

How it works: Instead of memorizing specific faces, the SOM is a flexible organizer. You throw all the known VIPs (500, 1000, 1500) onto the grid.
- The 500s naturally clump together in the top-left corner.
- The 1000s clump in the middle.
- The 1500s gather in the bottom-right.
- The map organizes itself based on how similar the particles are.
The Magic Trick: The scientists did not show the SOM any regular guests (background). They only showed it the VIPs.
The Test: When a mystery group arrives, the SOM drops them onto the grid.
- If they land in the "1000 clump," it's a 1000-weight VIP.
- If they land in the "1500 clump," it's a 1500-weight VIP.
- Crucially: If they land in a weird spot between the clumps, or in a completely empty corner, the SOM doesn't force a wrong guess. It says, "These don't fit my known groups perfectly, but they are definitely not the regular guests."

The Four Mystery Cases

The authors tested both tools on four different scenarios:

The Perfect Match: A group of 1000-weight VIPs appears.
- Result: Both the AI Brain and the Map correctly identify them.
The Unknown Stranger: A group of 2500-weight VIPs (never seen before) appears.
- Result: The AI Brain guesses "1500" (closest match). The Map also guesses "1500" because that's the closest clump. Both fail to identify the new mass, but the Map gives a hint that something is "off" because the data is spread out.
The Mixed Crowd: A mix of regular guests and 500-weight VIPs.
- Result: Both tools cope once the regular guests are filtered out. The AI Brain discards anyone it scores as very likely a regular guest. The Map was never shown regular guests, yet they tend to land in their own regions of the grid, so it can set them aside and focus on the ones in the "VIP Zone." Both correctly identify the 500s.
The Unknown Mix: A mix of regular guests and 750-weight VIPs (a mass the tools weren't trained on).
- Result: Neither tool can name the exact weight; each guesses 500 or 1000, the nearest types it knows. Both still separate the strangers from the regular guests, and a closer look shows the strangers resemble the 500s and 1000s without matching either, so they are flagged for further investigation.

The Big Takeaway

The paper concludes that while the "AI Brain" (DNN) is slightly better at guessing when it has seen everything before, the "Organizing Map" (SOM) is a much more versatile tool for the real world.

Why?
In real life, scientists often don't know exactly what the "background noise" looks like, or they don't have enough data to train a complex AI. The SOM is like a smart, self-organizing filing cabinet. Even if you don't show it every possible file, it can still sort new papers into the right folders and highlight the ones that don't fit anywhere.

The Metaphor:

The DNN is like a Rigid Quiz Master: "I know 3 types of fruit. Is this an apple, a banana, or a grape?" If you show it a pear, it will force you to pick the closest one.
The SOM is like a Flexible Librarian: "I have shelves for apples, bananas, and grapes." If you hand it a pear, it might put it on the banana shelf because it's yellow, but it also leaves a note saying, "Hey, this doesn't quite fit the banana label, check it out!"

The authors suggest that using these "Organizing Maps" could be a game-changer for finding new physics, especially when the data is messy, scarce, or full of unknown surprises.

1. Problem Statement: The Inverse Problem in Particle Physics

The paper addresses the inverse problem in high-energy physics: given an experimental observation (specifically an excess of events over the Standard Model background), how can one uniquely identify the specific Beyond the Standard Model (BSM) theory or parameters responsible?

Context: While resonant searches (finding a specific particle mass peak) are straightforward, non-resonant searches involving cascade decays (e.g., vector-like leptons decaying into multiple leptons and bosons) make identifying the correct signal hypothesis difficult.
Specific Challenge: If a search in a trilepton ( $3\ell$ ) final state observes an excess, can one determine the mass of the hypothetical vector-like lepton (VLL) and distinguish it from Standard Model (SM) backgrounds or other mass hypotheses?
Constraints: The method must handle scenarios with low event counts (typical of counting experiments) and potentially unknown or data-driven backgrounds where labeled SM training data might be unavailable or insufficient.

2. Methodology

The authors propose and compare two machine learning approaches to solve this inverse problem using a search for Vector-Like Leptons (VLLs) in a trilepton final state.

A. Simulation and Data Setup

Physics Model: A doublet model of VLLs ( $L$ and $N$ ) with $\mu$ -flavor.
Signal Hypotheses: Five mass points ( $m_L = 500, 750, 1000, 1500, 2500$ GeV).
Backgrounds: SM processes $WZ$ and $t\bar{t}Z$ .
Kinematic Variables: Eight variables were constructed from the three leptons and missing transverse momentum ( $p_T^{miss}$ ), including:
- Scalar sums of transverse momentum ( $L_T, H_T$ ).
- Invariant masses ( $m_{\ell\ell\ell}, m_{os}^{high/low}$ ).
- Transverse masses ( $m_T^{high}, m_T^{alllep}, p_T^{\ell j}$ ).
Training/Testing Split:
- Training: 9,500 events each for $m_L = 500, 1000, 1500$ GeV and SM processes.
- Testing: Independent datasets used to evaluate performance on specific "cases" (scenarios of observed excesses).
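
To make the kinematic variables above concrete, here is a minimal sketch of how three of them could be computed from lepton kinematics. It uses the standard textbook definitions of invariant and transverse mass. The function names, the massless-lepton treatment, and the choice of the leading lepton for the transverse mass are illustrative assumptions, not the paper's exact definitions.

```python
import math

def four_vector(pt, eta, phi, mass=0.0):
    """Convert (pT, eta, phi, m) to Cartesian (E, px, py, pz), in GeV."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz

def invariant_mass(vectors):
    """Invariant mass of the sum of a list of (E, px, py, pz) four-vectors."""
    e = sum(v[0] for v in vectors)
    px = sum(v[1] for v in vectors)
    py = sum(v[2] for v in vectors)
    pz = sum(v[3] for v in vectors)
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # clamp tiny negative rounding errors

def transverse_mass(lep_pt, lep_phi, met, met_phi):
    """Transverse mass of one lepton and the missing transverse momentum."""
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(lep_phi - met_phi)))

def trilepton_features(leptons, met, met_phi):
    """leptons: three (pt, eta, phi) tuples, treated as massless (assumption)."""
    l_t = sum(pt for pt, _, _ in leptons)  # scalar sum of lepton pT
    m_3l = invariant_mass([four_vector(*lep) for lep in leptons])
    # Assumption: pair the highest-pT lepton with the missing momentum.
    lead_pt, _, lead_phi = max(leptons, key=lambda lep: lep[0])
    m_t_lead = transverse_mass(lead_pt, lead_phi, met, met_phi)
    return {"L_T": l_t, "m_3l": m_3l, "m_T_lead": m_t_lead}

# Toy event: one lepton recoiling against two others, all at eta = 0.
toy_leptons = [(50.0, 0.0, 0.0), (30.0, 0.0, math.pi), (20.0, 0.0, math.pi)]
features = trilepton_features(toy_leptons, met=50.0, met_phi=math.pi)
```

Each event then becomes a fixed-length feature vector, which is the form both the DNN and the SOM take as input.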

B. Approach 1: Multiclassifying Deep Neural Network (DNN)

Architecture: A supervised feed-forward network with three hidden layers (32, 16, 8 neurons) and a softmax output layer.
Classes: Four output neurons representing $m_L = 500$, $1000$, $1500$ GeV, and SM.
Training: Trained on all classes (Signal + SM) using categorical cross-entropy loss.
Inference Strategy: For a set of observed events, the median score of each output neuron is taken over the events, and the hypothesis with the highest median is chosen as the most likely. A cut on the SM score ( $n_{SM} < 0.8$ ) is used to filter background in mixed cases.
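
The architecture and inference strategy described above can be sketched in plain NumPy. This is a sketch under stated assumptions, not the authors' code: the weights are random and untrained, the ReLU hidden activation is assumed (the summary does not name one), and taking the median over the three signal neurons only, after the SM cut, is one plausible reading of the strategy. All function and class-label names are hypothetical.

```python
import numpy as np

CLASSES = ["mL500", "mL1000", "mL1500", "SM"]
LAYER_SIZES = [8, 32, 16, 8, 4]  # 8 inputs, hidden layers 32-16-8, 4 softmax outputs

def init_params(rng):
    """Random, untrained weights: a stand-in for a trained network."""
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]

def forward(params, x):
    """Feed-forward pass: ReLU hidden layers (assumed), softmax output."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = params[-1]
    logits = h @ w + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def identify_hypothesis(scores, sm_cut=0.8):
    """Drop events with SM score >= sm_cut, then pick the signal class
    whose output neuron has the highest median score over the kept events."""
    sm = CLASSES.index("SM")
    kept = scores[scores[:, sm] < sm_cut]
    if len(kept) == 0:
        return None, kept
    medians = np.median(kept[:, :sm], axis=0)  # signal neurons only (assumption)
    return CLASSES[int(np.argmax(medians))], kept

# Shape check with untrained weights on 10 random 8-feature events.
rng = np.random.default_rng(0)
random_scores = forward(init_params(rng), rng.normal(size=(10, 8)))

# Hand-made scores: two events favour mL1000, one is SM-like and gets cut.
demo_scores = np.array([[0.10, 0.70, 0.10, 0.10],
                        [0.20, 0.60, 0.10, 0.10],
                        [0.05, 0.05, 0.05, 0.85]])
label, kept = identify_hypothesis(demo_scores)
```

Using the median rather than the mean keeps a single outlier event from dominating the decision, which matters when only about ten events are observed.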

C. Approach 2: Self-Organizing Maps (SOM)

Architecture: An unsupervised learning algorithm (implemented with the MiniSom library) whose neurons are arranged in a 2D grid ( $n \times n$ ).
Key Innovation (Supervised Usage of Unsupervised Tool):
- Training: The SOM is trained only on the VLL mass hypotheses ($500, 1000, 1500$ GeV). SM processes are explicitly excluded from training.
- Mechanism: The SOM learns the topology of the signal space. SM events, being distinct, naturally cluster in specific regions or map to neurons with low signal density.
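
As an illustration of signal-only training, the following is a minimal from-scratch sketch of the classic online Kohonen update in NumPy. It is not the MiniSom implementation the paper uses. The grid size, decay schedules, and the three 2D Gaussian blobs standing in for the three mass hypotheses are all toy assumptions.

```python
import numpy as np

def find_bmu(weights, x):
    """Grid index (row, col) of the neuron whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dist), dist.shape)

def train_som(data, n=6, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train an n x n SOM with the online Kohonen update rule."""
    rng = np.random.default_rng(seed)
    weights = rng.uniform(0.0, 1.0, size=(n, n, data.shape[1]))
    rows, cols = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighborhood width
        x = data[rng.integers(len(data))]    # one random training event
        bi, bj = find_bmu(weights, x)
        # Gaussian neighborhood on the grid, centered on the BMU
        h = np.exp(-((rows - bi) ** 2 + (cols - bj) ** 2) / (2.0 * sigma ** 2))
        # Pull the BMU and its grid neighbors toward the event
        weights += lr * h[:, :, None] * (x - weights)
    return weights

def quantization_error(weights, data):
    """Mean distance between each event and the weight vector of its BMU."""
    return float(np.mean([np.linalg.norm(weights[find_bmu(weights, x)] - x)
                          for x in data]))

# Toy signal-only training set: three blobs stand in for three mass points.
rng = np.random.default_rng(42)
centers = [(0.0, 0.0), (3.0, 3.0), (6.0, 6.0)]
signal_only = np.vstack([rng.normal(c, 0.3, size=(200, 2)) for c in centers])

untrained = train_som(signal_only, n_iter=0)  # initial random weights only
trained = train_som(signal_only)
qe_before = quantization_error(untrained, signal_only)
qe_after = quantization_error(trained, signal_only)
```

No class labels and no background events enter the training loop; the labels are only needed afterwards, when reference events are projected onto the trained grid.
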
Inference Strategy (Regional Separation Score):
1. For each observed event, find its Best Matching Unit (BMU) on the trained SOM.
2. Define a local neighborhood ( $m \times m$ ) around the BMU.
3. Calculate a Regional Separation Score ( $SepScore_{reg}$ ) for each hypothesis within that neighborhood. This score measures the dominance of one class over others in that local region.
4. Background Rejection: Events with a high $SepScore_{reg}^{SM}$ (indicating they fall in SM-dominated regions) are cut.
5. Hypothesis Identification: The hypothesis with the highest median regional score among surviving events is selected.
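
The five steps above can be sketched as follows. The summary does not give the formula for $SepScore_{reg}$, so this sketch assumes a simple stand-in: each class's share of the reference occupancy inside the $m \times m$ window. It also assumes that the SM occupancy comes from projecting SM reference events onto the already trained map, that the SM cut value is 0.5, and that a hand-built, perfectly ordered 5 x 5 grid may stand in for a trained SOM. All names are hypothetical.

```python
import numpy as np

def find_bmu(weights, x):
    """Step 1: grid index of the Best Matching Unit for event x."""
    dist = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dist), dist.shape)

def occupancy_map(weights, events):
    """Fraction of a class's reference events landing on each neuron."""
    hits = np.zeros(weights.shape[:2])
    for x in events:
        hits[find_bmu(weights, x)] += 1
    return hits / max(len(events), 1)

def regional_scores(occupancy, bmu, m=3):
    """Steps 2-3: class shares inside the m x m window (assumed definition)."""
    half = m // 2
    i, j = bmu
    window = {}
    for label, occ in occupancy.items():
        patch = occ[max(i - half, 0): i + half + 1, max(j - half, 0): j + half + 1]
        window[label] = patch.sum()
    total = sum(window.values())
    return {label: (v / total if total > 0 else 0.0) for label, v in window.items()}

def identify(weights, occupancy, observed, sm_cut=0.5, m=3):
    """Steps 4-5: cut SM-like events, then take the highest median score."""
    per_event = [regional_scores(occupancy, find_bmu(weights, x), m) for x in observed]
    kept = [s for s in per_event if s["SM"] < sm_cut]
    if not kept:
        return None, 0
    signal = [k for k in occupancy if k != "SM"]
    medians = {k: float(np.median([s[k] for s in kept])) for k in signal}
    return max(medians, key=medians.get), len(kept)

# Hand-built stand-in for a trained map: neuron (i, j) has weight vector (i, j).
grid = np.stack(np.meshgrid(np.arange(5.0), np.arange(5.0), indexing="ij"), axis=2)

rng = np.random.default_rng(1)
references = {
    "mL500": rng.normal([0.0, 0.0], 0.2, size=(200, 2)),
    "mL1500": rng.normal([4.0, 4.0], 0.2, size=(200, 2)),
    "SM": rng.normal([0.0, 4.0], 0.2, size=(200, 2)),
}
occupancy = {label: occupancy_map(grid, ev) for label, ev in references.items()}

# Toy mixed excess: 10 signal-like plus 10 SM-like events.
observed = np.vstack([rng.normal([0.0, 0.0], 0.2, size=(10, 2)),
                      rng.normal([0.0, 4.0], 0.2, size=(10, 2))])
best, n_kept = identify(grid, occupancy, observed)
```

With a real MiniSom model, `som.get_weights()` and `som.winner(x)` would supply the weight grid and the BMU, replacing the hand-built grid and `find_bmu` here.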

3. Key Contributions

Novel Application of SOMs: The paper demonstrates using SOMs in a "supervised" manner for the inverse problem, leveraging their inherent clustering capabilities without requiring labeled background data during training.
Background-Agnostic Training: By excluding SM processes from SOM training, the method is robust for scenarios where background modeling is difficult, purely data-driven, or where the background yield is too low for training.
Handling Unseen Mass Hypotheses: The framework provides a strategy to identify when an observed excess corresponds to a mass outside the trained range (e.g., $m_L = 2500$ GeV) by analyzing kinematic distributions of the surviving events against the training set.
Comparative Analysis: A rigorous comparison between a standard supervised DNN and the novel SOM approach across four distinct experimental scenarios.

4. Results

The authors tested both methods on four specific cases:

Case 1 (Clean Signal): 10 events of $m_L = 1000$ GeV.
- Result: Both DNN and SOM correctly identified $m_L=1000$ GeV.
Case 2 (Unseen Mass): 10 events of $m_L = 2500$ GeV (not in training).
- Result: Both methods erroneously identified the mass as $1500$ GeV (the closest trained hypothesis).
- Mitigation: The authors note that iterating the training with a different mass range or comparing the kinematic distribution ( $m_{\ell\ell\ell}$ ) of the surviving events to the training set can reveal the true mass is higher.
Case 3 (Mixed Signal/Background): 10 SM + 10 events of $m_L = 500$ GeV.
- Result: Both methods successfully filtered SM events (via the $n_{SM}$ cut or the $SepScore_{reg}^{SM}$ cut) and correctly identified $m_L=500$ GeV.
Case 4 (Mixed Signal/Background, Unseen Mass): 5 SM + 10 events of $m_L = 750$ GeV.
- Result: Both methods struggled to pinpoint the exact mass (identifying it as 500 or 1000 GeV), but successfully isolated the non-SM events. Kinematic comparison suggested the mass was likely between the trained points.

Performance Metrics (AUC - Area Under Curve):

DNN: Achieved high AUCs ($0.947$ to $0.977$).
SOM: Achieved competitive AUCs ($0.864$ to $0.926$) despite not being trained on SM data.
Optimization: The best SOM performance was achieved with a $40 \times 40$ grid and a $3 \times 3$ or $5 \times 5$ neighborhood region.
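
For readers unfamiliar with the metric, the AUC equals the probability that a randomly chosen signal event receives a higher score than a randomly chosen background event, with ties counted as half. A minimal sketch of that pairwise definition follows; the scores are made up for illustration and are not from the paper.

```python
def auc(signal_scores, background_scores):
    """Pairwise AUC: fraction of (signal, background) pairs ranked correctly."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5  # ties count as half
    return wins / (len(signal_scores) * len(background_scores))

# Made-up classifier scores: 8 of the 9 pairs are ordered correctly.
toy_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])
```

An AUC of 1.0 means perfect separation and 0.5 means no better than chance, which gives a scale for reading the DNN and SOM ranges quoted above.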

5. Significance and Outlook

Versatility: SOMs offer a versatile tool for BSM searches, particularly when the background is instrumental or difficult to simulate. They can be trained on signal-only data and still effectively separate signal from unknown backgrounds.
Complementarity: While DNNs have a slight numerical edge in classification accuracy, SOMs provide unique interpretability through 2D clustering and do not require labeled background data.
Strategic Application: The authors suggest a hybrid strategy for LHC searches: use a DNN to suppress background, then apply a SOM trained on the surviving signal-like events to characterize any observed excesses, even if they are not statistically significant enough for a discovery claim.
Future Potential: The ability to probe kinematic properties of "unseen" mass hypotheses via SOM clustering makes this a valuable tool for model-independent searches.

In conclusion, the paper establishes that Self-Organizing Maps are a competitive and robust alternative to Deep Neural Networks for solving the inverse problem in particle physics, offering distinct advantages in scenarios with limited or unlabeled background data.