Trigger Optimization and Event Classification for Dark Matter Searches in the CYGNO Experiment Using Machine Learning

This paper presents two complementary machine learning strategies for the CYGNO dark matter experiment: an unsupervised convolutional autoencoder that efficiently reduces data volume by isolating signal regions from noise, and a weakly supervised Classification Without Labels (CWoLa) framework that successfully identifies nuclear-recoil-like topologies without requiring event-level labels.

Original authors: F. D. Amaro, R. Antonietti, E. Baracchini, L. Benussi, C. Capoccia, M. Caponero, L. G. M. de Carvalho, G. Cavoto, I. A. Costa, A. Croce, M. D'Astolfo, G. D'Imperio, G. Dho, E. Di Marco, J. M. F. dos S
Published 2026-03-24

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to find a single, tiny firefly blinking in a massive, pitch-black stadium filled with millions of other lights. Some of those lights are just random flickers from the stadium's wiring (noise), some are from people waving flashlights (background radiation), and you are desperately looking for that one specific firefly that might be a Dark Matter particle.

This is the challenge facing the CYGNO experiment, a high-tech detector designed to hunt for Dark Matter. The detector is like a giant, ultra-sensitive camera that takes huge, high-definition photos of gas inside a tank. But there's a problem: these photos are so big and full of "static" (noise) that storing and analyzing every single pixel would overwhelm the computer, like trying to read every word in a library just to find one sentence.

The paper describes how the scientists use machine learning to solve two major headaches: finding the signal quickly, and telling the difference between a firefly and a flashlight.

Here is the breakdown of their two clever AI strategies, explained simply:

1. The "Noise-Canceling" AI (Unsupervised Anomaly Detection)

The Problem: The camera takes massive pictures, but the actual Dark Matter signal is tiny. Most of the picture is just empty space or static noise. If they save the whole picture, they waste storage and time.

The Solution: They trained an AI called an Autoencoder to be a "master of the boring stuff."

  • How it works: Imagine teaching a student to draw a picture of a blank, static-filled wall (pedestal data) over and over again until they can recreate it perfectly.
  • The Trick: When you show this student a new picture that has a firefly (a particle) in it, they try to draw the blank wall again. Because they only know how to draw the wall, they fail to draw the firefly. The "mistake" they make—the part of the picture they couldn't recreate—is the signal!
  • The Result: The AI instantly ignores 97.8% of the image (the boring wall) and highlights just the tiny 2.2% where the firefly is. It does this incredibly fast (in 25 milliseconds, faster than a human blink), allowing the experiment to throw away useless data in real-time and only save the interesting parts.
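The "learn the boring wall, flag what you can't reconstruct" idea can be sketched in a few lines. The toy below swaps the paper's convolutional autoencoder for a much simpler stand-in: a per-pixel noise model (mean and sigma learned from pedestal frames), with anomaly score playing the role of reconstruction error. The image sizes, noise levels, and 5-sigma cut are illustrative assumptions, not the experiment's actual values.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pedestal" frames: pure sensor noise, used to learn what "boring" looks like.
# (Toy stand-in for the autoencoder: the model of normality is just a
# per-pixel mean and standard deviation instead of a neural network.)
pedestal = rng.normal(loc=100.0, scale=2.0, size=(200, 32, 32))
mean = pedestal.mean(axis=0)
std = pedestal.std(axis=0)

# A new frame: same noise, plus a small bright blob (the "firefly").
frame = rng.normal(loc=100.0, scale=2.0, size=(32, 32))
frame[14:18, 14:18] += 25.0

# Anomaly score: how badly the noise model "reconstructs" each pixel.
score = np.abs(frame - mean) / std

# Keep only pixels the noise model cannot explain (here, > 5 sigma).
signal_mask = score > 5.0
kept_fraction = signal_mask.mean()  # a few percent of pixels survive
```

The same logic scales up: whatever the model of "normal" fails to reproduce is exactly the part of the image worth saving, so the rest can be discarded online.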

2. The "Mix-and-Match" Detective (Weakly Supervised Learning)

The Problem: To teach an AI to recognize a Dark Matter particle (a "Nuclear Recoil"), you usually need to show it thousands of labeled examples: "This is a firefly," "This is a flashlight." But in real life, you don't have a box of pure Dark Matter particles to show the AI. You only have a messy mix of everything.

The Solution: They used a method called CWoLa (Classification Without Labels).

  • The Analogy: Imagine you have two buckets of soup.
    • Bucket A (Standard): Just vegetable broth (background noise).
    • Bucket B (AmBe Source): Vegetable broth mixed with a specific spice (neutrons that create Dark Matter-like signals).
    • You don't know which specific spoonful in Bucket B has the spice, but you know Bucket B as a whole has more spice than Bucket A.
  • How it works: The AI is told, "Here is a spoonful from Bucket A, and here is a spoonful from Bucket B. Figure out which one is from Bucket B."
  • The Magic: To get the answer right, the AI must learn to identify the unique taste (shape) of the spice. Once it learns to distinguish the "spicy" spoonfuls from the "bland" ones, it has effectively learned what the Dark Matter signal looks like, even though it was never explicitly told, "This specific pixel is a firefly."
  • The Result: The AI successfully isolated the "spicy" events. These events looked like compact, circular blobs (which is exactly what a Dark Matter particle should look like), proving the method works without needing perfect labels.
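The two-bucket trick can be demonstrated end to end on toy data. In the sketch below, each event is reduced to a single made-up "compactness" number, background and signal are drawn from Gaussians, and the classifier is a from-scratch logistic regression; all of these are illustrative assumptions, not the experiment's actual inputs or model. The key point is that the classifier only ever sees bucket labels, yet ends up ranking true signal events above true background.

```python
import numpy as np

rng = np.random.default_rng(1)

# Bucket A: pure background. Bucket B: background plus some signal,
# with an unknown per-event identity (only the bucket label is known).
bkg_a = rng.normal(0.0, 1.0, size=1000)
bkg_b = rng.normal(0.0, 1.0, size=700)
sig_b = rng.normal(3.0, 1.0, size=300)  # signal shifts the toy feature
bucket_a = bkg_a
bucket_b = np.concatenate([bkg_b, sig_b])

x = np.concatenate([bucket_a, bucket_b])
y = np.concatenate([np.zeros(len(bucket_a)), np.ones(len(bucket_b))])

# Logistic regression trained only to tell bucket B from bucket A.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= 0.1 * np.mean((p - y) * x)
    b -= 0.1 * np.mean(p - y)

def score(v):
    """Probability-like score that an event came from bucket B."""
    return 1.0 / (1.0 + np.exp(-(w * v + b)))

# Though it never saw event-level labels, the classifier scores the
# true signal events well above the true background events.
sig_score = score(sig_b).mean()
bkg_score = score(bkg_a).mean()
```

This is the CWoLa guarantee in miniature: any classifier that optimally separates two mixed samples must be ranking events by how signal-like they are, so the bucket-level labels are enough.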

Why This Matters

The CYGNO experiment is building the next generation of Dark Matter detectors. These detectors will be so sensitive they will produce terabytes of data. Without these AI tricks, the computers would drown in data, and the real signals would be lost in the noise.

  • Strategy 1 acts like a smart filter, instantly throwing away 98% of the junk so the computer only has to think about the important stuff.
  • Strategy 2 acts like a detective, learning to spot the "needle in the haystack" just by comparing two slightly different piles of hay.

Together, these tools pave the way for a future where we can scan the universe for Dark Matter in real-time, turning a massive, impossible data problem into a manageable, solvable puzzle.
