Original authors: Lucie Flek, Oliver Janik, Philipp Alexander Jung, Akbar Karimi, Timo Saala, Alexander Schmidt, Matthias Schott, Philipp Soldin, Matthias Thiesmeyer, Christopher Wiebusch, Ulrich Willemsen

Published 2026-06-17


CC BY 4.0


Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you have a very smart robot that looks at pictures and guesses what they are. Sometimes it gets things wrong: it looks at a photo of a cat and says, "That's a dog!" or it looks at a blurry photo of a star and says, "That's a planet!"

Scientists in the world of physics (where they study tiny particles and huge stars) use these robots to make sense of massive amounts of data. But there's a problem: these robots can be easily tricked. If you change a picture just a tiny, invisible amount, the robot might suddenly change its mind completely.

This paper introduces a new tool called MiniFool. Think of MiniFool not as a "hacker" trying to break the robot, but as a stress-test inspector. Its job is to ask: "How much do I actually have to wiggle this data before the robot changes its mind?"

Here is how it works, using simple analogies:

1. The "Fake" vs. The "Real" Trick

Most old ways of tricking robots (called "adversarial attacks") are like a magician pulling a rabbit out of a hat. They change the data in ways that are mathematically small but physically impossible.

The Old Way: Imagine trying to trick a robot by changing a pixel in a photo to a negative number. In the real world, you can't have "negative light." But old tricks didn't care; they just wanted the robot to get confused.
The MiniFool Way: MiniFool is like a strict physics teacher. It says, "You can only change the data if the change makes sense in the real world." If a sensor has a known margin of error (like a ruler that is slightly fuzzy), MiniFool only changes the data within that fuzzy range. It asks: "Can I trick the robot using only the natural 'fuzziness' of the measurement?"

2. The "Wiggle Room" Test

The researchers use a special knob called the "Attack Parameter." Think of this knob as a dial that controls how much "wiggle room" or uncertainty we allow in the data.

Turn the dial low (Low Wiggle Room): If the robot changes its mind with just a tiny, almost invisible nudge, it means the robot is fragile. It's like a house of cards; a small breeze knocks it over.
Turn the dial high (High Wiggle Room): If the robot only changes its mind when you shake the data violently (way more than the natural error of the instrument), it means the robot is robust. It's like a brick wall; it takes a lot to move it.

3. Three Real-World Tests

The paper tested MiniFool on three different things to show it works everywhere:

The Handwritten Digits (MNIST): They showed the robot pictures of numbers (like a "9").
- Result: When the robot was right, it was hard to trick. When the robot was already wrong (thinking a "9" was an "8"), it was very easy to trick it back to the right answer with a tiny nudge. This proved MiniFool can spot which guesses are shaky.
The Ice Cube Telescope (IceCube): This is a giant detector in Antarctica that looks for ghostly particles called neutrinos. They wanted to find a specific type called a "tau neutrino."
- Result: They used MiniFool on real candidate events from the telescope. Most of the candidates were very hard to trick, which suggests they really are tau neutrinos. One was easy to trick, which suggests it may be background noise. That matches the amount of background the researchers had already estimated, so it gave them an independent check on their result.
The Particle Collider (CMS): CMS is a giant detector at a machine that smashes particles together. One of its jobs is to spot sprays of particles that come from heavy "b-quarks."
- Result: They tested the robot that identifies these particles. They found that if the robot was confident and correct, it took a huge "nudge" to change its mind. If it was wrong, a tiny nudge fixed it.

The Big Takeaway

The main point of this paper is that MiniFool helps scientists trust their robots.

By using this tool, scientists can look at a specific piece of data and say: "Is this classification strong, or is it just a lucky guess that would fall apart if the measurement was slightly off?"

It doesn't just tell you if the robot can be tricked; it tells you how much it takes to trick it, based on the real-world rules of physics. This helps scientists separate the solid, reliable discoveries from the shaky ones.

Technical Summary: MiniFool - Physics-Constraint-Aware Minimizer-Based Adversarial Attacks

Problem Statement

In particle and astroparticle physics, deep neural networks (DNNs) are increasingly central to data analysis, particularly for classifying events into signal and background categories. However, these networks are trained on labeled Monte Carlo simulations, which may contain subtle mismodeling of experimental instruments. Such mismodeling can cause networks to learn non-physical features, degrading performance when applied to real experimental data.

Standard adversarial attack algorithms (e.g., FGSM, PGD, DeepFool) are designed to find minimal perturbations that flip a classification. However, these methods typically minimize distance metrics (like $L_\infty$ or $L_2$ ) without regard for experimental uncertainties or physical boundary conditions (e.g., conservation laws, non-negative signals). Consequently, the perturbations required to fool a network in these standard approaches often represent physically impossible data events. This makes the statistical and physical interpretation of attack success rates difficult or impossible in the context of experimental physics. There is a need for an attack methodology that respects experimental uncertainties to robustly assess network decisions and identify misclassified events.

Methodology: The MiniFool Algorithm

The authors present MiniFool, a new algorithm that implements physics-inspired adversarial attacks. Unlike traditional methods that seek the absolute minimal change in pixel or feature values, MiniFool minimizes a cost function that balances the change in the target classification score against the physical plausibility of the perturbation, quantified by experimental uncertainties.

Core Formulation

The algorithm operates on a classification task with $m$ classes. Given an input $\vec{x}_0$ and a target score $g$ (typically $g=0$ to flip a confident classification), MiniFool seeks an adversarial input $\vec{x}_a$ that minimizes the total metric $\lambda$ :

$\lambda(\vec{x}_0; \vec{x}_a; \vec{\theta}) = \eta(\vec{x}_0; \vec{x}_a) + \beta \cdot (f_{i^*}(\vec{x}_a; \vec{\theta}) - g)^2$

Where:

$\eta$ (Input Metric): The mean, over the $N$ input features, of the squared deviations between the original and perturbed features, each normalized by the experimental uncertainty $\sigma_i$ of that feature:
$\eta = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_{0,i} - x_{a,i}}{\sigma_i} \right)^2$
This metric is analogous to a $\chi^2$ statistic. It can be extended to correlated features using a covariance matrix.
Target Score Term: The squared difference between the network's output for the target class $f_{i^*}$ and the desired score $g$ .
$\beta$ : A tunable meta-parameter (default 1) to reweight the terms, useful if outputs are not normalized probabilities.
Attack Parameter ( $s$ ): To assess robustness, the nominal uncertainties $\vec{\sigma}_0$ are scaled by a scalar $s$ ( $\vec{\sigma} = s \cdot \vec{\sigma}_0$ ). By scanning $s$ , one can determine the magnitude of uncertainty required to flip a classification.
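The cost function defined above can be written out in a few lines. The following is a minimal sketch in plain Python, not the authors' TensorFlow or PyTorch implementation; the function names, the constant stand-in for the network score, and the example numbers are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def input_metric(x0, xa, sigma):
    """eta: mean of squared deviations, each in units of its uncertainty."""
    n = len(x0)
    return sum(((a - b) / s) ** 2 for a, b, s in zip(x0, xa, sigma)) / n


def minifool_cost(x0, xa, sigma0, score_fn, s=1.0, g=0.0, beta=1.0):
    """lambda = eta + beta * (f(xa) - g)^2, with sigma = s * sigma0."""
    sigma = [s * sig for sig in sigma0]
    return input_metric(x0, xa, sigma) + beta * (score_fn(xa) - g) ** 2


def constant_score(x):
    """Hypothetical stand-in for the network output of the target class."""
    return 0.8


# Hypothetical two-feature example: only the first feature is perturbed.
x0 = [1.0, 2.0]
xa = [1.5, 2.0]
sigma0 = [0.5, 1.0]

cost_nominal = minifool_cost(x0, xa, sigma0, constant_score)         # s = 1
cost_relaxed = minifool_cost(x0, xa, sigma0, constant_score, s=2.0)  # s = 2
print(cost_nominal, cost_relaxed)
```

Doubling $s$ divides $\eta$ by four for the same perturbation, so larger perturbations become cheaper while the score term is unchanged. This is what lets a scan over $s$ probe how much uncertainty a classification can tolerate.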

Optimization

The minimization is performed using the Adam optimizer. The process is computationally more demanding than gradient-sign methods, taking seconds per event on a workstation, but it yields solutions that are statistically consistent with the assumed experimental uncertainties. The algorithm is implemented in both TensorFlow and PyTorch.
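To make the minimization step concrete, here is a small sketch that minimizes the same cost with a hand-written Adam update in plain Python. It is an illustration under stated assumptions, not the paper's code: the "network" is a hypothetical two-feature logistic classifier with analytic gradients, and the learning rate, step count, weights, and inputs are arbitrary choices.

```python
import math


def score(x, w, bias):
    """Toy logistic classifier standing in for a trained network."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def cost_and_grad(x, x0, sigma, w, bias, g=0.0, beta=1.0):
    """Return lambda and its gradient with respect to the input x."""
    n = len(x)
    f = score(x, w, bias)
    eta = sum(((xi - x0i) / si) ** 2 for xi, x0i, si in zip(x, x0, sigma)) / n
    cost = eta + beta * (f - g) ** 2
    # Chain rule through the sigmoid: df/dx_i = f * (1 - f) * w_i
    dscore = 2.0 * beta * (f - g) * f * (1.0 - f)
    grad = [2.0 * (xi - x0i) / (n * si ** 2) + dscore * wi
            for xi, x0i, si, wi in zip(x, x0, sigma, w)]
    return cost, grad


def adam_attack(x0, sigma0, w, bias, s=1.0, lr=0.01, steps=2000):
    """Minimize lambda over x with a standard Adam update, starting at x0."""
    sigma = [s * sig for sig in sigma0]
    x = list(x0)
    m = [0.0] * len(x)
    v = [0.0] * len(x)
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        _, grad = cost_and_grad(x, x0, sigma, w, bias)
        for i, gi in enumerate(grad):
            m[i] = beta1 * m[i] + (1.0 - beta1) * gi
            v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Hypothetical example: a confidently classified input (score about 0.82).
w, bias = [2.0, -1.0], 0.0
x0, sigma0 = [1.0, 0.5], [0.5, 0.5]

x_nominal = adam_attack(x0, sigma0, w, bias, s=1.0)
x_relaxed = adam_attack(x0, sigma0, w, bias, s=4.0)
print(score(x0, w, bias), score(x_nominal, w, bias), score(x_relaxed, w, bias))
```

In this toy setup the nominal uncertainties only lower the score slightly, while the uncertainties scaled by $s = 4$ allow the minimizer to push the score below 0.5. A real application would replace the analytic gradient with automatic differentiation through the trained network.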

Key Contributions and Results

The paper validates MiniFool across three distinct domains:

1. MNIST Digit Classification (Warm-up)

Setup: A standard feedforward network trained on 28x28 grayscale images (98%+ accuracy).
Results:
- MiniFool successfully flipped classifications (e.g., "9" to "4") and corrected misclassifications (e.g., "9" misclassified as "8" corrected to "9") with minimal perturbations.
- Robustness Analysis: When scanning the attack parameter $s$ , misclassified images showed a rapid decay in confidence scores, while correctly classified images remained robust until much larger $s$ values. This confirms that incorrect predictions are less robust to perturbations consistent with data uncertainties.
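The scan over $s$ can be illustrated on a problem far simpler than MNIST. The sketch below assumes a hypothetical linear scorer $f(\vec{x}) = \vec{w} \cdot \vec{x} + b$ with target $g = 0$, for which the minimum of $\lambda$ has a closed form: setting the gradient to zero gives an attacked score of $f(\vec{x}_0) / (1 + N \beta s^2 \sum_i w_i^2 \sigma_{0,i}^2)$. The weights, inputs, threshold, and scan grid are illustrative assumptions; the networks in the paper are nonlinear and require the numerical minimization.

```python
def linear_score(x, weights, bias):
    """Hypothetical linear scorer used only for this illustration."""
    return sum(wi * xi for wi, xi in zip(weights, x)) + bias


def attacked_score(f0, weights, sigma0, s, beta=1.0):
    """Score at the minimum of lambda for a linear scorer and target g = 0."""
    n = len(weights)
    k0 = sum((wi * si) ** 2 for wi, si in zip(weights, sigma0))
    return f0 / (1.0 + n * beta * s ** 2 * k0)


def flip_scale(f0, weights, sigma0, threshold=0.5, step=0.05, s_max=20.0):
    """Smallest scanned s at which the attacked score falls below threshold."""
    k = 1
    while k * step <= s_max:
        s = k * step
        if attacked_score(f0, weights, sigma0, s) < threshold:
            return s
        k += 1
    return None  # no flip found within the scanned range


weights, bias = [0.5, -0.25], 0.0
sigma0 = [1.0, 2.0]

confident = linear_score([2.0, 0.4], weights, bias)  # well above threshold
marginal = linear_score([1.3, 0.4], weights, bias)   # just above threshold

s_confident = flip_scale(confident, weights, sigma0)
s_marginal = flip_scale(marginal, weights, sigma0)
print(s_confident, s_marginal)
```

The input that starts close to the decision threshold flips at a smaller $s$ than the confident one, which mirrors the qualitative pattern reported for misclassified versus correctly classified images.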

2. IceCube Tau Neutrino Identification

Context: Identifying astrophysical tau neutrinos ( $\nu_\tau$ ) against background electron neutrinos ( $\nu_e$ ) using 2D images of Cherenkov light patterns in the IceCube detector.
Setup: Attacks were applied to simulated events and seven real candidate events identified in a 2024 analysis. Uncertainties were modeled as 10% of the recorded amplitude.
Results:
- For the seven real candidate events, six required an attack parameter $s \approx 10$ to flip the classification. With the nominal uncertainty of 10% of the amplitude, this corresponds to an uncertainty of about 100%, indicating high robustness.
- One event flipped at $s < 1$ , suggesting it was likely a misclassified background event.
- This distinction aligns with the background estimation of 0.5 events in the original analysis, demonstrating that MiniFool can independently verify the purity of selected event samples.

3. CMS B-Jet Tagging

Context: Identifying jets originating from b-quarks using the DeepJet network on CMS Open Data.
Setup: The network processes over 600 input quantities. Due to the difficulty in quantifying specific uncertainties for all inputs, a simplified model was used where all uncertainties were set to unity and scaled by $s$ .
Results:
- Stress Test: The Area Under the Curve (AUC) degraded from 0.932 (nominal) to 0.752 as $s$ increased. Significant degradation began at $s = 3 \times 10^{-4}$ .
- Robustness Scan: For a subset of 200 jets with high initial confidence ( $P \ge 87\%$ ), correctly classified jets required significantly larger $s$ values to flip compared to initially misclassified jets.
- This confirms the algorithm's generalizability to high-dimensional, heterogeneous particle physics data.

Significance and Claims

The paper claims that MiniFool provides a physically meaningful way to test the robustness of neural network decisions in particle physics. Its primary significance lies in:

Physical Plausibility: By incorporating experimental uncertainties into the cost function, the resulting perturbations represent physically possible data variations, unlike standard attacks that may generate non-physical artifacts.
Robustness Assessment: The ability to scan the attack parameter $s$ allows researchers to quantify how much experimental uncertainty a classification can withstand before failing. This serves as a metric for the reliability of individual event classifications.
Misclassification Detection: The method consistently distinguishes between correctly and incorrectly classified events; misclassified events are "flippable" with smaller uncertainty scales than robust, correct classifications.
General Applicability: The algorithm is not limited to a specific detector or data type, as demonstrated by its success on image data (MNIST), time-series/2D detector data (IceCube), and high-dimensional kinematic data (CMS).

The authors note that while the current implementation is slower than gradient-sign attacks such as FGSM, it offers a unique tool for validating physics analyses. They suggest that future work should apply more complex, realistic uncertainty models and explore integrating the method into the training process (adversarial training), though current computational costs limit this to smaller datasets.

MiniFool -- Physics-Constraint-Aware Minimizer-Based Adversarial Attacks in Deep Neural Networks