Optimised neural networks for online processing of… — Plain-Language Explanation

Original authors: Georges Aad, Raphael Bertrand, Lauri Laatu, Emmanuel Monnier, Arno Straessner, Nairit Sur, Johann C. Voigt

Published 2026-02-06

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine the ATLAS detector at the Large Hadron Collider (LHC) as a giant, ultra-sensitive microphone listening to the universe. Every 25 nanoseconds, bunches of protons from two opposing beams cross and collide, creating a chaotic symphony of particles. The "microphone" (specifically, the liquid-argon calorimeter) tries to measure the energy of these particles by listening to the electrical "pulses" they create.

However, there is a problem: the orchestra is getting louder and more crowded. In the future upgrade (called the HL-LHC), there will be so many collisions happening at once (a phenomenon called "pile-up") that the signals overlap like a messy pile of tangled headphones. The current method for untangling these signals (called "Optimal Filtering") is like trying to hear a single violin in a rock concert using a very old, slow ear—it gets confused and misses the true volume of the sound.

This paper presents a new solution: teaching the detector's brain to think like a modern AI.

Here is the breakdown of what they did, using simple analogies:

1. The Challenge: A Tiny, Fast Brain

The detector doesn't have a supercomputer to process data. It has to make decisions instantly, right where the data is collected, using specialized chips called FPGAs (Field-Programmable Gate Arrays). Think of these FPGAs as tiny, ultra-fast calculators that have very strict rules:

Speed: They must decide the energy of a particle within 125 nanoseconds, far less time than a single flap of a hummingbird's wings (which takes several milliseconds).
Size: They have very little memory space. You can't install a massive, heavy software program on them.

2. The Solution: New Neural Network "Recipes"

The researchers tried teaching these tiny calculators to recognize the messy signals using Neural Networks (AI models). They tested four different "recipes" (architectures) to see which one could untangle the noise best without breaking the speed or size limits:

The RNN (Recurrent Neural Network): Imagine a person reading a story one word at a time, remembering the previous word to understand the current one. This is good for sequences, but in this crowded environment, it got too big and slow.
The CNN (Convolutional Neural Network): Imagine looking at a pattern through a sliding window, like a security camera scanning a hallway. It looks at a chunk of the signal at a time to find shapes. This worked very well.
The Dense Network: Imagine a team of experts where everyone talks to everyone else to solve a puzzle. This also worked very well.
The "Dense + RNN" Hybrid: A mix of a Dense network and an RNN, trying to get the best of both worlds.

3. The Tuning Process: The "Smart Search"

The researchers didn't just guess which recipe was best. They used a Bayesian Optimization process.

The Analogy: Imagine you are trying to find the perfect temperature to bake a cake, but you only have a few tries before the oven breaks. You don't just guess randomly; you use a smart assistant that says, "Okay, we tried 180°C and it was too dry. Let's try 190°C, but maybe a little less flour."
They used this "smart assistant" to balance two competing goals: Accuracy (getting the energy right) vs. Size (keeping the code small enough for the chip). They found a "sweet spot" where the AI was small enough to fit but smart enough to beat the old method.

4. The Results: A Clearer Picture

When they tested these new AI models against the old "Optimal Filtering" method:

Better Accuracy: The new AI models (Dense and CNN) could measure the energy with a precision of about 80 MeV (a tiny amount of energy on the scale of LHC collisions). The old method and the RNN were less precise (around 90 MeV).
No More Underestimating: The old method tended to "turn down the volume" on the signals, thinking the energy was lower than it actually was. The new AI models got the volume right.
Efficiency: The winning models were tiny (using fewer than 500 "math operations"), proving they could fit on the hardware.
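To get a feel for what counting "math operations" means, here is a rough sketch that tallies multiply-accumulate operations (MACs) for small dense and 1D convolutional layers. The layer sizes are invented for illustration and are not the paper's architectures, and the paper may count operations differently (for example when the CNN reuses earlier results).

```python
BUDGET = 500  # approximate per-cell MAC limit quoted in the technical summary

def dense_macs(n_in, n_out):
    """A dense layer needs one multiply-accumulate per (input, output) pair."""
    return n_in * n_out

def conv1d_macs(n_samples, kernel_size, in_channels, out_channels, stride=1):
    """MACs for a 1D convolution without padding, evaluated at every position."""
    n_positions = (n_samples - kernel_size) // stride + 1
    return n_positions * kernel_size * in_channels * out_channels

# Hypothetical networks (illustrative sizes only).
networks = {
    "dense": dense_macs(20, 12) + dense_macs(12, 8) + dense_macs(8, 1),
    "cnn": conv1d_macs(20, 5, 1, 3) + conv1d_macs(16, 5, 3, 1) + dense_macs(12, 1),
    "oversized_dense": dense_macs(30, 20) + dense_macs(20, 1),
}

# Check each network against the budget.
fits = {name: macs <= BUDGET for name, macs in networks.items()}

for name, macs in networks.items():
    print(f"{name}: {macs} MACs, fits budget: {fits[name]}")
```

With these made-up sizes the first two networks stay under the budget and the third does not, which is the kind of check the size limit implies.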

5. The Bonus Feature: "How Sure Are You?"

Usually, AI gives you an answer but no confidence score. It's like a weather app saying "It will rain" without telling you if it's a 50% chance or a 99% chance.

The researchers added a special technique called Deep Evidential Regression.
The Analogy: This is like giving the AI a "confidence meter." Now, when the AI says, "This particle has 50 GeV of energy," it can also say, "I am 95% sure of this," or "I'm a bit fuzzy on this one because the noise was weird."
They found that this confidence meter was accurate. It didn't make the AI slower or bigger, but it gave scientists a way to know which measurements were trustworthy.

Summary

The paper shows that by using smart, tiny AI models (specifically Dense and CNN networks) tuned with a "smart search" method, the ATLAS detector can be upgraded to handle the chaos of future high-energy collisions. These new models are more accurate than the old method, stay within the strict speed limit, and can even tell scientists how confident they should be in the data, all while fitting inside the tiny, fast chips on the detector itself.

Technical Summary: Optimised Neural Networks for Online ATLAS Calorimeter Data Processing

Problem Statement
The High-Luminosity Large Hadron Collider (HL-LHC) will introduce extreme signal pile-up, with up to 200 simultaneous proton-proton collisions per bunch crossing. This environment degrades the performance of the current Optimal Filtering (OF) algorithm used in the ATLAS Liquid-Argon (LAr) calorimeters, particularly in reconstructing energy when pulses overlap. The Phase-II upgrade of the LAr readout electronics introduces new hardware based on Intel Agilex 7 Field-Programmable Gate Arrays (FPGAs). These FPGAs offer increased processing power but impose strict constraints on latency (below 125 ns) and network size (limited to approximately 500 multiply-accumulate operations, or MACs, per cell) for online energy reconstruction. The challenge is to develop neural network (NN) architectures that outperform the OF algorithm in energy resolution under high pile-up while adhering to these severe hardware constraints and providing reliable per-event uncertainty estimates.
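To illustrate why overlapping pulses are a problem for a linear filter, the following is a minimal sketch in the spirit of Optimal Filtering, not the ATLAS implementation. It assumes white noise, a made-up five-sample pulse shape (`PULSE`), and a flat negative baseline shift as a crude stand-in for the undershoot of an earlier bipolar pulse. The real OF coefficients also use the measured noise autocorrelation and a timing constraint, both omitted here.

```python
# Hypothetical normalised pulse shape, one sample per 25 ns (illustrative values only).
PULSE = [0.0, 0.6, 1.0, 0.7, 0.3]

def of_coefficients(pulse):
    """Least-squares weights for white noise: a = g / (g . g), so that a . g = 1."""
    norm = sum(g * g for g in pulse)
    return [g / norm for g in pulse]

def estimate_amplitude(samples, coeffs):
    """Linear estimate: weighted sum of pedestal-subtracted samples."""
    return sum(a * s for a, s in zip(coeffs, samples))

coeffs = of_coefficients(PULSE)

# An isolated deposit of amplitude 10 is recovered by construction.
isolated = [10.0 * g for g in PULSE]
clean = estimate_amplitude(isolated, coeffs)

# Assumption: an earlier deposit leaves a flat negative shift under the new pulse.
UNDERSHOOT = -0.6
overlapped = [s + UNDERSHOOT for s in isolated]
biased = estimate_amplitude(overlapped, coeffs)

print(f"isolated pulse estimate:   {clean:.3f}")
print(f"overlapping pulse estimate: {biased:.3f}")
```

Because the weights are fixed, the filter cannot tell the shifted baseline from a smaller pulse, so the overlapped estimate comes out low. A network that also sees pre-deposit samples has the information needed to correct for this.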

Methodology
The study evaluates four neural network architectures designed to predict the transverse energy deposited in a calorimeter cell using digitised pulse samples as input. The input data includes pre-deposit samples (to account for pulse distortions from previous collisions) and post-deposit samples (to capture the pulse shape of the target energy deposit).

Architectures Evaluated:
- Recurrent Neural Network (RNN): Processes samples sequentially. While efficient for time-series data, standard RNNs require large internal dimensions to capture long-range dependencies, often exceeding FPGA resource limits for long sequences.
- Convolutional Neural Network (CNN): Utilises sliding 1D and 2D filters over the input samples. It leverages weight sharing and reuses computations from previous bunch crossings to reduce latency.
- Dense+RNN: A hybrid approach where a dense layer processes pre-deposit samples to initialise an RNN sequence for post-deposit samples, aiming to balance RNN advantages with reduced computational cost.
- Staged Dense: A multi-stage architecture using only dense layers. Pre-deposit samples are processed in a first stage to correct for distortions, which are then combined with post-deposit samples in a second stage. This allows pre-computation of the first stage, minimising latency.

Optimisation Strategy:
A Bayesian optimisation procedure was employed to tune hyperparameters (e.g., number of pre/post-deposit samples, layer dimensions, kernel sizes). The objective function balanced energy resolution against network size (MAC count), applying penalties for architectures exceeding 500 MACs and severe penalties beyond 850 MACs to ensure FPGA feasibility.
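One way to picture the size-aware objective described above is as a score to minimise: the energy resolution plus penalties that switch on at the two MAC thresholds. The sketch below is an assumption about the general shape only. The linear penalties, the weights `soft_weight` and `hard_weight`, and the candidate numbers are invented, and only the 500 and 850 MAC thresholds come from the text. The search procedure itself (Bayesian optimisation) is not shown.

```python
SOFT_LIMIT = 500  # MACs above this are penalised
HARD_LIMIT = 850  # MACs above this are penalised severely

def penalised_objective(resolution_mev, macs, soft_weight=0.02, hard_weight=1.0):
    """Hypothetical score to minimise: resolution plus size penalties."""
    soft_penalty = max(0, macs - SOFT_LIMIT) * soft_weight
    hard_penalty = max(0, macs - HARD_LIMIT) * hard_weight
    return resolution_mev + soft_penalty + hard_penalty

# Invented candidates: (resolution in MeV, MAC count).
candidates = {
    "small": (84.0, 300),
    "medium": (80.0, 480),
    "large": (78.0, 700),
    "huge": (77.0, 1200),
}

scores = {name: penalised_objective(res, macs)
          for name, (res, macs) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: score {score:.1f}")
print("selected:", best)
```

With these made-up numbers the search would prefer the candidate just under 500 MACs, even though larger candidates have slightly better raw resolution.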
Uncertainty Estimation:
To address the need for per-event energy uncertainties without the computational cost of Bayesian Neural Networks (which require sampling), the authors implemented Deep Evidential Regression (DER). This technique modifies the final layer of the Dense network to output parameters of a Normal-Inverse-Gamma distribution, allowing the inference of both the predicted energy and its associated aleatoric (data noise) and epistemic (model uncertainty) uncertainties.
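As a rough illustration of how a Normal-Inverse-Gamma output can be turned into a prediction and two uncertainties, the sketch below uses the standard Deep Evidential Regression formulas: prediction $\gamma$, aleatoric variance $\beta/(\alpha-1)$, and epistemic variance $\beta/(\nu(\alpha-1))$. The softplus mapping of raw outputs and the example numbers are assumptions, and the way the paper combines these terms into its quoted uncertainty is not reproduced here.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x)), always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def evidential_outputs(raw):
    """Map four raw outputs to NIG parameters (gamma, nu, alpha, beta).

    Assumed constraints: nu > 0, alpha > 1, beta > 0.
    """
    gamma, raw_nu, raw_alpha, raw_beta = raw
    return gamma, softplus(raw_nu), softplus(raw_alpha) + 1.0, softplus(raw_beta)

def prediction_and_uncertainties(gamma, nu, alpha, beta):
    """Return the prediction plus aleatoric and epistemic standard deviations."""
    aleatoric_var = beta / (alpha - 1.0)         # expected data noise, E[sigma^2]
    epistemic_var = beta / (nu * (alpha - 1.0))  # uncertainty on the mean, Var[mu]
    return gamma, math.sqrt(aleatoric_var), math.sqrt(epistemic_var)

# Hypothetical raw outputs of the final layer for one calorimeter cell.
raw_outputs = (50.0, 0.0, 0.0, 0.0)
energy, aleatoric, epistemic = prediction_and_uncertainties(
    *evidential_outputs(raw_outputs)
)
print(f"E = {energy:.1f}, aleatoric = {aleatoric:.3f}, epistemic = {epistemic:.3f}")
```

All of this is computed from a single forward pass, which is why the approach avoids the repeated sampling that a Bayesian Neural Network would need.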
Simulation and Training:
The networks were trained and tested on simulated data using the AREUS toolkit, simulating a worst-case pile-up scenario ($\langle\mu\rangle = 200$) with hard-scattering events ranging from 0 to 130 GeV. A dataset of 13 million events was used for final evaluation to minimise statistical fluctuations.
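The Staged Dense idea from the architecture list can be sketched as two functions, where the first can run before the samples of the target deposit arrive. This is an illustrative toy with random weights and invented sizes (`N_PRE`, `N_POST`, `N_CORR`, `N_HIDDEN`). The ReLU on the output and the layer layout are assumptions, not the paper's configuration.

```python
import random

def dense(inputs, weights, biases):
    """Fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    """Random weights and zero biases, standing in for trained parameters."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
N_PRE, N_POST, N_CORR, N_HIDDEN = 8, 5, 4, 6  # hypothetical sizes

stage1 = random_layer(N_PRE, N_CORR, rng)
stage2 = random_layer(N_CORR + N_POST, N_HIDDEN, rng)
head = random_layer(N_HIDDEN, 1, rng)

def first_stage(pre_samples):
    """Summarise pre-deposit samples; can be pre-computed ahead of time."""
    return dense(pre_samples, *stage1)

def second_stage(correction, post_samples):
    """Combine the stored correction with post-deposit samples."""
    hidden = dense(correction + post_samples, *stage2)
    return dense(hidden, *head)[0]

pre = [rng.uniform(-1.0, 1.0) for _ in range(N_PRE)]
post = [rng.uniform(0.0, 1.0) for _ in range(N_POST)]

correction = first_stage(pre)            # done before the deposit's samples arrive
energy = second_stage(correction, post)  # only this part is latency-critical
print("correction size:", len(correction), "energy estimate:", energy)
```

Splitting the network this way means only the second stage has to fit inside the latency window once the last post-deposit sample is available.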

Key Results

Energy Resolution: The optimised Dense, CNN, and Dense+RNN architectures achieved a transverse energy resolution of approximately 80 MeV. This outperforms both the current OF algorithm and the RNN architecture (which achieved ~90 MeV).
Energy Scale Accuracy: Unlike the OF algorithm and standard RNNs, which systematically underestimate the energy (the OF ignores in-time pile-up, and RNNs fail to capture long-range dependencies with limited inputs), the Dense, CNN, and Dense+RNN networks accurately reproduce the energy scale across the full dynamic range.
Hardware Feasibility: All successful architectures (Dense, CNN, Dense+RNN) were optimised to use fewer than 500 MAC units, making them suitable for implementation on the Agilex 7 FPGAs within the strict latency constraints.
Uncertainty Performance: The DER implementation added minimal computational overhead. The predicted uncertainty ($\delta_{pred}$) was found to be consistent, on average, with the actual difference between the true and predicted energy. The pull distribution $(E_{pred} - E_{true})/\delta_{pred}$ yielded a standard deviation of 0.75, indicating a slight overestimation of uncertainty but overall reliability. The analysis showed that epistemic uncertainty dominates, suggesting potential for further improvement with larger datasets or refined architectures.
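The link between a pull width below 1 and overestimated uncertainties can be checked with a small toy simulation. The numbers below are synthetic and chosen only so that the predicted uncertainty is larger than the actual spread of the errors. They are not the paper's data.

```python
import random
import statistics

rng = random.Random(42)
N_EVENTS = 20000
TRUE_SPREAD_MEV = 75.0     # hypothetical actual spread of prediction errors
PREDICTED_UNC_MEV = 100.0  # hypothetical predicted uncertainty (too large)

pulls = []
for _ in range(N_EVENTS):
    e_true = rng.uniform(0.0, 130000.0)  # MeV, spanning 0 to 130 GeV
    e_pred = e_true + rng.gauss(0.0, TRUE_SPREAD_MEV)
    # Pull: prediction error in units of the predicted uncertainty.
    pulls.append((e_pred - e_true) / PREDICTED_UNC_MEV)

pull_std = statistics.pstdev(pulls)
print(f"pull standard deviation: {pull_std:.3f}")
```

A perfectly calibrated uncertainty would give a pull width of 1. Here the predicted uncertainty is about a third larger than the real spread, so the width comes out near 0.75.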

Significance and Claims
The paper claims to demonstrate that modern machine learning algorithms can be successfully embedded into the online readout chain of the ATLAS LAr calorimeters. The primary significance lies in the successful trade-off between resolution and hardware constraints:

The study shows that Dense and CNN architectures can improve energy resolution by approximately 8% compared to the legacy OF method while remaining within the strict MAC limits of the Phase-II FPGA hardware.
It establishes that pre-deposit samples are critical for capturing pulse distortions, rendering pure RNN approaches less competitive due to their resource intensity for long sequences.
It introduces a practical method for per-event uncertainty estimation via Deep Evidential Regression, which does not significantly increase inference costs. This capability is presented as a step toward improved cell energy selection in clustering algorithms, allowing for more accurate reconstruction of physics objects like electrons and photons in high-pile-up environments.

The authors conclude that these optimised networks are well-suited for FPGA deployment and represent a viable path forward for the ATLAS Phase-II upgrade, offering superior performance over current algorithms without compromising the stringent latency and resource requirements of the trigger and readout systems.

Optimised neural networks for online processing of ATLAS calorimeter data on FPGAs