\textsc{RooAgent}: An LLM Agent for \textsc{Root}-Based… — Plain-Language Explanation

Imagine you have a massive, incredibly complex library of scientific data. In particle physics, that library is usually stored in and read with a software framework called ROOT, and it holds the "receipts" of billions of particle collisions. To find a specific piece of information, such as a particular type of particle or a pattern in the data, you usually need to be a librarian who speaks a difficult, technical language (programming code). If you don't know the exact code, you can't check out the book.

RooAgent is like hiring a super-smart, multilingual librarian assistant who speaks your language (plain English) and knows the library's secret code perfectly.

Here is how it works, broken down into simple concepts:

1. The Problem: The "Foreign Language" Barrier

High-energy physicists use a tool called PyROOT, the Python interface to ROOT, to analyze data. It's powerful, but it's like trying to order a complex meal at a restaurant where the menu is written in a language you don't speak. You have to know the exact syntax to ask for "a histogram of electron momentum" or "a count of events where jets are heavy." If you make a typo or use the wrong word, the computer just says "Error."

2. The Solution: The "Translator" Agent

RooAgent acts as a translator. You don't need to learn the code. You just tell the agent what you want in plain English, like:

"Show me a graph of the mass of the bottom quarks."
"Count how many events are left if I only keep particles with momentum above 50 GeV."
"Find the best cut to separate the signal from the background noise."

The agent (powered by a Large Language Model, or LLM) listens to your request, translates it into the correct technical commands, runs the analysis, and hands you back the result—usually a graph, a table of numbers, or a summary.

3. How It Works: The "Toolbox"

Think of the agent as a construction worker with a specific toolbox. The paper describes two ways this worker can be hired:

The LangGraph Mode: The worker uses a "foreman" (LangGraph) to coordinate an AI model of your choice (like GPT-4.1 or DeepSeek-V3). The foreman breaks your big request into small steps, asks the AI to pick the right tool, and then executes it.
The MCP Mode: The worker talks directly to a different AI boss (Anthropic's Claude) using a standard protocol (Model Context Protocol).

In both cases, the "tools" in the toolbox are pre-written computer functions that do the heavy lifting:

Inspecting: Looking inside the data files to see what's there.
Counting: Tallying up how many events pass a specific rule.
Plotting: Drawing the graphs and charts.
Fitting: Drawing a smooth curve through the data points to see the shape.
Calculating: Doing the math to see if a discovery is statistically significant.

4. The "Test Drive"

The authors tested this assistant with several scenarios to see if it could handle the job:

The "ZH" Simulation: They simulated a specific particle collision (a Z boson and a Higgs boson). The agent successfully found the files, drew the graphs, counted the events, and even found the "sweet spot" (the best cut) to separate the signal from the background noise.
The "Multi-Task" Challenge: They gave the agent one long, complex instruction to do six different things at once (fit a curve, make comparison charts, run a cut-flow, optimize cuts, scan mass windows, and rank results). The agent did all six steps in a row without needing human help.
The "Toy" Statistical Test: They created a fake dataset with a hidden signal. The agent successfully scanned through different mass values, found the hidden signal at the right spot (250 GeV), and calculated the probability that it wasn't just a fluke.
The "Real World" Test: They used real, public data from the ATLAS experiment at CERN (the Large Hadron Collider). The agent successfully analyzed the data for a Higgs boson decaying into four leptons, producing a stacked graph that matched what human experts would produce.

5. The Result

The paper claims that RooAgent works. It successfully turned plain English questions into complex physics answers.

It handled 19 out of 20 single-task tests correctly.
It completed a 6-step multi-task workflow without stopping.
It produced consistent results on the real ATLAS data whether it was using OpenAI's GPT-4.1 or Anthropic's Sonnet 4.6.

The Catch:
The agent isn't perfect. In one test, it got confused because the user typed "Events" (capital E) instead of "events" (lowercase e) for the name of the data table (the "tree") inside the file. The agent stopped and asked for clarification rather than guessing. Also, different AI models sometimes choose slightly different ranges for a graph (e.g., showing 0–100 GeV vs. 0–200 GeV) when the request doesn't specify one, but the core math remains the same.

Summary

RooAgent is a bridge. It lets physicists (and potentially students or new researchers) talk to their data in human language, while the computer handles the complex, technical language required to actually do the analysis. It doesn't replace the physicist's understanding of the physics, but it removes the barrier of having to memorize complex code syntax to get the job done.

Technical Summary of "RooAgent: An LLM Agent for Root-Based High Energy Physics Analysis"

Problem Statement
High Energy Physics (HEP) data analysis relies heavily on the ROOT framework and its Python interface, PyROOT, for tasks ranging from event selection and histogramming to statistical inference. However, utilizing these tools requires significant familiarity with specific API conventions, data structures (such as TTree branches), and the internal organization of input samples. This barrier to entry can hinder new users and make routine tasks inefficient. While Large Language Models (LLMs) have shown promise in automating multi-step workflows via "tool calls," there is a need for a specialized interface that maps natural-language goals directly to the specific function calls required for ROOT-based analysis.

Methodology
The authors present RooAgent, a Python package that acts as a natural-language interface for ROOT-based analysis. The system wraps PyROOT functions as executable tools for an LLM agent. The architecture supports two distinct operating modes, both utilizing the same underlying PyROOT implementation:

LangGraph Agent Mode: Compatible with OpenAI's GPT-4.1 (via GitHub Copilot) and DeepSeek-V3 (via Ollama). In this mode, the LLM reasons over user prompts, selects tools, constructs arguments, and iteratively calls PyROOT functions until the user's goal is met.
Model Context Protocol (MCP) Mode: Designed for integration with the Anthropic Claude CLI (specifically tested with Sonnet 4.6). This mode operates as an MCP server, where the Claude CLI acts as both the LLM and the orchestration layer, eliminating the need for LangChain or LangGraph dependencies.

The tool set is modular and covers the full spectrum of common ROOT analysis tasks, including:

Inspection: Listing file contents, TTree structures, and branch data types.
Counting & Selection: Applying boolean cuts, generating cutflows, and computing event yields.
Histograms & Statistics: Filling histograms from TTree branches, calculating integrals, means, and RMS, and computing significance ($S/\sqrt{S+B}$, where $S$ and $B$ are the signal and background yields).
Visualization: Generating 1D and 2D plots, overlaying distributions, and applying log scales.
Fitting: Performing Gaussian, exponential, or polynomial fits to distributions.
Optimization: Scanning cut thresholds to maximize significance.
Export: Converting TTree branches to CSV files.
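
As a rough illustration of the Statistics and Optimization entries above, the sketch below computes $S/\sqrt{S+B}$ from unweighted event counts and scans a list of thresholds for the one that maximizes it. It is plain Python rather than PyROOT, and the function names `significance` and `scan_cut`, the sample values, and the "keep events above the threshold" convention are assumptions made for this example, not RooAgent's actual tools.

```python
import math


def significance(s, b):
    """Return S / sqrt(S + B), or 0.0 when no events survive."""
    total = s + b
    return s / math.sqrt(total) if total > 0 else 0.0


def scan_cut(signal, background, thresholds):
    """Try each threshold, keeping events with value > threshold.

    Returns (best_threshold, best_significance).
    """
    best_threshold, best_z = None, -1.0
    for t in thresholds:
        s = sum(1 for x in signal if x > t)      # surviving signal events
        b = sum(1 for x in background if x > t)  # surviving background events
        z = significance(s, b)
        if z > best_z:
            best_threshold, best_z = t, z
    return best_threshold, best_z


# Hypothetical values of one kinematic variable (in GeV) for each sample.
signal = [55, 60, 62, 70, 75, 80]
background = [10, 20, 30, 35, 40, 45, 52, 58]

best_cut, best_z = scan_cut(signal, background, [0, 25, 50, 65])
print(best_cut, round(best_z, 3))
```

In a real analysis the yields would typically be weighted sums (cross-section and luminosity weights) rather than raw counts, which this sketch leaves out.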

The system is designed for iterative reasoning, allowing the agent to call tools multiple times to refine results or correct errors (e.g., clarifying tree names or adjusting plot ranges).
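
The iterative pattern described above can be sketched as a small loop that calls a tool, records the result or the error, and lets the planner choose the next call. This is only an illustration under stated assumptions: the planner here is a hard-coded stand-in for the LLM, `FAKE_FILE` replaces a real ROOT file, and every name (`TOOLS`, `list_trees`, `count_events`, `run_agent`) is hypothetical rather than taken from the RooAgent code.

```python
TOOLS = {}


def tool(fn):
    """Register a function under its own name so the loop can call it."""
    TOOLS[fn.__name__] = fn
    return fn


# Stand-in for a ROOT file: one tree holding three events (hypothetical data).
FAKE_FILE = {"events": [{"pt": 30.0}, {"pt": 62.5}, {"pt": 71.0}]}


@tool
def list_trees():
    return sorted(FAKE_FILE)


@tool
def count_events(tree, min_pt):
    if tree not in FAKE_FILE:
        raise KeyError(f"no tree named {tree!r}")
    return sum(1 for row in FAKE_FILE[tree] if row["pt"] > min_pt)


def scripted_planner(history):
    """Stand-in for the LLM: choose the next call from what happened so far."""
    if not history:
        # First attempt uses a wrongly capitalized tree name on purpose.
        return ("count_events", {"tree": "Events", "min_pt": 50.0})
    last = history[-1]
    if last["error"] and last["name"] == "count_events":
        return ("list_trees", {})  # inspect the file after a failed lookup
    if last["name"] == "list_trees":
        return ("count_events", {"tree": last["result"][0], "min_pt": 50.0})
    return None  # goal met, stop


def run_agent(planner, max_steps=5):
    history = []
    for _ in range(max_steps):
        step = planner(history)
        if step is None:
            break
        name, args = step
        try:
            entry = {"name": name, "result": TOOLS[name](**args), "error": None}
        except Exception as exc:  # feed the error back instead of crashing
            entry = {"name": name, "result": None, "error": str(exc)}
        history.append(entry)
    return history


history = run_agent(scripted_planner)
for entry in history:
    print(entry)
```

The `max_steps` cap is a design choice for the sketch: it keeps a confused planner from looping forever.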

Key Contributions

Unified Interface: RooAgent provides a consistent set of analysis tools accessible via natural language across different LLM backends (OpenAI, Ollama, Anthropic) without requiring changes to the underlying analysis code.
Tool Registry: The package exposes a comprehensive library of PyROOT-wrapped functions specifically tailored for HEP workflows, including significance calculation, cutflow generation, and parametric fitting.
Dual-Mode Architecture: By supporting both a LangGraph-based agent and an MCP server, the package offers flexibility for users preferring different LLM ecosystems and deployment methods (local vs. cloud).
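
One way a single set of Python functions could serve more than one LLM front end is to derive a neutral tool description from each function's signature and docstring, then hand that description to whichever backend is in use. The sketch below shows the idea with the standard `inspect` module only. The output layout is invented for this example and is not the MCP or LangChain schema, and `describe_tool` and the placeholder `count_events` are hypothetical names, not RooAgent's API.

```python
import inspect


def describe_tool(fn):
    """Build a backend-neutral description of a tool from its signature."""
    params = {}
    for name, p in inspect.signature(fn).parameters.items():
        has_annotation = p.annotation is not inspect.Parameter.empty
        params[name] = {
            "type": p.annotation.__name__ if has_annotation else "any",
            "required": p.default is inspect.Parameter.empty,
        }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": params,
    }


def count_events(path: str, tree: str, cut: str = "") -> int:
    """Count entries in a tree that pass a boolean cut."""
    # Placeholder body: a real tool would open the file with PyROOT here.
    raise NotImplementedError


spec = describe_tool(count_events)
print(spec)
```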

Results
The authors evaluated RooAgent using Monte Carlo simulations of $pp \to ZH$ ($Z \to \ell^+\ell^-$, $H \to b\bar{b}$) and background processes, as well as ATLAS open data for the $H \to ZZ^* \to 4\ell$ channel.

Benchmark Performance: In a series of 20 single-task tests, the agent successfully produced results for 19 tasks. Tasks included file inspection, histogram plotting, event counting, variable definition, Gaussian fitting, and significance scanning. One failure occurred due to a case-sensitivity issue in a tree name lookup, which the agent correctly identified and flagged for clarification rather than producing a false result.
Multi-Task Workflow: A complex prompt requiring six sequential tasks (fitting, kinematic comparisons, cutflow generation, cut optimization, mass-window scanning, and cut ranking) was executed successfully in approximately 225 seconds without human intervention.
Statistical Analysis: In a toy statistical analysis involving a grid of mass hypotheses, the agent correctly matched histograms, computed observed and expected significances, p-values, and $\mathrm{CL}_s$ values, and identified the injected signal mass (250 GeV) as the strongest candidate.
Open Data Application: Applied to ATLAS open data, the agent successfully processed multiple ROOT files, applied sequential lepton selection cuts, generated cutflows, and produced a stacked plot of signal and background overlaid with data. The results were consistent across GPT-4.1 and Sonnet 4.6.
Model Variations: The paper notes that while the core logic remains consistent, different LLMs (e.g., GPT-4.1 vs. DeepSeek-V3) may make different choices regarding plot ranges or normalization when not explicitly constrained, highlighting the importance of prompt specificity.
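
For readers unfamiliar with the quantities named in the statistical analysis item, the sketch below shows the textbook relations for a single-bin counting experiment: the one-sided p-value for a Gaussian significance $Z$, and $\mathrm{CL}_s = \mathrm{CL}_{s+b} / \mathrm{CL}_b$ built from Poisson probabilities. The summary does not say how RooAgent computes these values, so this is a generic illustration with hypothetical function names and inputs, not the package's method.

```python
import math


def p_value_from_z(z):
    """One-sided tail probability of a standard normal above z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu."""
    return sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(n + 1))


def cls_counting(n_obs, s, b):
    """CLs for a counting experiment, using the observed count as test statistic."""
    cl_sb = poisson_cdf(n_obs, s + b)  # P(N <= n_obs | signal + background)
    cl_b = poisson_cdf(n_obs, b)       # P(N <= n_obs | background only)
    return cl_sb / cl_b


# Hypothetical inputs: 5 sigma significance; 0 observed events with s=3, b=1.
p_five_sigma = p_value_from_z(5.0)
cls_value = cls_counting(0, 3.0, 1.0)
print(p_five_sigma, cls_value)
```

A signal hypothesis is conventionally excluded at 95% confidence level when $\mathrm{CL}_s < 0.05$. Shape-based analyses use a likelihood-ratio test statistic instead of the raw count.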

Significance
The paper claims that RooAgent successfully bridges the gap between plain-language prompts and the technical requirements of ROOT-based HEP analysis. By automating the selection of tools and arguments, the system streamlines routine tasks and lowers the barrier to entry for users unfamiliar with the intricacies of the ROOT API. The authors position the work as a step toward more accessible HEP data analysis, demonstrating that LLM agents can effectively orchestrate complex, multi-step workflows involving file inspection, statistical inference, and visualization. The package is modular, allowing for future extensions such as the integration of machine learning algorithms as callable tools or the identification of optimal event-selection variables.
