This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.
Imagine you are trying to build a complex machine, like a rocket, but instead of a human engineer, you hire a brilliant, hyper-fast robot that has read every book in the library. This robot, an AI Agent, is great at planning and talking, but when it comes to doing the actual math to ensure the rocket doesn't explode, it has a nasty habit of making "silent mistakes." It might drop a minus sign, mix up a unit of measurement, or use a rule that works in one textbook but not another. In physics, these tiny errors can lead to completely wrong predictions.
This paper introduces Diagrammatica, a new "safety harness" and "toolbelt" designed to help this AI robot do high-level physics calculations without crashing.
Here is the breakdown using simple analogies:
1. The Problem: The "Smart but Squirrelly" Robot
The authors explain that Large Language Models (LLMs) are like brilliant improvisational actors. They can write a script, act out a scene, and sound very convincing. But if you ask them to do a long, multi-step math problem (like calculating how a particle decays), they tend to "hallucinate" the rules.
- The Issue: Physics relies on strict, hidden rules (conventions). For example, is a specific number positive or negative? Does a particle spin clockwise or counter-clockwise? The AI might get the first step right, but by step 10, it might have forgotten the rule it used in step 1.
- The Result: The AI produces a result that looks right and sounds smart, but is actually wrong. Checking this work is like trying to find a single typo in a 500-page novel written by a machine; it's incredibly hard.
2. The Solution: The "Blueprint & Builder" System
Instead of letting the AI write the math code from scratch (which is like asking the actor to build the rocket engine while acting), Diagrammatica changes the game.
- The Agent becomes the Architect: The AI is only allowed to draw a diagram (a blueprint) describing what it wants to calculate. It picks from a menu of valid options (e.g., "Scalar particle," "Fermion," "Vector boson"). It cannot write the math equations itself.
- The Backend becomes the Builder: Once the AI draws the blueprint, a trusted, rigid computer program (the "Builder") takes that blueprint and does the actual math. This Builder knows the rules perfectly and never makes a mistake.
The Analogy: Imagine you want to order a custom pizza.
- Old Way: You tell the chef, "Make me a pizza with cheese, pepperoni, and... uh, maybe some math on the side?" The chef tries to guess the recipe and might burn the crust.
- Diagrammatica Way: You fill out a strict order form with checkboxes: [ ] Cheese, [ ] Pepperoni, [ ] Crust Type. You hand the form to the kitchen. The kitchen (the trusted backend) follows the form exactly. You can't order "math on the side," so you can't make a mistake.
3. The Two "Flavors" of Calculation
The toolkit offers two ways to get the answer, depending on how precise you need to be:
- NDA (The "Back-of-the-Napkin" Estimate): This is like a quick guess. The AI asks, "Roughly how big is this pizza?" The system uses simple rules of thumb to give an order-of-magnitude answer. It's fast, and it works for very complex pizzas (processes) that are too hard to calculate exactly.
- EDA (The "Exact Recipe"): This is the high-precision mode. The system generates the exact mathematical formula, like a professional chef measuring every gram of flour. It produces a perfect, symbolic answer that can be used for real scientific papers.
4. The "Knowledge Librarian"
Sometimes the AI gets stuck on a specific rule (e.g., "Which sign do I use for this particle?"). Instead of dumping a whole textbook into the AI's brain (which makes it confused), Diagrammatica has a Librarian.
- When the AI asks a specific question, the Librarian hands it just the one page it needs, right at that moment. This keeps the AI focused and prevents it from getting overwhelmed by too much information.
5. The Proof: Two Big Tests
The authors tested this system with two massive challenges to prove it works:
- Test 1: The Encyclopedia of Decays. They asked the AI to calculate the decay rates for every possible combination of particles (like a parent particle splitting into two children) across the entire Standard Model.
- Result: The AI successfully generated 19 different complex formulas, checked them against known real-world data, and even found interesting patterns in the physics. It did this without a human touching the keyboard.
- Test 2: The Muon Multiplicity Challenge. They asked the AI to figure out how many pairs of electrons and positrons a muon (a heavy cousin of the electron) can spit out before the event becomes too rare to see in future experiments.
- Result: The AI had to sort through 150,000 different possible diagrams. It used the "Back-of-the-Napkin" method to quickly discard the hopelessly rare processes, then applied the "Exact Recipe" method to confirm the viable ones. In this way it successfully mapped out what future experiments could hope to see.
Why This Matters
This paper is a blueprint for the future of scientific discovery. It shows that we don't need AI to replace human scientists or to be perfect at math on its own. Instead, we can build AI assistants that are constrained by safety rails.
By forcing the AI to use "checkboxes" and "diagrams" instead of free-form writing, we get the speed and creativity of AI combined with the reliability of traditional computation. It's like giving a super-fast race car a GPS and a safety cage: it can go faster and farther than ever before, without driving off a cliff.