Development of an LLM-Based System for Automatic Code Generation from HEP Publications

This paper presents a proof-of-concept system that uses open-weight large language models to extract analysis procedures from high-energy physics publications and generate executable code for reproducing their results. The system shows promise as a human-in-the-loop tool, but current limitations such as hallucination and execution failures remain.

Original authors: Masahiko Saito, Tomoe Kishimoto, Junichi Tanaka

Published 2026-04-17

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to recreate a famous, delicious cake that a master baker published in a magazine. The recipe isn't just a simple list of ingredients; it's a long, complex article that says things like, "Use the flour described in the 2015 baking guide" or "Whisk until the texture matches the description in the 2018 journal."

To make this cake, you need to:

  1. Read the magazine article and all the other books it references.
  2. Write down a clear, step-by-step shopping list and instruction manual.
  3. Actually bake the cake in your kitchen and see if it tastes exactly like the one in the picture.

This paper is about building a super-smart robot assistant (an AI) to do this for scientists in High Energy Physics (HEP). These scientists study the smallest particles in the universe, and their "recipes" are incredibly complex analysis procedures, described in scientific papers and carried out by computer programs.

Here is how the authors tried to teach this robot to work, broken down into simple concepts:

The Problem: The "Black Box" of Science

In the past, if a scientist wanted to check if a famous experiment was done correctly, they had to read a 20-page paper and manually rewrite the computer code from scratch. It's like trying to rebuild a Ferrari engine just by reading a magazine article about it. It takes years, and it's easy to make a mistake.

The authors wanted to use Large Language Models (LLMs)—the same type of AI that writes emails or chats with you—to read these papers and automatically write the computer code needed to recreate the experiment.

The Solution: A Two-Step "Translator" Robot

The authors realized that asking an AI to "Read this paper and write the code" is like asking a child to translate a novel into a movie script in one go. It's too much pressure, and the AI might start making things up (a problem called "hallucination").

Instead, they built a two-stage assembly line:

Stage 1: The "Note-Taker" (Extraction)

First, the AI acts like a very organized student. It reads the main paper and all the other papers it mentions. Its job is to pull out the specific rules (like "only keep particles that are heavier than X") and write them down in a neat, structured list.

  • The Analogy: Imagine the AI is a detective reading a mystery novel. Instead of writing the whole story, it just writes down a list of clues: "The butler was in the library," "The candle was blue," etc.
  • The Result: The AI, especially the larger models, got pretty good at finding the clues. However, it sometimes got confused or invented clues that weren't there.
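
To make this concrete, here is a minimal sketch of what Stage 1 could look like in code. It assumes an open-weight model served behind an OpenAI-compatible endpoint (for example via vLLM); the endpoint, prompt, JSON schema, and model name are all illustrative stand-ins, not the authors' actual setup:

```python
# Minimal sketch of the extraction stage (Stage 1): ask the model to turn
# free-form paper text into a structured list of selection rules.
# Endpoint, prompt, schema, and model name are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

EXTRACTION_PROMPT = """\
You are extracting event-selection rules from a physics paper.
Return ONLY a JSON list of objects with these keys:
  "object"   (e.g. "jet", "electron"),
  "variable" (e.g. "pt", "eta"),
  "operator" (one of ">", "<", ">=", "<="),
  "value"    (a number, in the paper's units).
Paper text:
{paper_text}
"""

def extract_cuts(paper_text: str) -> list[dict]:
    response = client.chat.completions.create(
        model="my-open-weight-model",  # placeholder name
        messages=[{"role": "user",
                   "content": EXTRACTION_PROMPT.format(paper_text=paper_text)}],
        temperature=0,  # reduce run-to-run variation
    )
    # The model may still return malformed or invented rules ("hallucination"),
    # so a real pipeline would validate against a schema and retry on failure.
    return json.loads(response.choices[0].message.content)

cuts = extract_cuts("Jets are required to have pT > 25 GeV.")
print(cuts)  # e.g. [{"object": "jet", "variable": "pt", "operator": ">", "value": 25}]
```

The key idea is that the model's only job at this stage is note-taking: it produces a checkable list of rules, not runnable code.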

Stage 2: The "Chef" (Code Generation)

Once the AI has the neat list of rules, it moves to the second stage. Now, it acts like a chef trying to cook the dish based only on that list. It writes the actual computer code, runs it, and checks if the result matches the original experiment.

  • The Analogy: The AI takes the detective's list of clues and tries to build a Lego castle that looks exactly like the one in the photo.
  • The Result: Sometimes, the AI built a perfect castle. But often, it built a wobbly tower that fell over, or a castle that looked right but had the wrong number of windows.
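
Continuing the sketch, Stage 2 could turn that structured list into an executable event selection and count how many events survive, to be compared against the published result. The toy data layout and column names below are assumptions for illustration, not the authors' pipeline:

```python
# Minimal sketch of the generation/validation stage (Stage 2): apply the
# extracted cuts to event data and count how many events survive.
# The data layout and the cuts themselves are toy placeholders.
import operator
import numpy as np

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}

def apply_cuts(events: dict[str, np.ndarray], cuts: list[dict]) -> np.ndarray:
    """Return a boolean mask of events passing every extracted cut."""
    n_events = len(next(iter(events.values())))
    mask = np.ones(n_events, dtype=bool)
    for cut in cuts:
        column = f'{cut["object"]}_{cut["variable"]}'  # e.g. "jet_pt"
        mask &= OPS[cut["operator"]](events[column], cut["value"])
    return mask

# Toy events: one value per event (real analyses are far richer, with
# absolute values, units, and per-object collections to handle).
events = {"jet_pt": np.array([30.0, 10.0, 50.0]),
          "jet_eta": np.array([0.5, 1.0, 3.0])}
cuts = [{"object": "jet", "variable": "pt", "operator": ">", "value": 25},
        {"object": "jet", "variable": "eta", "operator": "<", "value": 2.5}]

n_selected = int(apply_cuts(events, cuts).sum())
print(f"{n_selected} event(s) pass")  # compare against the paper's published yield
```

Splitting the work this way also means a human can inspect the extracted rules before any code runs, catching invented rules early.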

The Big Challenges

The authors found three main things that make this robot not quite ready for prime time:

  1. The "Daydreaming" Problem (Hallucination): Sometimes the AI is so confident that it invents facts. It might say, "The paper said to use a red hammer," when the paper actually said "blue." In science, a red hammer ruins the whole experiment.
  2. The "Mood Swing" Problem (Stochasticity): If you ask the AI to do the same task twice, it might give you two different answers. One time it gets it right; the next time, it fails. This makes it hard to trust.
  3. The "Running Out of Breath" Problem: The papers are so long and complex that the AI sometimes forgets the beginning of the sentence by the time it gets to the end.

The Verdict: A Helpful Assistant, Not a Boss

The authors conclude that these AI robots are not yet ready to work alone. You cannot just let them run the experiment and hope for the best.

However, they are amazing "Co-Pilots."

  • The Human-in-the-Loop: The best way to use this system is for a human scientist to sit next to the robot. The robot does the heavy lifting (reading 50 pages and writing 100 lines of code), and the human checks the work.
  • The Safety Net: If the robot makes a mistake, the human catches it. If the robot gets stuck, the human helps it out.

Why This Matters

If this system gets better, it could change science forever. It would mean that:

  • New students could understand complex physics papers much faster.
  • Old experiments could be re-checked easily to make sure no mistakes were made years ago.
  • Science becomes more transparent, because the "recipe" is automatically checked against the "dish."

In short, the authors built a prototype robot that can read science papers and try to recreate the experiments. It's not perfect yet—it still daydreams and makes mistakes—but with a human friend looking over its shoulder, it's a powerful tool for making science more reliable and accessible.
