Generalizable Foundation Models for Calorimetry via Mixtures-of-Experts and Parameter Efficient Fine Tuning

This paper introduces a generalizable foundation model for calorimetry that leverages next-token transformer architectures combined with Mixture-of-Experts pre-training and parameter-efficient fine-tuning to enable modular, scalable, and computationally efficient simulation of particle showers across diverse materials and detector configurations without catastrophic forgetting.

Original authors: Carlos Cardona-Giraldo, Cristiano Fanelli, James Giroux, Cole Granger, Benjamin Nachman, Gerald Sabin

Published 2026-04-01

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are an architect trying to design the perfect building. To do this, you need to know exactly how the building will react to wind, rain, and earthquakes. In the world of particle physics, scientists are building "detectors" (giant cameras) to catch subatomic particles. To design these detectors, they need to run millions of computer simulations to see how particles crash into the detector's materials.

Traditionally, this simulation process is like tracking every single raindrop hitting a roof with a supercomputer. It's incredibly accurate, but it takes so long and uses so much energy that it's becoming impossible to keep up with the demands of modern science.

This paper introduces a new, smarter way to do this using Artificial Intelligence, specifically a type of "Foundation Model" (a super-smart AI brain) designed for calorimetry (measuring particle energy). Here is how they did it, explained through simple analogies:

1. The Problem: The "One-Size-Fits-None" Dilemma

Imagine you have a master chef who is a genius at cooking steak. If you ask them to cook a fish, they might struggle because they've never done it. If you try to retrain them to cook fish, you might accidentally make them forget how to cook steak perfectly. This is called "catastrophic forgetting."

In physics, if you train a simulation AI on Tungsten (a heavy metal used in detectors), and then you want to simulate Lead, you usually have to start from scratch or risk ruining the Tungsten knowledge.

2. The Solution: The "Modular Kitchen" (Mixtures-of-Experts)

The authors built an AI that acts like a Master Chef with a team of specialized sous-chefs.

  • The Base Model (The Master Chef): This is the core AI, trained on a "foundation" of physics. It knows the general rules of how energy moves and how particles interact. This part of the brain is frozen (locked in place) so it never forgets what it already knows.
  • The Experts (The Sous-Chefs): Attached to the Master Chef are specialized modules called "Experts."
    • One expert specializes in Tungsten.
    • Another specializes in Tantalum.
    • A new one can be added for Lead.

When the AI needs to simulate a particle hitting Tungsten, it asks the "Tungsten Expert." When it needs to simulate Lead, it asks the "Lead Expert." The Master Chef stays the same; only the specific expert changes. This means you can add new materials without ever messing up the old ones.
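
To make the "sous-chef" idea concrete, here is a minimal PyTorch sketch of the pattern: a frozen shared backbone plus a bank of small expert networks, selected by material. The class and function names are hypothetical illustrations of the technique, not the authors' code, and the expert size and routing-by-label scheme are assumptions.

```python
import torch
import torch.nn as nn

def _make_expert(d_model: int) -> nn.Module:
    # A small feed-forward "sous-chef" for one material (size is illustrative).
    return nn.Sequential(
        nn.Linear(d_model, 4 * d_model),
        nn.GELU(),
        nn.Linear(4 * d_model, d_model),
    )

class MaterialMoE(nn.Module):
    """Sketch: frozen shared backbone + one pluggable expert per material."""

    def __init__(self, backbone: nn.Module, d_model: int, materials: list[str]):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # the Master Chef never changes
        self.d_model = d_model
        self.experts = nn.ModuleDict({m: _make_expert(d_model) for m in materials})

    def add_material(self, name: str) -> None:
        # A new material is just a new expert; the old experts are untouched,
        # so nothing already learned can be forgotten.
        self.experts[name] = _make_expert(self.d_model)

    def forward(self, tokens: torch.Tensor, material: str) -> torch.Tensor:
        h = self.backbone(tokens)             # shared physics knowledge
        return h + self.experts[material](h)  # material-specific correction
```

Starting from `MaterialMoE(backbone, 256, ["Tungsten", "Tantalum"])`, calling `add_material("Lead")` creates the only parameters that a new training run would update; everything else stays frozen.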

3. Handling New Ingredients: "Low-Rank Adaptation" (LoRA)

What if you want to change the type of food being cooked? For example, switching from cooking Photons (light particles) to Electrons (charged particles). The rules of the kitchen change significantly.

Instead of firing the Master Chef and hiring a new one, the authors use a technique called LoRA (Low-Rank Adaptation). Think of this as giving the Master Chef a specialized apron and a new set of tools.

  • The core brain (the chef's knowledge) stays the same.
  • The apron (LoRA) adjusts how the chef thinks about the specific task (e.g., "Oh, electrons start their showers differently than photons do").
  • This is a tiny, lightweight adjustment that allows the AI to learn a new particle type quickly without needing to relearn everything from scratch.
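
In code, LoRA leaves a frozen weight matrix W alone and adds a trainable low-rank correction, so the layer computes W x + (alpha / r) * B(A x), where A and B are tiny matrices. Below is a minimal sketch using the standard LoRA math; the names, rank, and scaling default are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update.

    Output = W x + (alpha / r) * B (A x), with W frozen and only the small
    matrices A (r x in) and B (out x r) trained -- the "apron".
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # the chef's knowledge: frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r               # B starts at zero, so at first the
                                             # wrapped layer behaves exactly like `base`

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T) @ self.lora_B.T
```

The arithmetic explains "tiny": wrapping a 1024-by-1024 layer with rank r = 8 trains 8*1024 + 1024*8 = 16,384 parameters instead of about 1.05 million, a 64x reduction for that layer.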

4. The Result: Fast, Flexible, and Future-Proof

By combining these two tricks (Specialized Experts for materials + Specialized Aprons for particle types), the team created a system that is:

  • Modular: You can add a new material or particle type by just plugging in a new "expert" or "apron."
  • Efficient: It doesn't need to retrain the whole brain. It only learns the small new parts.
  • Fast: They used tricks from the world of Large Language Models (like the ones powering chatbots) to make the AI run incredibly fast on graphics cards. It's now nearly as fast as older, simpler simulation methods but much more accurate.
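
The "next-token" framing in the paper's title means a shower is generated the way a chatbot generates text: one token at a time, each conditioned on everything produced so far. A minimal sampling loop might look like the sketch below; the model interface and the idea that each token encodes a quantized piece of the shower are assumptions for illustration, and real implementations add LLM-style optimizations (such as caching past computations) rather than re-running the model on the full sequence every step.

```python
import torch

@torch.no_grad()
def generate_shower(model, prompt_tokens: torch.Tensor,
                    max_new_tokens: int, temperature: float = 1.0) -> torch.Tensor:
    """Sample a particle shower one token at a time, chatbot-style.

    Assumes `model(tokens)` returns logits of shape (batch, seq, vocab) and
    that each token encodes one quantized shower feature (hypothetical).
    """
    tokens = prompt_tokens                    # e.g. encodes the incident particle
    for _ in range(max_new_tokens):
        logits = model(tokens)[:, -1, :] / temperature   # next-token scores
        probs = torch.softmax(logits, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, next_tok], dim=1)    # append and repeat
    return tokens
```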

The Big Picture

Think of this AI as a universal translator for particle physics.

  • Old Way: You hire a new translator for every single language (material/particle) you encounter. It's expensive and slow.
  • New Way: You have one brilliant translator who speaks the "universal language" of physics. When you need to translate a new dialect (a new material), you just hand them a small, specific dictionary (an Expert module). They instantly understand it without forgetting the previous languages.

This allows scientists to design better particle detectors faster, saving massive amounts of computing power and time, which is crucial for the next generation of experiments that will explore the deepest secrets of the universe.
