OmniLearned: A Foundation Model Framework for All Tasks Involving Jet Physics

This paper introduces OmniLearned, a major upgrade to the OmniLearn foundation model framework for jet physics that leverages a billion-jet training dataset and improved architecture to achieve state-of-the-art performance in top-quark tagging, b-tagging, and anomaly detection, thereby significantly enhancing the discovery potential of collider experiments.

Original authors: Wahid Bhimji, Chris Harris, Vinicius Mikuni, Benjamin Nachman

Published 2026-03-27

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to teach a computer to recognize different types of clouds in the sky. You could show it a picture of a cumulus cloud and say, "This is a cumulus," then show it a stratus and say, "This is a stratus." But what if you wanted the computer to recognize every kind of cloud, predict how they move, and spot a strange, never-before-seen cloud formation that might signal a tornado?

To do that, you wouldn't just show it a few pictures. You'd need to show it billions of cloud photos, let it study the physics of how water vapor behaves, and let it learn the "essence" of what a cloud is before you ever ask it to identify a specific type.

This is the idea behind the paper "OmniLearned: A Foundation Model Framework for All Tasks Involving Jet Physics," except that instead of clouds, the subject is particle physics jets.

What is a "Jet"?

In particle colliders (like the Large Hadron Collider), scientists smash particles together at nearly the speed of light. The collisions produce quarks and gluons, sometimes from the decay of a heavy particle like a top quark, and these don't just vanish; each one sprays out a shower of smaller particles. To our detectors, this spray looks like a single, messy blob of energy. Physicists call this a "jet."

For decades, scientists have tried to build computer programs to look at these messy blobs and say, "Ah, that's a top quark!" or "That's just a boring background particle." The problem is, there are so many different jobs to do with jets, and training a new computer program for every single job takes forever and requires massive amounts of data.

The Old Way vs. The New Way

The Old Way (The Specialist):
Imagine hiring a different chef for every meal. You hire a sushi chef for dinner, a pizza chef for lunch, and a baker for breakfast. Each chef is great at their specific task, but they have to start from scratch every time. If you want a new dish, you have to hire a new chef and train them from zero. This is how particle physics used to work: a new AI model for every new experiment.

The New Way (The Foundation Model):
This paper introduces OmniLearned, which is like hiring a Master Chef who has tasted every dish in the world.

  1. The Training: They fed this Master Chef a massive library of over 1 billion different "jet" recipes (simulated data). The chef didn't just memorize the recipes; they learned the fundamental rules of cooking (the physics of how particles interact).
  2. The Result: Now, this chef is a "Foundation Model." They have a deep, universal understanding of jets.
  3. The Magic: When you need a specific dish (like identifying a top quark), you don't hire a new chef. You just give the Master Chef a quick, 5-minute briefing on what you want (in AI terms, a short round of extra training on a small, task-specific dataset). Because they already know the basics, they can adapt quickly and still perform very well.
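
To make the "quick briefing" idea concrete, here is a minimal sketch of one common way to adapt a pretrained model: keep the big model fixed and train only a small classifier on top of its outputs. Everything here is an assumption for illustration. The function `frozen_backbone` is a hypothetical stand-in for a pretrained network, the jets are made-up lists of particle momenta, and the paper's actual fine-tuning procedure may differ (for example, it may update the whole network).

```python
import math
import random

def frozen_backbone(jet_pts):
    """Hypothetical stand-in for a pretrained network: jet -> fixed features."""
    total = sum(jet_pts)
    # Two simple features: scaled particle count and leading momentum fraction.
    return [len(jet_pts) / 10.0, max(jet_pts) / total]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(features, labels, steps=500, lr=0.5):
    """Fit a small logistic-regression head; the backbone is never updated."""
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(features)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi / n for g, xi in zip(grad_w, x)]
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(head, x):
    w, b = head
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Made-up toy jets: "signal" has many soft particles, "background" one hard one.
rng = random.Random(0)
signal = [[rng.uniform(5, 15) for _ in range(rng.randint(12, 20))]
          for _ in range(20)]
background = [[rng.uniform(50, 100)]
              + [rng.uniform(1, 10) for _ in range(rng.randint(2, 5))]
              for _ in range(20)]

features = [frozen_backbone(j) for j in signal + background]
labels = [1] * 20 + [0] * 20
head = train_head(features, labels)
accuracy = sum(predict(head, x) == y for x, y in zip(features, labels)) / len(labels)
print(f"training accuracy of the small head: {accuracy:.2f}")
```

The point of the design is that only three numbers (two weights and a bias) are learned for the new task, which is why adapting a pretrained model can need far less data and time than training from zero.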

What Makes OmniLearned Special?

The authors upgraded their previous model (called OmniLearn) with three major improvements:

  1. A Bigger Brain (Architecture): They tweaked the computer's "brain" (the neural network) to pay better attention to the tiny details inside the jet, like how particles are arranged relative to each other. It's like upgrading from a magnifying glass to a high-powered microscope.
  2. More Data (1 Billion Jets): The previous model was trained on 100 million examples. The new one saw 10 times more. In the world of AI, more data usually means a smarter, more reliable model.
  3. Open Source (The Cookbook): They didn't keep the recipe secret. They released the software, the data, and the instructions to the whole world. Anyone can now download this "Master Chef" and use it for their own experiments.
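
As a rough illustration of what "how particles are arranged relative to each other" can mean in practice, the sketch below computes the angular distance ΔR = sqrt(Δη² + Δφ²) between every pair of particles in a jet. ΔR is a standard distance measure in collider physics, but whether OmniLearned uses exactly this quantity as an input is an assumption here, and the particle values are invented.

```python
import math

def delta_phi(phi1, phi2):
    """Difference in azimuthal angle, wrapped into [-pi, pi)."""
    d = phi1 - phi2
    return (d + math.pi) % (2 * math.pi) - math.pi

def delta_r(p1, p2):
    """Angular distance between two particles in the (eta, phi) plane."""
    deta = p1["eta"] - p2["eta"]
    dphi = delta_phi(p1["phi"], p2["phi"])
    return math.hypot(deta, dphi)

def pairwise_delta_r(particles):
    """Matrix of angular distances between every pair of particles."""
    n = len(particles)
    return [[delta_r(particles[i], particles[j]) for j in range(n)]
            for i in range(n)]

# Invented example jet with three particles.
jet = [
    {"eta": 0.10, "phi": 3.10},
    {"eta": 0.25, "phi": -3.10},  # close to the first one across the phi wrap
    {"eta": -0.40, "phi": 2.60},
]

for row in pairwise_delta_r(jet):
    print(["%.3f" % value for value in row])
```

The wrap in `delta_phi` matters because the angle φ is periodic: two particles at φ = 3.10 and φ = -3.10 are neighbors, not opposite sides of the detector.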

What Did They Prove?

To show off their new Master Chef, they tested it on three very different "kitchen tasks":

  • Task 1: The Top Quark Hunt (The "Top Chef" Challenge)
    They asked the model to find top quarks in a standard test dataset. The OmniLearned model didn't just pass; it reached state-of-the-art performance on this benchmark. It picked out the signal while letting through much less "noise" (background junk).

  • Task 2: The Flavor Tagging (The "Ingredient" Challenge)
    Using a dataset from the ATLAS experiment, they asked the model to distinguish between jets that originate from "bottom" quarks, "charm" quarks, and regular light quarks. This is crucial for finding new physics. OmniLearned was significantly better at this than the current best methods used by the ATLAS collaboration, even though it had to learn a new "language" (the specific detector data) very quickly.

  • Task 3: The Anomaly Detector (The "Alien Cloud" Challenge)
    This is the coolest part. They asked the model to find something it had never seen before. They used real data from the CMS experiment and asked the AI to find the "needle in the haystack."

    • The Trick: They didn't tell the AI what to look for. They just asked it to find anything that looked weird compared to the "normal" background.
    • The Result: The AI successfully found the top quarks hidden in the data, even though it wasn't explicitly told to look for them. It essentially said, "Hey, these blobs look different from the rest; let's investigate." This shows the model can be used for discovery, not just classification.
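
The "find anything weird" idea from Task 3 can be sketched with a toy example. This is not the method used in the paper; it is a deliberately simple stand-in that scores each jet by how far one measured quantity sits from what the "normal" background looks like. The jet mass values and the threshold of 3 are made up for illustration.

```python
import statistics

def fit_reference(background_values):
    """Summarize the 'normal' background by its mean and standard deviation."""
    return statistics.mean(background_values), statistics.stdev(background_values)

def anomaly_scores(values, reference):
    """Score each value by its distance from the background mean, in std units."""
    mean, std = reference
    return [abs(v - mean) / std for v in values]

def flag_anomalies(values, reference, threshold=3.0):
    """Return the values that look unlike the background."""
    scores = anomaly_scores(values, reference)
    return [v for v, s in zip(values, scores) if s > threshold]

# Made-up jet masses (GeV) for a background-only reference sample.
background_masses = [48.0, 52.0, 50.0, 49.0, 51.0, 50.0, 47.0, 53.0]
reference = fit_reference(background_masses)

# Made-up "data" containing a few jets that do not resemble the background.
data_masses = [49.0, 51.0, 172.0, 50.0, 175.0, 48.0]
flagged = flag_anomalies(data_masses, reference)
print("jets flagged as unusual:", flagged)
```

Note that the code is never told what a top quark is. It only learns what "normal" looks like and reports what does not fit, which is the same logic the bullets above describe, just with a single number instead of a full neural network.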

Why Does This Matter?

Think of the universe as a giant, complex puzzle. For years, physicists have been trying to solve it with small, specialized tools. OmniLearned is like handing them a universal key.

  • Speed: It takes less time and computing power to solve new problems.
  • Discovery: It can help spot strange new particles that we didn't even know to look for.
  • Collaboration: By making the code and data public, they are letting scientists all over the world use this powerful tool to accelerate their own discoveries.

In short, this paper says: "We built a super-smart AI that has studied a billion simulated particle jets. Now, instead of training a new AI for every new question, we can just adapt this one master AI, and it will help us find the answers faster and better than before."
