Neural Scaling Laws for Boosted Jet Tagging

This paper investigates neural scaling laws for boosted jet classification using the JetClass dataset, demonstrating that increasing compute reliably drives performance toward asymptotic limits while revealing how data repetition, input features, and particle multiplicity influence scaling efficiency and effective dataset size.

Original authors: Matthias Vigl, Nicole Hartman, Michael Kagan, Lukas Heinrich

Published 2026-02-18

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to teach a computer to distinguish between a "top quark jet" (a spray of particles from a heavy, rare particle) and a "QCD jet" (a common spray from ordinary particles). This is like trying to tell the difference between a rare, exotic fruit and a common apple just by looking at a pile of seeds.

For a long time, scientists in High Energy Physics (HEP) have been building better and better "fruit classifiers," but they haven't been using the same massive amounts of computing power that companies like OpenAI use to build giant language models (like the ones behind modern chatbots).

This paper asks a simple question: If we just keep adding more brainpower (computing power) and more data, will the computer get infinitely better at this task, or is there a ceiling?

Here is the breakdown of their findings using everyday analogies:

1. The "Recipe" for Success (Scaling Laws)

The authors discovered that improving these models follows a predictable recipe, similar to how baking a cake works.

  • The Ingredients: You need Model Size (how smart the brain is) and Data Size (how many examples it studies).
  • The Rule: If you double your computing power, you shouldn't just double the brain size or double the data. You need a specific balance. The paper found the "Golden Ratio" for mixing these ingredients to get the best results for the least amount of effort (a toy version of this balancing act is sketched in the code after this list).
  • The Result: As you add more computing power, the error rate drops smoothly, like a ball rolling down a hill. But eventually, the ball hits the bottom of the valley.
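
To make the recipe concrete, here is a minimal sketch in Python of the kind of parametric scaling law used in studies like this one. The functional form is the common "Chinchilla-style" ansatz L(N, D) = E + A/N^alpha + B/D^beta, the compute rule C ≈ 6·N·D is a standard transformer approximation, and every constant below is an invented placeholder, not a value fitted by the authors:

```python
import numpy as np

# Hypothetical Chinchilla-style loss surface. All constants are invented
# placeholders for illustration, NOT the paper's fitted values.
#   L(N, D) = E + A / N**alpha + B / D**beta
# N = model parameters, D = training jets,
# E = the irreducible "floor" discussed in section 2 below.
E, A, B, alpha, beta = 0.05, 25.0, 60.0, 0.34, 0.28

def loss(N, D):
    return E + A / N**alpha + B / D**beta

def optimal_split(C, n_grid=10_000):
    """For a compute budget C ~ 6*N*D (a standard transformer
    approximation), find the (N, D) pair that minimizes the loss."""
    N = np.logspace(5, 10, n_grid)   # candidate model sizes
    D = C / (6.0 * N)                # dataset size implied by the budget
    L = loss(N, D)
    i = int(np.argmin(L))
    return N[i], D[i], L[i]

for C in (1e17, 1e19, 1e21):
    N, D, L = optimal_split(C)
    print(f"C={C:.0e}: best N={N:.2e} params, D={D:.2e} jets, loss={L:.4f}")
```

The grid search reproduces the qualitative story above: as the budget C grows, the optimal model size and dataset size grow together, and the loss slides smoothly down toward the floor E.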

2. The "Ceiling" (The Irreducible Limit)

Here is the most important finding: the model can never be perfect.
Even if you give the computer infinite brainpower and infinite data, it will never reach 100% accuracy. There is a floor below which the error rate simply will not drop.

  • The Analogy: Imagine trying to hear a whisper in a noisy room. No matter how good your ears are (model size) or how many times you listen (data), if the room is too noisy, you will never hear the whisper perfectly.
  • The Twist: The "noise level" in this case is set by the input features. If you only give the computer basic info (like "how heavy is the fruit?"), the ceiling is low. But if you give it detailed info (like "what is the texture, color, and smell?"), the ceiling rises much higher. The paper shows that feeding the computer more detailed, "lower-level" particle data lets it reach a much better asymptotic performance (the formula after this list makes the ceiling precise).
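
One compact way to state this ceiling is a saturating power law in training compute. The form below is the standard one in scaling-law work and is meant as a sketch of the idea, not the paper's exact fit:

```latex
% Compute frontier as a saturating power law (illustrative form):
%   C     : total training compute
%   a, b  : positive fitted constants
%   L_inf : the irreducible floor, set by how informative the inputs are
%           (low-level particle features push it lower than high-level ones)
L(C) \;=\; L_{\infty} + a\,C^{-b}, \qquad \lim_{C\to\infty} L(C) = L_{\infty}
```

Better input features do not change the shape of the curve; they lower the floor L_inf that the curve is falling toward.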

3. The "Re-reading" Problem (Data Repetition)

In physics, creating new data is incredibly expensive, because every training example has to be produced by detailed physics simulations running on supercomputers. So, scientists often just make the computer read the same dataset over and over again (multiple "epochs").

  • The Analogy: It's like studying for a test by reading the same textbook chapter 10 times instead of reading 10 different chapters.
  • The Finding: Re-reading helps, but it's inefficient.
    • If you have a small textbook, reading it 10 times helps you memorize it well.
    • But eventually, you hit a point where reading it 11 times doesn't help at all; you just start memorizing the typos (overfitting).
    • The Cost: Re-reading the same data 10 times costs about 10 times the computing power of a single pass, but it teaches the model far less than 10 times as much fresh data would (the toy model after this list shows how the returns shrink).
    • The Lesson: It's usually better to generate new, unique data than to keep re-reading the old stuff, unless generating new data is impossible.
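
Here is a toy diminishing-returns model of re-reading. The geometric decay factor is a hypothetical choice made purely for illustration; the paper fits its own effective-data law, so take only the qualitative saturation from this sketch:

```python
import numpy as np

# Toy model: each extra pass over the same unique dataset U is worth a
# fraction `decay` of the previous pass, so the "effective" dataset size
# saturates at U / (1 - decay) no matter how many epochs you run.
# The decay value is invented for illustration, not fitted to the paper.
def effective_data(U, epochs, decay=0.65):
    passes = decay ** np.arange(epochs)   # 1, d, d**2, ...
    return U * passes.sum()               # geometric sum

U = 10_000_000  # unique simulated jets
for epochs in (1, 2, 5, 10, 20):
    eff = effective_data(U, epochs)
    print(f"{epochs:>2} epochs: {epochs * U:.1e} jets processed, "
          f"~{eff:.1e} effective, overhead x{epochs * U / eff:.1f}")
```

By 20 epochs the model has processed 2e8 jets but learned from the equivalent of well under 3e7 fresh ones; that widening gap is the compute overhead described in "The Cost" above.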

4. The "Overfitting" Threshold

The paper also figured out exactly how big the computer's brain can get before it starts "memorizing" instead of "learning."

  • The Analogy: If you have a tiny brain and a huge textbook, you will get confused (underfitting). If you have a giant brain and a tiny textbook, you will memorize every word but fail to understand the concept (overfitting).
  • The Discovery: There is a specific "sweet spot" where the brain size matches the data size. Beyond that point, making the brain bigger doesn't help unless you also get more data (a rough sketch of such a threshold rule follows below).
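
As a rough sketch, a threshold like this is often summarized as a power-law rule of thumb: memorization sets in around N ≈ k·D^gamma parameters for D training examples. The constants below are invented purely to show the shape of such a rule, not taken from the paper:

```python
# Hypothetical capacity threshold: beyond roughly max_useful_params(D)
# parameters, a bigger model memorizes its D training jets instead of
# learning from them. k and gamma are invented illustration values.
def max_useful_params(D, k=5.0, gamma=0.75):
    return k * D**gamma

for D in (1e6, 1e7, 1e8):
    print(f"D={D:.0e} jets -> overfitting starts near "
          f"N~{max_useful_params(D):.1e} params")
```

The practical reading is the one in the bullet above: once your model crosses that line, the next resource to buy is data, not parameters.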

5. Why This Matters for the Future

The authors used these rules to predict the future of particle physics.

  • They can now tell scientists: "If you want to improve your particle detector by 10%, you need to spend X amount of money on computing and Y amount on generating new data." (The sketch after this list shows the arithmetic behind such a forecast.)
  • They also realized that simulation quality might be the real bottleneck. Even with a perfect computer, if the "simulated universe" data isn't perfect, the computer can't learn the truth.
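
Budgeting of this kind amounts to inverting the compute frontier from section 2: solve L(C) = L_inf + a·C^(-b) for the compute C that reaches a target loss. A minimal sketch, again with placeholder constants rather than the paper's fits:

```python
# Invert L(C) = L_inf + a * C**(-b) to budget compute for a target loss.
# All three constants are illustrative placeholders.
L_inf, a, b = 0.05, 2.0, 0.15

def compute_needed(target_loss):
    gap = target_loss - L_inf
    if gap <= 0:
        raise ValueError("Target is at or below the irreducible floor.")
    return (a / gap) ** (1.0 / b)

current, goal = 0.10, 0.09   # a 10% lower loss, say
print(f"loss {current}: C ~ {compute_needed(current):.2e}")
print(f"loss {goal}:    C ~ {compute_needed(goal):.2e}")
```

Note how the cost explodes as the target approaches L_inf, which is exactly why the authors flag simulation quality (the height of the floor itself) as the eventual bottleneck.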

Summary

This paper is a user manual for the future of AI in physics. It tells us:

  1. Keep scaling up: Bigger models and more data work, but you need to balance them correctly.
  2. There is a limit: You will eventually hit a performance wall, but you can push that wall higher by giving the AI better, more detailed data.
  3. Don't just re-read: It's better to get new data than to study the same data over and over, because re-reading gets expensive very quickly.

In short, they've turned the "black art" of training AI models into a predictable science, allowing physicists to plan their resources like a chef planning a massive banquet.
