One-for-All Model Initialization with Frequency-Domain Knowledge

This paper introduces FRONT, a training-free framework that extracts a model's task-agnostic "learngene" from the low-frequency components of its weights using Discrete Cosine Transform, enabling efficient initialization of downstream models of arbitrary scales while significantly accelerating convergence and reducing training costs.

Jianlu Shen, Fu Feng, Yucheng Xie, Jiaqi Lv, Xin Geng

Published 2026-03-10

The Big Problem: The "One-Size-Fits-None" Dilemma

Imagine you have a master chef who has spent 10 years perfecting a complex recipe for a giant, 50-course banquet (a massive, pre-trained AI model). This chef knows everything about cooking: how to chop, how to season, how to balance flavors.

Now, imagine you want to open a small food truck (a smaller AI model) or a massive catering hall (a larger AI model).

  • The Old Way: You try to copy the master chef's entire 50-course menu onto your small food truck. It doesn't fit! Or, you try to shrink the giant banquet down to a single sandwich, but you lose all the flavor.
  • The Current "Smart" Ways:
    • Cut-and-Paste: You try to grab just the "chopping" section of the chef's notes and hope that's enough. But cooking is interconnected; chopping without knowing the seasoning ruins the dish.
    • The "Magic Generator": You hire a robot to study the chef's notes and guess what the food truck's menu should look like. But this robot needs to study thousands of other chefs first, takes forever to learn, and often gets it wrong.

The Result: Starting a new AI model from scratch is slow and expensive. Trying to adapt a big model to a small one (or vice versa) is messy and usually fails.


The Solution: The "Learngene" (The DNA of Cooking)

The authors of this paper discovered something fascinating. They realized that the chef's true knowledge isn't in the specific details of the 50th course (like "how to garnish this specific strawberry"). That's just "noise," or high-frequency detail.

The real, fundamental knowledge—the "Learngene"—is in the low-frequency components.

  • Analogy: Think of a song. The high frequencies are the specific notes, the vibrato, and the unique instruments. The low frequencies are the melody and the rhythm. You can play a melody on a piano, a guitar, or a synthesizer, and it's still the same song.
  • The Discovery: The "essence" of what the AI has learned (how to recognize a cat, how to understand grammar) is encoded in these smooth, low-frequency patterns of the math inside the model.

How FRONT Works: The "Frequency Filter"

The paper proposes a new framework called FRONT (FRequency-dOmain kNowledge Transfer). Here is how it works, step-by-step:

1. The Magic Filter (DCT)

Imagine the AI's brain is a giant, complex painting.

  • FRONT uses a mathematical tool called the Discrete Cosine Transform (DCT). Think of this as a special filter that separates the painting into two piles:
    • Pile A (Low Frequency): The broad strokes, the main shapes, the core composition. (This is the "Learngene").
    • Pile B (High Frequency): The tiny specks of dust, the specific brush textures, the noise. (This is the task-specific detail).
  • FRONT throws away Pile B and keeps Pile A. This is the "Learngene."
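The filtering step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the 8×8 weight matrix, the `keep` size, and the helper names (`dct_matrix`, `extract_learngene`) are all illustrative.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    i = np.arange(n)[None, :]          # spatial index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    C[0] /= np.sqrt(2.0)               # rescale the DC row for orthonormality
    return C

def extract_learngene(W: np.ndarray, keep: int) -> np.ndarray:
    """Keep only the top-left `keep` x `keep` low-frequency DCT coefficients
    (Pile A) and discard everything else (Pile B)."""
    Cr, Cc = dct_matrix(W.shape[0]), dct_matrix(W.shape[1])
    F = Cr @ W @ Cc.T                  # 2-D DCT of the weight matrix
    mask = np.zeros_like(F)
    mask[:keep, :keep] = 1.0           # low-frequency block survives
    return Cr.T @ (F * mask) @ Cc      # inverse DCT of the filtered spectrum

# Toy example: filter a random 8x8 "weight matrix".
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
W_core = extract_learngene(W, keep=4)  # smooth, low-frequency skeleton of W
```

Because the DCT used here is orthonormal, keeping every coefficient (`keep=8`) reconstructs the original weights exactly, and filtering can only remove energy, never add it.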

2. The Shape-Shifter (Truncation & Padding)

Now you have a "Learngene" (a set of smooth, core patterns). You want to use it to start a new AI model that is a different size.

  • If the new model is smaller: You simply truncate the Learngene, keeping only its low-frequency block of coefficients. Since the core knowledge is concentrated there, you don't lose the important stuff.
  • If the new model is bigger: You pad the Learngene with zeros (blank space) in the high-frequency slots. The core patterns stay intact, and the new empty space is ready to be filled with new details later.
  • The Magic: This happens in milliseconds on a regular computer. No training required!
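The truncate-or-pad trick above amounts to one array copy in the frequency domain followed by an inverse DCT at the new size. A minimal sketch, assuming the Learngene is stored as a block of DCT coefficients `G` (the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    C[0] /= np.sqrt(2.0)
    return C

def resize_from_learngene(G: np.ndarray, out_shape: tuple) -> np.ndarray:
    """Initialize a weight matrix of `out_shape` from a Learngene coefficient
    block G: crop if the target is smaller, zero-pad if it is larger."""
    rows, cols = out_shape
    F = np.zeros(out_shape)
    r, c = min(rows, G.shape[0]), min(cols, G.shape[1])
    F[:r, :c] = G[:r, :c]              # crop or zero-pad in a single copy
    Cr, Cc = dct_matrix(rows), dct_matrix(cols)
    return Cr.T @ F @ Cc               # inverse 2-D DCT at the new size

# One Learngene, two different model sizes.
rng = np.random.default_rng(0)
G = rng.standard_normal((8, 8))        # illustrative coefficient block
big = resize_from_learngene(G, (12, 12))
small = resize_from_learngene(G, (4, 4))
```

Note that zero-padding preserves the Learngene exactly: the orthonormal inverse DCT keeps the total energy of the padded spectrum unchanged, so nothing is distorted when growing the model.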

3. The "Refinement" (FRONT+)

Sometimes, the "Learngene" might still have a little bit of "noise" from the original task.

  • FRONT+ is like a quick polish. It takes the original model and runs a very short, cheap training session where it tells the model: "Hey, forget the specific details of this task. Focus only on the smooth, general patterns."
  • This creates an even cleaner "Learngene" that works even better.
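One natural way to phrase such a polish, sketched here as an assumption rather than the paper's actual objective, is to add a regularizer that measures how much weight energy lives outside the low-frequency block, and minimize it alongside the task loss during the short refinement pass. The helper names and the 8×8 shapes are hypothetical:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    C[0] /= np.sqrt(2.0)
    return C

def high_freq_penalty(W: np.ndarray, keep: int) -> float:
    """Energy of W outside the low-frequency `keep` x `keep` DCT block.
    A candidate regularizer for a FRONT+-style refinement (hypothetical form):
    total_loss = task_loss + lam * high_freq_penalty(W, keep)."""
    Cr, Cc = dct_matrix(W.shape[0]), dct_matrix(W.shape[1])
    F = Cr @ W @ Cc.T                  # spectrum of the weights
    mask = np.zeros_like(F)
    mask[:keep, :keep] = 1.0           # the "smooth, general patterns"
    return float(np.sum((F * (1.0 - mask)) ** 2))

# A weight matrix built purely from low frequencies incurs ~zero penalty.
rng = np.random.default_rng(1)
F0 = np.zeros((8, 8))
F0[:3, :3] = rng.standard_normal((3, 3))   # only low-frequency content
C = dct_matrix(8)
W_smooth = C.T @ F0 @ C
```

Driving this penalty down pushes task-specific, high-frequency detail out of the weights, which is exactly the "forget the specifics, keep the smooth patterns" instruction described above.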

Why This is a Game-Changer

  1. Speed: It's like getting a head start in a race. Instead of running from the starting line (training from scratch), you are already 90% of the way there.
    • Real-world impact: In vision tasks, models trained with FRONT learned 15 times faster. In language tasks, they saved 40% of the computing power.
  2. Flexibility: You can take a model trained to recognize dogs and instantly use its "Learngene" to start a model that recognizes cats, or a model that is twice as big, or half as big.
  3. No "Magic" Needed: Unlike other methods that require training a giant "generator" robot, FRONT just uses math to filter the existing model. It's simple, fast, and free.

The Bottom Line

The authors found that the "soul" of an AI model is hidden in its low-frequency math. By extracting this soul, they created a universal "starter kit" (the Learngene) that can be instantly resized to fit any new AI project.

In short: They figured out how to distill the "wisdom" of a giant AI into a tiny, portable seed that can grow into any size of tree, instantly. This saves massive amounts of time, money, and energy in the world of Artificial Intelligence.