Grow, Don't Overwrite: Fine-tuning Without Forgetting

The paper introduces a function-preserving expansion method that sidesteps catastrophic forgetting: pre-trained parameters are mathematically replicated and scaled so the expanded model starts out computing exactly the same function, and only the new parameters are then trained. The result is a model that learns new tasks at full fine-tuning quality while retaining its original capabilities, with the option of expanding only selected layers for computational efficiency.

Dyah Adila, Hanna Mazzawi, Benoit Dherin, Xavier Gonzalvo

Published 2026-03-10

Imagine you have a brilliant, well-read librarian named Gemma. She has spent years reading millions of books, learning everything from how to bake a cake to how to solve complex physics problems. She is a master of general knowledge.

Now, you want to hire her for a very specific job: translating ancient French poetry.

The Problem: The "Overwrite" Trap

In the world of AI, when you try to teach a smart model like Gemma a new skill, a phenomenon called Catastrophic Forgetting often happens.

Think of it like this: To learn French poetry, the old librarian tries to cram new information into her brain. But her brain is full. So, to make room for the new French words, she accidentally throws out her old knowledge.

  • She learns to translate French perfectly.
  • But suddenly, she forgets how to bake a cake.
  • She forgets how to do basic math.
  • She forgets how to tell a joke.

This is the "Catastrophic Forgetting" problem. The more she learns the new job, the more she loses her original identity.

The Old Solutions (The Flawed Fixes)

Scientists have tried two main ways to fix this, but both have big downsides:

  1. The "Brake" Method: They tell the librarian, "Don't change your brain too much!" (This is called regularization). But this is like trying to learn French while wearing heavy handcuffs. She can't learn the new job very well because she's too afraid to forget anything.
  2. The "Add a New Brain" Method: They give her a brand new, empty brain just for French, while keeping her old brain frozen. But this new brain starts with zero knowledge. It's like hiring a fresh intern who knows nothing about the library. It takes forever to train them, and it's a waste of the librarian's existing wisdom.
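
The "brake" in method 1 is usually implemented as a penalty that pulls the weights back toward their pre-trained values. Here is a minimal sketch of one common L2-style form of that penalty; the function name, numbers, and `strength` parameter are illustrative assumptions, not from this paper:

```python
def braked_loss(task_loss, weights, pretrained, strength=0.1):
    # Penalize drift from the pre-trained weights: the larger `strength`,
    # the heavier the "handcuffs" and the less the model can adapt.
    drift = sum((w - w0) ** 2 for w, w0 in zip(weights, pretrained))
    return task_loss + strength * drift

loss = braked_loss(task_loss=2.0, weights=[0.9, -0.3], pretrained=[1.0, 0.0])
print(loss)  # task loss plus a small drift penalty (approximately 2.01)
```

Tuning `strength` is exactly the dilemma the analogy describes: too high and the model cannot learn the new task, too low and it forgets the old ones.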

The New Solution: "Grow, Don't Overwrite"

The authors of this paper came up with a clever trick called "Function-Preserving Expansion."

Instead of overwriting her old brain or giving her a blank one, they gently expand her brain to make room for the new skill without disturbing the old one.

Here is the analogy of how they do it:

1. The "Copy-Paste" Expansion

Imagine the librarian has a specific desk where she processes information.

  • Step 1: They take her existing desk setup and copy it exactly, placing a second identical desk right next to it. Now she has double the space to work.
  • Step 2: To make sure she doesn't get confused or change her output, they put a special filter on the second desk. This filter ensures that if she uses the new desk, the final result is mathematically identical to what she would have done with just the old desk.

The Magic: At the very moment they finish building this new setup, the librarian is exactly the same person she was before. She can still bake cakes, do math, and tell jokes perfectly. Nothing has changed yet.
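
The copy-and-filter idea has a concrete counterpart in how a neural network can be widened without changing its output. Below is a minimal plain-Python sketch on a toy two-layer network, not the paper's actual construction: new hidden units are appended (the second desk), but their outgoing weights (the filter) start at zero, so the expanded network provably computes the same function as before:

```python
import random

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def forward(W1, W2, x):
    # Tiny two-layer network: y = W2 @ relu(W1 @ x)
    return matvec(W2, relu(matvec(W1, x)))

def expand(W1, W2, extra):
    # Append `extra` new hidden units. Their incoming weights can be
    # anything (here: copies of existing rows), but their OUTGOING
    # weights in W2 start at zero -- the "filter" -- so the expanded
    # network's output is identical to the original's.
    W1_new = W1 + [random.choice(W1)[:] for _ in range(extra)]
    W2_new = [row + [0.0] * extra for row in W2]
    return W1_new, W2_new

random.seed(0)
W1 = [[0.5, -0.2], [0.1, 0.8]]   # hidden x input
W2 = [[1.0, -1.0]]               # output x hidden
x = [1.0, 2.0]

before = forward(W1, W2, x)
W1b, W2b = expand(W1, W2, extra=2)
after = forward(W1b, W2b, x)
print(before, after)  # identical outputs: the expansion preserved the function
```

This is the "nothing has changed yet" moment from the analogy: the model has more capacity, but at initialization it is behaviorally the same model.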

2. The "Specialized Training"

Now, they start training her on French poetry.

  • Because she has extra space (the new desk), she can learn the new skill without throwing anything out.
  • They only train the new parts of her brain. The old parts (the original desk) remain frozen and untouched.
  • As she learns French, she uses the new space. The old space stays dedicated to her original knowledge.
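
The freezing described above amounts to masking updates: gradients may be computed everywhere, but only the new parameters are actually changed. A minimal sketch of one gradient step; the parameter names, gradient values, and learning rate are made up for illustration:

```python
def sgd_step(params, grads, trainable, lr=0.1):
    # Apply a gradient-descent update ONLY to parameters marked trainable;
    # frozen (original) parameters pass through untouched.
    return {
        name: (value - lr * grads[name]) if name in trainable else value
        for name, value in params.items()
    }

params = {"w_old": 0.8, "w_new": 0.0}
grads  = {"w_old": 0.5, "w_new": 0.5}   # both receive gradients...
trainable = {"w_new"}                    # ...but only the new part may move

updated = sgd_step(params, grads, trainable)
print(updated)  # {'w_old': 0.8, 'w_new': -0.05}
```

Because `w_old` never moves, the original behavior is preserved by construction rather than by hoping the optimizer is gentle, and only the small expanded portion of the model needs optimizer state and updates.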

The Results: The Best of Both Worlds

The paper shows that this method works incredibly well:

  • No Forgetting: The librarian learns French perfectly but never forgets how to bake a cake or do math.
  • Efficiency: You don't need to train the whole librarian. You only need to train the new "desk" you added. This saves a huge amount of computer power (about 60% less work!).
  • Modularity: If you only need her to learn a little bit of French, you only need to add a tiny bit of new space. If the task is super hard (like advanced math), you can add more space, and she gets better at it.

Why This Matters

This is a breakthrough because it solves the "Zero-Sum Game" of AI. Before, you had to choose between being a generalist (knowing everything) or a specialist (knowing one thing well).

This new method allows an AI to be both. It can grow into a specialist without ever losing its generalist soul. It's like giving a genius a new wing on their house to study a new subject, rather than forcing them to tear down their old library to make space.

In short: Instead of erasing the past to make room for the future, this method builds an addition to the house, so the family can grow without ever having to move out.