AdaRank: Adaptive Rank Pruning for Enhanced Model Merging

AdaRank is a novel model merging framework that adaptively prunes detrimental singular components of task vectors at test time via entropy minimization. By doing so, it mitigates cross-task interference and achieves state-of-the-art performance across various backbones and task configurations.

Chanhyuk Lee, Jiho Choi, Chanryeol Lee, Donggyun Kim, Seunghoon Hong

Published 2026-03-03

Imagine you have a team of specialist chefs. One is a master of Italian pasta, another is a sushi expert, and a third is a pastry genius. Each has spent years perfecting their craft.

Now, imagine you want to open a restaurant that serves all three cuisines perfectly, but you only have one kitchen and one head chef to run it. You can't hire three separate chefs (that's too expensive and takes up too much space). So, you try to merge their knowledge into one person.

The Problem: The "Clash of Styles"

In the world of AI, this is called Model Merging. We take three different AI models (the chefs) and try to combine their "weights" (their knowledge) into a single model.

The old way of doing this was like asking the chefs to just average out their recipes.

  • Chef A says: "Put 10 cups of flour in the pasta."
  • Chef B says: "Put 2 cups of flour in the sushi."
  • The Average: "Put 6 cups of flour in everything."

Result: The pasta is dry, and the sushi is a floury mess. The models interfere with each other. The AI gets confused, and performance drops.
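In code, the "just average the recipes" baseline really is this simple. The following is a toy sketch with made-up scalar weights, purely to illustrate why averaging clashes, not any paper's implementation:

```python
import numpy as np

# Naive merging: elementwise average of each specialist's weights.
# Each "chef" is a weight matrix; here a single scalar for clarity.
pasta_weights = np.array([[10.0]])  # Chef A: "10 cups of flour"
sushi_weights = np.array([[2.0]])   # Chef B: "2 cups of flour"

merged = (pasta_weights + sushi_weights) / 2
print(merged)  # [[6.]] -> wrong for both dishes
```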

The Previous Fix: The "Top-10% Rule"

Researchers realized that instead of averaging everything, they should look at the "ingredients" of the knowledge. They used a mathematical tool called SVD (Singular Value Decomposition) to break the knowledge down into layers of importance.

They decided to keep only the top 10% most important ingredients (the "Top-K" rule) and throw the rest away, hoping this would reduce the noise.
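The Top-K rule can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's code; the function name `topk_truncate` and the matrix shapes are assumptions:

```python
import numpy as np

def topk_truncate(task_vector: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k singular components with the largest singular values."""
    U, S, Vt = np.linalg.svd(task_vector, full_matrices=False)
    S_pruned = np.zeros_like(S)
    S_pruned[:k] = S[:k]  # SVD returns singular values sorted largest-first
    return U @ np.diag(S_pruned) @ Vt

rng = np.random.default_rng(0)
delta = rng.standard_normal((8, 8))   # a "task vector": finetuned minus pretrained weights
low_rank = topk_truncate(delta, k=2)  # fixed rule: keep the top 2, discard the rest
print(np.linalg.matrix_rank(low_rank))
```

Note that the rule is blind: it keeps the top components no matter which task they help or hurt.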

The Flaw: This is like a rigid rulebook that says, "Always keep the top 10% of ingredients for every dish."

  • Issue 1: Sometimes, the "most important" ingredient for the pasta chef (like a specific spice) is actually terrible for the sushi chef. Keeping it causes a flavor clash.
  • Issue 2: Some dishes are simple (like plain rice) and only need a few ingredients. Others are complex (like a multi-layer cake) and need many ingredients. A fixed "Top 10%" rule treats a simple dish and a complex cake exactly the same, which is inefficient.

The Solution: AdaRank (The Smart Sous-Chef)

The authors of this paper propose AdaRank. Think of AdaRank as a smart, adaptive Sous-Chef who doesn't follow a rigid rulebook. Instead, they taste the food and adjust the recipe in real-time.

Here is how AdaRank works, using our kitchen analogy:

1. The "Binary Mask" (The Ingredient Switch)

Instead of just keeping the "Top 10%" of ingredients, AdaRank gives the chef a switch for every single ingredient.

  • Switch ON: Keep this ingredient.
  • Switch OFF: Throw this ingredient away.

Crucially, the chef can decide to turn OFF a "top" ingredient if it causes a clash, and turn ON a "bottom" (less obvious) ingredient if it helps a specific dish. It's not about rank; it's about what actually works.
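The per-ingredient switch corresponds to a binary mask over singular components. A minimal sketch, assuming a task-vector matrix decomposed by SVD (`masked_reconstruct` is an invented helper name):

```python
import numpy as np

def masked_reconstruct(task_vector: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Rebuild a weight delta keeping only singular components where mask == 1."""
    U, S, Vt = np.linalg.svd(task_vector, full_matrices=False)
    return U @ np.diag(S * mask) @ Vt

rng = np.random.default_rng(1)
delta = rng.standard_normal((6, 6))

# Top-K is one special case of the mask (k=2 would be [1,1,0,0,0,0]),
# but the mask can also turn OFF a "top" component and keep a "bottom" one:
mask = np.array([1, 0, 1, 0, 0, 1], dtype=float)  # 2nd switch OFF, 6th switch ON
pruned = masked_reconstruct(delta, mask)
print(pruned.shape)
```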

2. "Test-Time Adaptation" (The Tasting Session)

How does the chef know which switches to flip? They don't have the recipe book (training data) anymore. Instead, they use Test-Time Adaptation.

Imagine the chef is about to serve the food to customers (the test data). They don't know what each customer actually ordered (the labels), but they can see whether the customers look happy or unhappy.

  • The chef tries a combination of ingredients.
  • If the customers look confused (high "entropy" or uncertainty), the chef knows, "Oops, that ingredient is causing a clash."
  • The chef flips a switch, removes the bad ingredient, or adds a missing one.
  • They repeat this until the customers are smiling (low entropy).

This happens automatically at test time, using only unlabeled inputs; no labels and no access to the original training data are required.
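The tasting loop can be sketched as a search over mask bits that minimizes prediction entropy on unlabeled test inputs. The paper optimizes the masks with gradient-based test-time adaptation; this toy uses a greedy flip search purely to keep the example dependency-free, and every name and shape here is illustrative:

```python
import numpy as np

def entropy(logits: np.ndarray) -> float:
    """Average softmax entropy: high = confused customers, low = confident."""
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

def merged_weights(base, svd_parts, masks):
    """Merged model = pretrained weights + each task vector's masked SVD."""
    W = base.copy()
    for (U, S, Vt), m in zip(svd_parts, masks):
        W += U @ np.diag(S * m) @ Vt
    return W

rng = np.random.default_rng(2)
base = rng.standard_normal((4, 3)) * 0.1                  # "pretrained" weights
deltas = [rng.standard_normal((4, 3)) for _ in range(2)]  # two task vectors
svd_parts = [np.linalg.svd(d, full_matrices=False) for d in deltas]
masks = [np.ones(3) for _ in svd_parts]                   # every switch starts ON
X = rng.standard_normal((32, 4))                          # unlabeled test batch

best = entropy(X @ merged_weights(base, svd_parts, masks))
for m in masks:                       # for each chef's switchboard...
    for i in range(len(m)):
        m[i] = 1.0 - m[i]             # try flipping one switch
        e = entropy(X @ merged_weights(base, svd_parts, masks))
        if e < best:
            best = e                  # keep the flip: customers look happier
        else:
            m[i] = 1.0 - m[i]         # revert: the flip caused a clash
print(round(best, 3))
```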

Why is this a Big Deal?

  1. It's Flexible: It realizes that the "Pasta" task needs a different set of ingredients than the "Sushi" task. It doesn't force a one-size-fits-all solution.
  2. It's Efficient: It doesn't need to keep three separate kitchens (three separate models). It fits everything into one kitchen, but the kitchen is now organized perfectly for every dish.
  3. It's Smarter: By looking at the actual result (the customer's reaction) rather than a pre-set rule (Top 10%), it avoids the "clashes" that ruin the meal.

The Bottom Line

AdaRank is like upgrading from a rigid, rule-following robot chef to a taste-sensitive, adaptive master chef. It looks at the specific needs of every task, prunes the ingredients that cause trouble, and keeps the ones that help, resulting in a single AI model that is almost as good as having three separate experts, but without the extra cost or space.

In the paper's experiments, this method made merged AI models perform significantly better, closing the gap between a "merged" model and a "perfectly trained" individual model, all while using the same amount of memory as a single model.