Efficient Compositional Multi-tasking for On-device Large Language Models

This paper addresses the challenge of compositional multi-tasking in resource-constrained on-device Large Language Models by introducing a new benchmark for simultaneous multi-task execution and proposing an efficient "Learnable Calibration" method to enable high-performance task merging beyond single-task scenarios.

Ondrej Bohdal, Mete Ozay, Jijoong Moon, Kyeng-Hun Lee, Hyeonmok Ko, Umberto Michieli

Published 2026-03-13

Imagine you have a very smart, but slightly small, robot assistant living inside your smartphone. This robot is great at one thing at a time: it can summarize a long article, or it can translate a text into Spanish, or it can write a funny reply to a friend.

But what happens when you need it to do two things at once? For example, you want to read a long news article, get a short summary of it, and have that summary translated into Spanish—all in one go.

This is the problem the paper tackles. Here is the story of how the authors solved it, explained simply.

The Problem: The "Do One Thing" Robot

Currently, if you want your phone to summarize and translate, you usually have to ask it to do them separately.

  1. "Hey robot, summarize this." (Robot does it).
  2. "Okay, now translate that summary." (Robot does it again).

This is like asking a chef to chop the vegetables, then stopping, washing the knife, and asking them to cook the soup. It works, but it's slow and wastes energy.

Alternatively, you could try to teach the robot a brand new "Super Skill" that combines both. But your phone has very limited storage space (like a tiny backpack). You can't carry a new, heavy backpack for every single combination of tasks (Summarize+Translate, Summarize+French, Reply+Professional Tone, etc.). There are too many combinations!

The Old Solution: The "Smoothie" Mistake

Researchers tried a method called Model Merging. Imagine you have two different "expert" robots:

  • Robot A is a master of Summarizing.
  • Robot B is a master of Translating.

The old idea was to take Robot A and Robot B, dump their brains into a blender, and mix them together to make a "Super Robot."

  • The Issue: When you blend them, the instructions get confused. The Super Robot might try to summarize and translate at the same time, but it ends up doing a bad job at both. It's like blending a hammer and a screwdriver; you get a weird tool that can't hammer well or screw well.
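Stripped of the smoothie metaphor, the simplest form of model merging is just averaging the weight changes ("deltas") that each expert applies to the shared base model. Here is a toy sketch in Python; the tiny three-number vectors and the variable names are purely illustrative, not the paper's actual parameters:

```python
# Each "expert" adapter is represented by the weight change (delta) it
# applies to the shared base model. Tiny toy vectors stand in for the
# millions of real parameters.
base = [0.10, -0.20, 0.30]
summarize_delta = [0.05, 0.00, -0.10]   # expert A: summarization skill
translate_delta = [-0.02, 0.08, 0.04]   # expert B: translation skill

# Naive model merging: average the two deltas and add them to the base.
merged = [b + 0.5 * (s + t)
          for b, s, t in zip(base, summarize_delta, translate_delta)]

print(merged)  # a single compromise model, exactly neither expert
```

Because the averaged weights sit halfway between the two experts, the merged model is exactly neither of them, which is the "confused Super Robot" problem described above.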

The New Solution: "Learnable Calibration"

The authors of this paper came up with a clever trick called Learnable Calibration.

Think of your phone's robot as a base car (like a standard Toyota Camry).

  • You already have specialized kits attached to it: a "Summarizing Kit" and a "Translating Kit." These are small, efficient add-ons (called Adapters or LoRAs) that you already own.

Instead of building a whole new car or smashing the kits together, the authors propose adding a tiny, customizable dashboard between the driver and the engine.

  1. The Setup: You take the existing Summarizing Kit and the Translating Kit and attach them to the car.
  2. The Calibration: You add a very small, smart "tuning knob" (the Learnable Calibration). This knob is tiny—so small it barely takes up any space in your backpack.
  3. The Magic: This knob learns how to tell the engine, "Hey, when I ask for a summary and a translation, don't just do them one after the other. Blend the instructions so the car drives smoothly in both directions at once."
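The three steps above can be pictured in code as two frozen adapters running in the same forward pass, with a tiny learnable parameter deciding how to blend their outputs. This is a deliberately simplified illustration of the concept; the paper's actual calibration module is richer than the single `alpha` used here, and the adapters are stand-in functions rather than real LoRAs:

```python
def lora_out(x, scale):
    # Stand-in for a frozen LoRA adapter: a fixed transform of the input.
    return [scale * v for v in x]

def calibrated_forward(x, alpha):
    # Both task adapters run on the same input in one pass; the tiny
    # learnable parameter `alpha` decides how to blend their outputs.
    summ = lora_out(x, 0.8)    # frozen "summarize" adapter
    trans = lora_out(x, 1.2)   # frozen "translate" adapter
    return [alpha * s + (1 - alpha) * t for s, t in zip(summ, trans)]

# Only `alpha` would be trained; both adapters stay frozen, so the extra
# storage per task combination is a handful of numbers, not a new model.
print(calibrated_forward([1.0, 2.0], alpha=0.5))
```

The key design point is that everything except the calibration parameters stays frozen, which is why each new task combination costs almost no extra storage.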

Why is this a Big Deal?

  • Speed: It happens in one single step (one "inference pass"). The car drives straight to the destination without stopping.
  • Space: The "tuning knob" is incredibly small. You can have a different knob for every combination of tasks without filling up your phone's memory.
  • Performance: Its output quality is close to the slow approach of running the tasks one after another.

The "Benchmark" (The Driving Test)

To prove this works, the researchers built a Driving Test (a benchmark). They created four specific scenarios to test their new method:

  1. Summarize + Translate: "Read this long story and give me the short version in Spanish."
  2. Summarize + Tone Change: "Read this story and give me the short version, but make it sound very professional."
  3. Reply + Translate: "Write a reply to this text, but send it in French."
  4. Reply + Tone Change: "Write a reply, but make it sound funny/witty."
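Each benchmark example boils down to a single instruction that demands both skills at once. The sketch below shows how the four scenarios could be represented as data; the prompt templates are hypothetical placeholders, not the paper's exact wording:

```python
# Hypothetical prompt templates for the four composite tasks; the exact
# wording used in the paper's benchmark may differ.
COMPOSITE_TASKS = {
    ("summarize", "translate"): "Summarize the text below in {language}.",
    ("summarize", "tone"): "Summarize the text below in a {tone} tone.",
    ("reply", "translate"): "Write a reply to the text below in {language}.",
    ("reply", "tone"): "Write a reply to the text below in a {tone} tone.",
}

def build_example(task_pair, text, **kwargs):
    # One benchmark example: a single instruction that requires both
    # skills in one pass, rather than two separate requests.
    instruction = COMPOSITE_TASKS[task_pair].format(**kwargs)
    return {"instruction": instruction, "input": text}

example = build_example(("summarize", "translate"),
                        "A long news article...", language="Spanish")
print(example["instruction"])  # Summarize the text below in Spanish.
```

Framing the test this way matters: a model that can only do one task per request will fail these examples even if it is excellent at each task in isolation.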

They tested their "Tuning Knob" method against the old "Blender" methods and the "Do it twice" methods.

The Result

The "Tuning Knob" (Learnable Calibration) won.

  • It was fast (one step).
  • It was light (tiny storage).
  • It was smart (it actually understood how to do both tasks together).

The Takeaway

This paper gives us a blueprint for making our phones smarter without making them slower or heavier. It allows our small, on-device AI assistants to juggle multiple complex tasks at once—like summarizing a document while translating it—by using a tiny, smart "tuning knob" to harmonize the skills they already have.

It's the difference between asking a friend to do two chores separately and teaching them how to do both chores simultaneously while humming a tune.
