CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning

The paper proposes CoME, a novel mobile agent architecture that employs four specialized experts with a progressive training strategy and an InfoGain-Driven DPO method to achieve balanced, decoupled enhancement of hybrid reasoning capabilities, outperforming existing dense and MoE approaches on AITZ and AMEX datasets.

Yuxuan Liu, Weikai Xu, Kun Huang, Changyu Chen, Jiankun Zhao, Pengzhi Gao, Wei Liu, Jian Luan, Shuo Shang, Bo Du, Ji-Rong Wen, Rui Yan

Published 2026-03-09

Imagine you are trying to teach a robot to use your smartphone to do something complex, like "Book a flight to Rome for next Saturday, but only non-stop flights."

To do this, the robot doesn't just guess. It has to think through a chain of steps:

  1. Look: What is on the screen right now? (Screen Summary)
  2. Plan: What are the smaller steps needed? (Subtask Plan)
  3. Decide: Which button should I press? (Action Decision)
  4. Act: Actually click the right spot on the screen. (Action Function)

The problem with current robot "brains" (AI models) is that they are like a generalist chef trying to do everything at once. They might be great at chopping vegetables (screen reading) but terrible at baking a cake (clicking the right button). Or, if you try to make them specialists, they get confused about when to switch hats.

This paper introduces CoME (Channel-of-Mobile-Experts), a new way to build these robot brains. Here is how it works, using some simple analogies:

1. The "Specialized Team" vs. The "Generalist"

Think of a standard AI model as a Swiss Army Knife. It has one blade that tries to do everything. It's okay at many things, but not great at any specific thing.

CoME is like a highly organized construction crew with four distinct specialists:

  • The Architect: Only looks at the blueprints (Screen Summary).
  • The Foreman: Only figures out the schedule (Subtask Plan).
  • The Decision Maker: Only chooses which tool to use (Action Decision).
  • The Worker: Only swings the hammer (Action Function).

In the past, these specialists were all mixed up in one big brain. CoME separates them into four distinct "channels" or experts.

2. The Magic Switch: "Output-Oriented Activation"

Here is the tricky part. In a normal team, you might ask everyone to listen to the input (the user's command) and then decide who speaks. But CoME does something smarter.

Imagine a conductor in an orchestra.

  • Old way (MoE): The conductor glances at the sheet music (the input) and picks whoever seems relevant, token by token, so several musicians can end up half-playing at once.
  • CoME way: The conductor looks at the moment in the song (the reasoning stage). If the song is at the "drum solo" part, the conductor lets only the drummer play, even if the sheet music has notes for everyone.

CoME uses Output-Oriented Activation. It knows exactly which stage of thinking the robot is in. If the robot is currently "planning," CoME silences the other three experts and lets the "Planner" do all the talking. If it's time to "click," it switches to the "Worker." This prevents the robot from getting confused or trying to do two things at once.
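The contrast with input-based gating can be made concrete with a toy hard router. This is a sketch of the *idea* of output-oriented activation, with made-up expert functions standing in for the real channels; the paper's actual routing operates inside a neural network, not on strings:

```python
# Toy experts: each one "owns" exactly one reasoning stage.
# The lambdas just wrap their input so we can see the call chain.
EXPERTS = {
    "screen_summary":  lambda x: f"summary({x})",
    "subtask_plan":    lambda x: f"plan({x})",
    "action_decision": lambda x: f"decide({x})",
    "action_function": lambda x: f"act({x})",
}

STAGE_ORDER = ["screen_summary", "subtask_plan",
               "action_decision", "action_function"]

def route(stage: str, context: str) -> str:
    """Hard selection: silence every expert except the stage's owner."""
    return EXPERTS[stage](context)

def agent_step(observation: str) -> list[str]:
    """One full step: each stage's output feeds the next stage."""
    context, outputs = observation, []
    for stage in STAGE_ORDER:
        context = route(stage, context)
        outputs.append(context)
    return outputs
```

Calling `agent_step("home_screen")` produces four outputs, ending in `act(decide(plan(summary(home_screen))))`: one expert speaks at a time, and the handoff order is dictated by the reasoning stage, not by the input.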

3. The Training: "The Three-Step Boot Camp"

You can't just give a team of specialists a job and expect them to work together perfectly. The authors designed a progressive training strategy (a step-by-step boot camp):

  • Step 1: Expert-FT (Specialization): They train each specialist separately. The Architect only learns to read screens; the Worker only learns to click. They become masters of their own craft.
  • Step 2: Router-FT (The Conductor): They train the "Conductor" (the router) to know exactly when to switch from the Architect to the Worker. It learns the rhythm of the task.
  • Step 3: CoT-FT (Teamwork): Finally, they let the whole team work together on full tasks, learning how to pass the baton smoothly without dropping it.
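The three-phase boot camp amounts to a curriculum over which parameter groups get trained when. The sketch below is one illustrative reading of that schedule (the phase names follow the article; the exact freeze/unfreeze policy is an assumption, not the paper's recipe):

```python
# Hypothetical freeze/unfreeze schedule for the three training phases.
# Returns the parameter groups that receive gradient updates.

def trainable_modules(phase: str) -> set[str]:
    if phase == "expert_ft":   # 1. specialize each expert on its own craft
        return {"experts"}
    if phase == "router_ft":   # 2. teach the conductor when to switch
        return {"router"}
    if phase == "cot_ft":      # 3. joint fine-tuning on full reasoning chains
        return {"experts", "router"}
    raise ValueError(f"unknown phase: {phase}")

CURRICULUM = ["expert_ft", "router_ft", "cot_ft"]
```

Training the experts before the router means the conductor learns to switch between already-competent specialists, rather than between musicians who are still learning their instruments.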

4. The Safety Net: "InfoGain-Driven DPO" (The Truth Detector)

Even with a great team, mistakes happen. If the Architect misreads the screen, the Foreman plans the wrong schedule, and the whole thing fails. This is called error propagation.

To fix this, the authors added a Truth Detector called Info-DPO.
Imagine you are grading a student's essay.

  • Old way: You only look at the final grade. If the answer is right, you give an A, even if the student got there by guessing or using bad logic.
  • CoME way (Info-DPO): You look at every paragraph. You ask: "Did this paragraph actually help the student get closer to the answer?"
    • If a step adds new, useful information (like a lightbulb turning on), it gets a positive score.
    • If a step is confusing, repetitive, or leads to a dead end (like spinning in circles), it gets a negative score.

The system then punishes the robot for taking those "spinning in circles" steps and rewards it for taking the "lightbulb" steps. This forces the robot to learn how to think correctly, not just what to guess.
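The "did this paragraph help?" idea can be sketched as scoring each reasoning step by how much it raises the model's confidence in the correct final action, then sorting steps into preferred and dispreferred examples for DPO. The numbers and the simple difference-based gain below are toy illustrations; the paper's actual information-gain estimator may differ:

```python
# Toy information-gain scoring for reasoning steps.
# p_before / p_after: probability of the correct final answer
# before and after the step (illustrative values, not real model outputs).

def info_gain(p_before: float, p_after: float) -> float:
    """How much closer did this step bring us to the right answer?"""
    return p_after - p_before

def label_steps(step_probs: list[float]) -> list[str]:
    """Mark each step 'chosen' (a lightbulb) or 'rejected' (spinning in circles),
    yielding preference pairs that a DPO-style objective can train on."""
    return ["chosen" if info_gain(before, after) > 0 else "rejected"
            for before, after in zip(step_probs, step_probs[1:])]
```

For example, `label_steps([0.1, 0.4, 0.35, 0.9])` labels the steps `["chosen", "rejected", "chosen"]`: the middle step lowered the probability of the correct answer, so it becomes a dispreferred example.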

The Result?

When they tested CoME on real-world tasks (like booking flights or navigating apps), it beat the "Swiss Army Knife" models and the other "Specialist" models.

  • It made fewer mistakes.
  • It was better at clicking the exact right button.
  • It used less computer memory (it was more efficient).

In short: CoME is like taking a chaotic group of generalists, turning them into a specialized team, hiring a perfect conductor to manage them, and giving them a strict teacher who grades every single step of their thinking process. The result is a mobile robot that actually knows how to use your phone.