MEC Task Offloading in AIoT: A User-Centric DRL Model Splitting Inference Scheme

This paper proposes a user-centric deep reinforcement learning framework, specifically a UCMS_MADDPG-based offloading algorithm, to optimize model splitting inference in AIoT-enabled mobile edge computing. It jointly addresses resource allocation, server selection, and task offloading to minimize execution delay and energy consumption under dynamic constraints.

Weixi Li, Rongzuo Guo, Yuning Wang, Fangying Chen

Published 2026-03-06

Imagine you are the manager of a bustling city called AIoT City. In this city, thousands of tiny robots (your User Devices, like smartphones or smart cameras) are constantly trying to solve complex puzzles (running AI tasks).

Some puzzles are easy and the robots can solve them themselves using their own tiny brains (Local Computing). But many puzzles are too big, too slow, or too energy-draining for a single robot. They need help from the city's super-computers located at the edge of town, called Edge Servers (MEC).

The problem? The city is chaotic.

  1. Traffic Jams: Sending data to the server takes time (Latency).
  2. Battery Anxiety: Sending data drains the robot's battery (Energy).
  3. Server Overload: The super-computers have limited desk space (Storage) and limited workers (CPU cores). If too many robots rush the same server, everyone waits in line, and some tasks get dropped.
  4. The "Mixed" Problem: The robots need to make two types of decisions at once: Discrete choices (Which server? Yes/No?) and Continuous choices (How much power to use? How fast to run?).

Existing methods are like traffic cops who only look at one thing at a time, or who try to solve the whole city's traffic with a single, rigid rulebook. They often fail when the city gets too busy.

The Paper's Solution: A "Two-Stage" Smart Assistant

The authors propose a new system called UCMS (User-Centric Model Splitting). Think of this not as a single brain, but as a two-person team working together to make decisions: The Robot (User) and The Server.

1. The "Co-Selection" Dance (Finding the Right Partner)

Before the robots even start solving puzzles, they need to pick the right server.

  • Old Way: Robots just pick the closest server. This causes a stampede; one server gets crushed while others sit empty.
  • New Way (Co-Selection): It's like a speed-dating event.
    • The Robots say: "I want a server that is fast and has free space."
    • The Servers say: "I want robots with small, quick puzzles so I can finish them fast."
    • They match up based on mutual benefit, ensuring no one is overwhelmed.
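The "speed-dating" matching above can be sketched as a round of deferred-acceptance style proposals: devices propose to their preferred servers, and each server keeps only the tasks it likes best, up to its capacity. Everything concrete here (the `prefs` lists, ranking tasks by `task_size`, fixed capacity counts) is an illustrative assumption, not the paper's exact matching criteria:

```python
def co_select(devices, servers, capacity):
    """Toy mutual matching: devices propose to their preferred server;
    each server keeps the quickest (smallest) tasks up to its capacity
    and rejects the rest, which then try their next choice."""
    # devices: {name: {"task_size": float, "prefs": [server names, best first]}}
    unmatched = list(devices)
    next_choice = {d: 0 for d in devices}   # pointer into each device's prefs
    accepted = {s: [] for s in servers}     # server -> currently accepted devices

    while unmatched:
        d = unmatched.pop(0)
        prefs = devices[d]["prefs"]
        if next_choice[d] >= len(prefs):
            continue                        # out of options: compute locally
        s = prefs[next_choice[d]]
        next_choice[d] += 1
        accepted[s].append(d)
        # the server prefers small, quick tasks; overflow gets rejected
        accepted[s].sort(key=lambda x: devices[x]["task_size"])
        while len(accepted[s]) > capacity[s]:
            unmatched.append(accepted[s].pop())   # largest task bounced
    return accepted
```

Because rejected devices immediately re-propose to their next choice, no server ends up "crushed" while a neighbor sits empty, which is the point of the co-selection step.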

2. The "Split Brain" Decision (The Core Innovation)

This is the cleverest part. Instead of one giant AI trying to decide everything, they split the decision-making process into two stages, like a drafting process:

  • Stage 1: The Robot's Draft (User-Side)
    The robot looks at its own situation (battery, task size) and makes a preliminary guess: "I think I should send this task to Server A, and I'll use 50% of my power." It's a rough draft.
  • Stage 2: The Server's Final Edit (Server-Side)
    The robot sends this draft to the server. The server looks at the whole city (global view). It sees, "Oh, Server A is actually full right now, but Server B has space."
    The server then approves or rejects the robot's request. If approved, the task goes to Server B. If rejected, the robot keeps the task.

Why is this cool? It combines the robot's local knowledge with the server's global knowledge. It's like a student writing an essay (User) and a teacher editing it (Server) to make sure it fits the class rules.
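The two-stage draft-and-edit flow can be sketched in a few lines. In the paper both stages are learned policies; the heuristics below (picking the lowest estimated delay, the power thresholds, the `load`/`capacity` fields) are stand-in assumptions to show the shape of the handshake, not the actual decision rules:

```python
def user_draft(battery, task_size, servers):
    """Stage 1 (user side): a preliminary guess from local state only."""
    target = min(servers, key=lambda s: s["est_delay"])  # locally estimated best
    # big tasks justify more transmit power, unless the battery is low
    power = 0.2 if battery < 0.3 else (0.8 if task_size > 10 else 0.5)
    return {"server": target["name"], "power": power}

def server_edit(draft, servers):
    """Stage 2 (server side): refine the draft using the global view.
    Approve it, redirect it to a server with free capacity, or reject it
    (in which case the user keeps the task and computes locally)."""
    by_name = {s["name"]: s for s in servers}
    chosen = by_name[draft["server"]]
    if chosen["load"] < chosen["capacity"]:
        return {**draft, "decision": "approved"}
    free = [s for s in servers if s["load"] < s["capacity"]]
    if free:
        alt = min(free, key=lambda s: s["load"])          # least-loaded fallback
        return {**draft, "server": alt["name"], "decision": "redirected"}
    return {**draft, "decision": "rejected"}
```

The user never needs to know the global load, and the server never needs the user's battery level: each side contributes only the state it can actually observe.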

3. The "Smart Coach" (The AI Algorithm)

To teach these robots and servers how to make good decisions, the authors use a special AI coach called UCMS_MADDPG.

  • The Reward System: The coach gives points for finishing tasks fast and saving battery. It gives penalties if a robot runs out of battery or if a task takes too long.
  • The "Error-Sampling" Trick: Usually, this kind of AI learns by replaying its biggest mistakes until it gets them right. But focusing only on mistakes can backfire: the AI fixates on the same errors and stops exploring.
    • This new coach uses a Reward-Error Trade-off. It says, "Let's learn from the mistakes that hurt us the most, but also from the times we got lucky." By balancing learning from big errors with learning from good results, it helps the AI escape "local traps" (strategies that are good but not the best) and find the best strategy faster.
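One way to read the trade-off is as a replay-buffer priority that blends each experience's TD error (the surprising mistakes) with its reward (the lucky successes). The blend below, with a single weight `alpha`, is a simplified interpretation of the idea, not the paper's exact sampling scheme:

```python
import random

def sample_batch(buffer, batch_size, alpha=0.5):
    """Prioritized sampling with a reward-error trade-off: priority mixes
    |TD error| (learn from painful mistakes) with positive reward
    (also revisit what went well). alpha weights the two terms."""
    priorities = [alpha * abs(t["td_error"]) + (1 - alpha) * max(t["reward"], 0.0)
                  for t in buffer]
    total = sum(priorities) or 1.0          # avoid division by zero
    probs = [p / total for p in priorities]
    return random.choices(buffer, weights=probs, k=batch_size)
```

With `alpha=1.0` this collapses to classic error-only prioritized replay; lowering `alpha` mixes in the high-reward episodes that pure error sampling would ignore.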

The Results: A Smoother City

When the authors tested this system in a simulation:

  • Faster Learning: The new system figured out the best strategies much quicker than the old ones.
  • Less Traffic: Fewer tasks were dropped because the servers weren't overloaded.
  • Better Battery Life: The robots conserved more energy.
  • Scalability: Even when they added more robots and servers (making the city bigger), the system kept working well.

In a Nutshell

This paper introduces a collaborative, two-step decision-making system for smart devices. Instead of forcing a device to guess what the server can handle, the device makes a "best guess," and the server gives the final "yes or no" based on the big picture. By using a smart AI coach that learns from both mistakes and successes, the whole system becomes faster, more efficient, and less likely to crash under pressure.

It's the difference between a chaotic crowd rushing a single door and a well-organized line where everyone knows exactly where to go.