MEC Task Offloading in AIoT: A User-Centric DRL Model Splitting Inference Scheme

This paper proposes a user-centric deep reinforcement learning framework, specifically a UCMS_MADDPG-based offloading algorithm, to optimize model splitting inference in AIoT-enabled mobile edge computing. It jointly addresses resource allocation, server selection, and task offloading to minimize execution delay and energy consumption under dynamic constraints.

Weixi Li, Rongzuo Guo, Yuning Wang, Fangying Chen

Published 2026-03-06

Imagine you are the manager of a bustling city called AIoT City. In this city, thousands of tiny robots (your User Devices, like smartphones or smart cameras) are constantly trying to solve complex puzzles (running AI tasks).

Some puzzles are easy and the robots can solve them themselves using their own tiny brains (Local Computing). But many puzzles are too big, too slow, or too energy-draining for a single robot. They need help from the city's super-computers located at the edge of town, called Edge Servers (MEC).

The problem? The city is chaotic.

  1. Traffic Jams: Sending data to the server takes time (Latency).
  2. Battery Anxiety: Sending data drains the robot's battery (Energy).
  3. Server Overload: The super-computers have limited desk space (Storage) and limited workers (CPU cores). If too many robots rush the same server, everyone waits in line, and some tasks get dropped.
  4. The "Mixed" Problem: The robots need to make two types of decisions at once: Discrete choices (Which server? Yes/No?) and Continuous choices (How much power to use? How fast to run?).

Existing methods are like traffic cops who only look at one thing at a time, or who try to solve the whole city's traffic with a single, rigid rulebook. They often fail when the city gets too busy.

The Paper's Solution: A "Two-Stage" Smart Assistant

The authors propose a new system called UCMS (User-Centric Model Splitting). Think of this not as a single brain, but as a two-person team working together to make decisions: The Robot (User) and The Server.

1. The "Co-Selection" Dance (Finding the Right Partner)

Before the robots even start solving puzzles, they need to pick the right server.

  • Old Way: Robots just pick the closest server. This causes a stampede; one server gets crushed while others sit empty.
  • New Way (Co-Selection): It's like a speed-dating event.
    • The Robots say: "I want a server that is fast and has free space."
    • The Servers say: "I want robots with small, quick puzzles so I can finish them fast."
    • They match up based on mutual benefit, ensuring no one is overwhelmed.
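The "speed-dating" matching above can be sketched as a round of deferred-acceptance style proposals: devices propose to their preferred servers, and each server keeps only the tasks it likes best, up to its capacity. Everything concrete here (the `prefs` lists, ranking tasks by `task_size`, fixed capacity counts) is an illustrative assumption, not the paper's exact matching criteria:

```python
def co_select(devices, servers, capacity):
    """Toy mutual matching: devices propose to their preferred server;
    each server keeps the quickest (smallest) tasks up to its capacity
    and rejects the rest, which then try their next choice."""
    # devices: {name: {"task_size": float, "prefs": [server names, best first]}}
    unmatched = list(devices)
    next_choice = {d: 0 for d in devices}   # pointer into each device's prefs
    accepted = {s: [] for s in servers}     # server -> currently accepted devices

    while unmatched:
        d = unmatched.pop(0)
        prefs = devices[d]["prefs"]
        if next_choice[d] >= len(prefs):
            continue                        # out of options: compute locally
        s = prefs[next_choice[d]]
        next_choice[d] += 1
        accepted[s].append(d)
        # the server prefers small, quick tasks; overflow gets rejected
        accepted[s].sort(key=lambda x: devices[x]["task_size"])
        while len(accepted[s]) > capacity[s]:
            unmatched.append(accepted[s].pop())   # largest task bounced
    return accepted
```

Because rejected devices immediately re-propose to their next choice, no server ends up "crushed" while a neighbor sits empty, which is the point of the co-selection step.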

2. The "Split Brain" Decision (The Core Innovation)

This is the cleverest part. Instead of one giant AI trying to decide everything, they split the decision-making process into two stages, like a drafting process:

  • Stage 1: The Robot's Draft (User-Side)
    The robot looks at its own situation (battery, task size) and makes a preliminary guess: "I think I should send this task to Server A, and I'll use 50% of my power." It's a rough draft.
  • Stage 2: The Server's Final Edit (Server-Side)
    The robot sends this draft to the server. The server looks at the whole city (global view). It sees, "Oh, Server A is actually full right now, but Server B has space."
    The server then approves or rejects the robot's request. If approved, the task goes to Server B. If rejected, the robot keeps the task.

Why is this cool? It combines the robot's local knowledge with the server's global knowledge. It's like a student writing an essay (User) and a teacher editing it (Server) to make sure it fits the class rules.
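The two-stage draft-and-edit flow can be sketched in a few lines. In the paper both stages are learned policies; the heuristics below (picking the lowest estimated delay, the power thresholds, the `load`/`capacity` fields) are stand-in assumptions to show the shape of the handshake, not the actual decision rules:

```python
def user_draft(battery, task_size, servers):
    """Stage 1 (user side): a preliminary guess from local state only."""
    target = min(servers, key=lambda s: s["est_delay"])  # locally estimated best
    # big tasks justify more transmit power, unless the battery is low
    power = 0.2 if battery < 0.3 else (0.8 if task_size > 10 else 0.5)
    return {"server": target["name"], "power": power}

def server_edit(draft, servers):
    """Stage 2 (server side): refine the draft using the global view.
    Approve it, redirect it to a server with free capacity, or reject it
    (in which case the user keeps the task and computes locally)."""
    by_name = {s["name"]: s for s in servers}
    chosen = by_name[draft["server"]]
    if chosen["load"] < chosen["capacity"]:
        return {**draft, "decision": "approved"}
    free = [s for s in servers if s["load"] < s["capacity"]]
    if free:
        alt = min(free, key=lambda s: s["load"])          # least-loaded fallback
        return {**draft, "server": alt["name"], "decision": "redirected"}
    return {**draft, "decision": "rejected"}
```

The user never needs to know the global load, and the server never needs the user's battery level: each side contributes only the state it can actually observe.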

3. The "Smart Coach" (The AI Algorithm)

To teach these robots and servers how to make good decisions, the authors use a special AI coach called UCMS_MADDPG.

  • The Reward System: The coach gives points for finishing tasks fast and saving battery. It gives penalties if a robot runs out of battery or if a task takes too long.
  • The "Error-Sampling" Trick: Usually, this kind of AI learns by replaying its biggest mistakes until it gets them right. But focusing only on mistakes can backfire: the AI fixates on the same errors and stops exploring.
    • This new coach uses a Reward-Error Trade-off. It says, "Let's learn from the mistakes that hurt us the most, but also from the times we got lucky." By balancing learning from big errors with learning from good results, it helps the AI escape "local traps" (strategies that are good but not the best) and find the best strategy faster.
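One way to read the trade-off is as a replay-buffer priority that blends each experience's TD error (the surprising mistakes) with its reward (the lucky successes). The blend below, with a single weight `alpha`, is a simplified interpretation of the idea, not the paper's exact sampling scheme:

```python
import random

def sample_batch(buffer, batch_size, alpha=0.5):
    """Prioritized sampling with a reward-error trade-off: priority mixes
    |TD error| (learn from painful mistakes) with positive reward
    (also revisit what went well). alpha weights the two terms."""
    priorities = [alpha * abs(t["td_error"]) + (1 - alpha) * max(t["reward"], 0.0)
                  for t in buffer]
    total = sum(priorities) or 1.0          # avoid division by zero
    probs = [p / total for p in priorities]
    return random.choices(buffer, weights=probs, k=batch_size)
```

With `alpha=1.0` this collapses to classic error-only prioritized replay; lowering `alpha` mixes in the high-reward episodes that pure error sampling would ignore.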

The Results: A Smoother City

When the authors tested this system in a simulation:

  • Faster Learning: The new system figured out the best strategies much quicker than the old ones.
  • Less Traffic: Fewer tasks were dropped because the servers weren't overloaded.
  • Better Battery Life: The robots conserved more energy.
  • Scalability: Even when they added more robots and servers (making the city bigger), the system kept working well.

In a Nutshell

This paper introduces a collaborative, two-step decision-making system for smart devices. Instead of forcing a device to guess what the server can handle, the device makes a "best guess," and the server gives the final "yes or no" based on the big picture. By using a smart AI coach that learns from both mistakes and successes, the whole system becomes faster, more efficient, and less likely to crash under pressure.

It's the difference between a chaotic crowd rushing a single door and a well-organized line where everyone knows exactly where to go.