Joint Optimization of Model Partitioning and Resource Allocation for Anti-Jamming Collaborative Inference Systems

Imagine you have a very smart, but very tired, robot assistant (the Device) that needs to solve a complex puzzle, like identifying a car in a blurry photo. The robot is great at seeing the picture, but its brain is too small to finish the whole puzzle.

So, the robot calls for help from a super-computer in the cloud (the Edge Server). They decide to split the work: the robot does the first few steps, then sends a "progress report" (the Intermediate Data) over the air to the super-computer, which finishes the rest.

The Problem: The Mean Neighbor with a Megaphone
Now, imagine a mischievous neighbor (the Jammer) standing between the robot and the super-computer. This neighbor is shouting loudly (sending Jamming signals) to drown out the robot's progress report.

If the report gets garbled by the shouting, the super-computer gets confused and makes mistakes.
If the robot tries to shout louder to be heard, it drains its battery quickly.
If the robot tries to do more work itself to avoid sending a report, it gets exhausted and slow.

The goal of this paper is to figure out the perfect balance: How much work should the robot do? How loud should it shout? And how much help should the super-computer give? All while trying to finish the puzzle quickly, accurately, and without running out of battery, even while that mean neighbor is shouting.

The Solution: A Three-Part Dance

The authors propose a smart strategy called "Joint Optimization." Think of it as a dance where three partners adjust their moves simultaneously to avoid the shouting neighbor.

1. The "Split Point" (Where to cut the work)
The robot and the super-computer have to decide exactly where to split the puzzle.

Analogy: Imagine a relay race. Do you run the first 100 meters and hand off the baton? Or do you run 200 meters?
If you hand off too early, the baton (the data) is huge and hard to send through the shouting.
If you run too far, you get tired.
The paper uses a special mathematical "recipe" (Data Regression) to predict exactly how much the shouting will mess up the message based on where they split the work.

2. The "Volume Control" (Transmit Power)
The robot needs to decide how loud to shout.

Analogy: If the neighbor is shouting softly, you can whisper. If the neighbor is screaming, you have to yell. But if you yell too hard, you lose your voice (battery).
The system calculates the perfect volume to be heard clearly without wasting energy.

3. The "Helper's Speed" (Resource Allocation)
The super-computer needs to decide how fast to work on the second half of the puzzle.

Analogy: If the super-computer has many other people to help, it can't work too fast on just one robot's puzzle. It has to share its speed fairly.

How They Solve It: The "Quantum Genetic Algorithm"

Solving this puzzle is incredibly hard because there are millions of combinations of "where to split," "how loud to shout," and "how fast to work." It's like trying to find the perfect combination on a lock with a billion dials.

To solve this, the authors use a clever computer trick called an Alternating Optimization algorithm.

Step 1: They fix the volume and the split point, then figure out the best speed for the super-computer.
Step 2: They fix the speed and the split point, then figure out the best volume for the robot.
Step 3: They fix the volume and speed, then use a "Quantum Genetic Algorithm" to find the best split point.

What is a Quantum Genetic Algorithm?
Think of this as a super-smart evolution simulator.

Imagine a population of 100 different robots, each trying a different way to split the puzzle.
The "fittest" robots (those that finish fast and accurately) get to "reproduce" and mix their strategies.
The "Quantum" part is like giving these robots a superpower: they can try many different strategies at the same time (like being in two places at once) before picking the best one. This helps them find the perfect solution much faster than a normal computer could.

The Results: Why It Matters

The authors tested their idea in a simulation (a video game world) with 10 robots and a mean shouting neighbor.

The Winner: Their new strategy beat all the old ways of doing things.
The Trade-off: They created a score called RDA (Revenue of Delay and Accuracy). It's like a grade that combines "How fast did you finish?" and "How correct was the answer?"
The Result: Even when the neighbor was shouting very loudly, their system kept the robots working efficiently. The robots didn't run out of battery, and the super-computer didn't get confused.

In a Nutshell

This paper teaches us how to keep a team of robots and super-computers working together efficiently, even when someone is trying to jam their communication. By mathematically figuring out exactly where to split the work, how loud to talk, and how fast to compute, we can build smarter, more resilient systems for the future of 6G networks. It's about teamwork, smart planning, and not letting the noise win.

1. Problem Statement

The paper addresses the critical challenge of deploying Deep Neural Network (DNN) inference on resource-constrained wireless devices in the presence of malicious jamming.

Context: While Device-Edge Collaborative Inference (CI) improves efficiency by splitting DNN models between devices and Edge Servers (ES), the transmission of Intermediate Feature Data (IFD) is vulnerable to jamming.
The Conflict: Jamming degrades the Signal-to-Interference-plus-Noise Ratio (SINR), corrupting IFD and significantly reducing inference accuracy. Furthermore, the decision of where to partition the DNN model determines the size of the data to be transmitted, which in turn affects the system's susceptibility to jamming and transmission delay.
Objective: The goal is to maximize a Revenue of Delay and Accuracy (RDA) metric. This involves jointly optimizing:
1. Computation Resource Allocation: How much processing power the Edge Server assigns to each device.
2. Transmit Power: The power level at which devices send IFD to the ES.
3. DNN Partitioning: The specific layer in the DNN where the model is split between the device and the server.
Constraints: The optimization must satisfy minimum inference accuracy thresholds, device energy limits, and the total computing capacity of the Edge Server.

2. Methodology

The authors propose a comprehensive framework combining data-driven modeling with an iterative optimization algorithm.

A. System Modeling

DNN Partitioning: The DNN is split at a specific layer $k_n$ . Layers $0$ to $k_n$ run on the device; layers $k_n+1$ to $K$ run on the ES.
Inference Accuracy Model: Instead of assuming a theoretical relationship, the authors use data regression to model accuracy. They derived a closed-form logistic function relating inference accuracy to SINR and the partitioning point ( $k_n$ ). This captures the non-linear impact of jamming on different model depths.
Performance Metrics:
- Delay Revenue: Based on the total time (local computation + transmission + edge computation) relative to a maximum delay threshold.
- Accuracy Revenue: Based on the normalized difference between achieved accuracy and the minimum required threshold.

B. Optimization Formulation

The problem is formulated as a Mixed-Integer Nonlinear Programming (MINLP) problem (denoted as $P_0$ ). It is non-convex due to the coupling of discrete variables (partitioning points) and continuous variables (power, resources), as well as the non-linear accuracy function.

C. Proposed Solution: Alternating Optimization (AO)

To solve the complex MINLP, the authors propose an efficient iterative algorithm that decomposes the problem into three sub-problems, solved sequentially until convergence:

Computation Resource Allocation ( $P_1$ ):
- With power and partitioning fixed, this sub-problem is convex.
- Solution: Solved using Karush-Kuhn-Tucker (KKT) conditions to derive a closed-form optimal solution for resource allocation.
Transmit Power Optimization ( $P_2$ ):
- With resources and partitioning fixed, the problem is non-convex.
- Solution: The authors reformulate the problem by proving the objective increases monotonically with power. They transform the constraints (accuracy and energy) into convex forms, allowing the use of standard convex optimization solvers (e.g., CVX).
DNN Partitioning ( $P_3$ ):
- With resources and power fixed, this is an integer programming problem (discrete layer selection).
- Solution: A Quantum Genetic Algorithm (QGA) is employed. QGA integrates quantum computing principles (qubits, superposition) with genetic algorithms to avoid premature convergence and enhance global search capabilities for the optimal partitioning point.

3. Key Contributions

Novel Anti-Jamming Framework: This is the first work to jointly explore the impacts of DNN partitioning and transmission environments under malicious jamming within a device-edge CI system.
Data-Driven Accuracy Modeling: The paper introduces a regression-based accuracy model that links SINR and partitioning points via a logistic function, providing a more realistic assessment of jamming effects than theoretical bounds.
Joint Optimization Metric: The introduction of the RDA (Revenue of Delay and Accuracy) metric allows for a balanced trade-off between speed and reliability under adversarial conditions.
Efficient Algorithm: The development of a hybrid AO algorithm combining KKT conditions, convex optimization, and QGA to solve a highly complex MINLP problem.

4. Simulation Results

The authors evaluated the proposed scheme using a ResNet-18 model on the CIFAR-10 dataset with 10 devices and a malicious jammer.

Comparison Baselines: The proposed scheme was compared against:
- Local Computing (LC)
- Edge Server Computing (ESC)
- Fixed Transmit Power (FTP)
- GA-based partitioning (standard Genetic Algorithm)
Key Findings:
- RDA Performance: The proposed scheme consistently achieved the highest RDA across varying device computing capabilities and jamming power levels.
- Robustness: As jamming power increased, the proposed scheme maintained high inference accuracy (above the 80% threshold) and managed delay more effectively than baselines.
- Trade-off: While the LC scheme had low delay and the ESC scheme had high accuracy in ideal conditions, the proposed scheme offered the best balance under jamming, preventing the accuracy collapse seen in other schemes when the channel was degraded.

5. Significance

This research is significant for the evolution of 6G networks and Edge Intelligence.

Security: It addresses a critical vulnerability (jamming) often overlooked in collaborative inference literature, ensuring that AI services remain reliable in hostile environments.
Efficiency: By dynamically adjusting the model split based on channel conditions, it maximizes resource utilization without sacrificing accuracy.
Practicality: The use of data regression for accuracy modeling bridges the gap between theoretical communication models and real-world DNN performance, making the proposed solutions more applicable to real-world deployments.

In conclusion, the paper provides a robust, mathematically grounded, and practically validated solution for maintaining high-performance AI inference in wireless networks under active cyber-physical threats.

Joint Optimization of Model Partitioning and Resource Allocation for Anti-Jamming Collaborative Inference Systems

The Solution: A Three-Part Dance

How They Solve It: The "Quantum Genetic Algorithm"

The Results: Why It Matters

In a Nutshell

1. Problem Statement

2. Methodology

A. System Modeling

B. Optimization Formulation

C. Proposed Solution: Alternating Optimization (AO)

3. Key Contributions

4. Simulation Results

5. Significance

More like this

Managing Diabetic Retinopathy with Deep Learning: A Data Centric Overview

Truthful Production Uncertainty in Electricity Markets: A Two-Stage Mechanism

Cooperative Detour Planning for Dual-Task Drone Fleets

RIS-Assisted Joint Resource Allocation for 6G FR3 IoT Networks

A Self-Calibrating SDR for High Fidelity Beam- and Null-forming Arrays