SlimEdge: Performance and Device Aware Distributed DNN Deployment on Resource-Constrained Edge Hardware

Imagine you are the manager of a massive construction project. Your goal is to build a perfect 3D model of a city using photos taken from 12 different angles (like 12 different cameras). In a perfect world, you would have 12 identical, super-powerful robots, each taking a photo, analyzing it, and sending the data to a central command center to build the final picture.

But here's the catch: You don't have 12 identical robots.

Some of your robots are old, rusty, and have tiny memory cards (low-end edge devices). Others are fast but have very little battery life. Some might even break down in the middle of the job. If you give every robot the exact same heavy, complex instructions, the old ones will crash, the slow ones will hold up the whole team, and if one breaks, the whole project might fail.

This is the problem SlimEdge solves.

The Problem: The "One-Size-Fits-All" Mistake

Most AI systems today try to treat every device the same. They say, "Here is the same heavy backpack for everyone to carry."

The Fast Robot: Carries the heavy backpack easily, then wastes time waiting for the rest of the team.
The Slow Robot: Struggles, drops the backpack, and slows down the whole team.
The Broken Robot: If one breaks, the team stops because they were all relying on that one specific piece of information.

The Solution: SlimEdge (The Smart Foreman)

SlimEdge is like a brilliant, adaptive foreman who looks at the team and says, "Let's not give everyone the same backpack. Let's give each robot a backpack that fits their strength and memory."

Here is how it works, broken down into simple steps:

1. Knowing What Matters (The "View Importance")

Not all camera angles are equally important.

Analogy: Imagine trying to identify a car. A photo of the front of the car (headlights, grill) is super important. A photo of the top of the car might be less critical for telling if it's a truck or a sedan.
SlimEdge's Move: It analyzes the data and realizes, "Hey, this view carries about 10.5% of the total importance, while that one carries only 7.2%." So it protects the important views and cuts corners more aggressively on the less important ones.

2. Knowing Who Can Carry What (Device Awareness)

SlimEdge checks the "muscle" of each robot.

Analogy: If Robot A is a strong, fast truck, it can carry a slightly heavier load. If Robot B is a tiny, weak scooter, it needs a very light load.
SlimEdge's Move: It calculates exactly how much data each device can handle. It doesn't just guess; it does the math to ensure no device gets overwhelmed.

3. The "Smart Cut" (Adaptive Pruning)

This is the magic trick. "Pruning" in AI means cutting out unnecessary parts of the brain (neurons) to make it smaller and faster.

Uniform Pruning (The Old Way): "Cut 20% of the brain from every robot." This might ruin the important views on the smart robots and still be too heavy for the weak ones.
SlimEdge (The New Way):
- For the Important Views: "Don't cut much here! We need this data."
- For the Weak Devices: "Cut a lot here! We need to save space."
- For the Slow Devices: "Cut a lot here! We need to speed you up so you don't hold up the team."

4. The "Break-Proof" Plan (Failure Resilience)

What if a robot breaks in the middle of the job?

The Old Way: The team panics. The project fails because the missing data was critical.
SlimEdge's Move: It has a dynamic plan. If Robot #4 breaks, SlimEdge instantly re-calculates. It says, "Okay, Robot #4 is gone. We will take a tiny bit of extra work from Robot #1 and Robot #2, and we will trim the data even more aggressively on the remaining robots to make up for the missing piece." The system keeps running without stopping.

The Results: Why It's a Big Deal

The researchers tested this on a simulated system with 1,000 different scenarios (different mixes of fast and slow devices, different failure rates).

Speed: They made the system up to 4.7 times faster than the old methods.
Reliability: Even when half the devices failed, the system kept working and still met its minimum accuracy target (about 75%).
Efficiency: Every device got a custom-tailored version of the AI that fit perfectly in its memory, like a custom-made suit instead of a generic one.

The Bottom Line

SlimEdge is a smart way to run complex AI on a bunch of different, imperfect, and sometimes broken devices. Instead of forcing a square peg into a round hole, it reshapes the peg for every hole it encounters. It ensures that even if your network is a mix of old phones, new servers, and devices that might crash, the AI still works fast, fits in memory, and gets the job done.

It turns a chaotic, fragile network of devices into a resilient, high-speed team.

1. Problem Statement

The deployment of Deep Neural Networks (DNNs), specifically for complex tasks like 3D object recognition, on resource-constrained edge devices faces three critical challenges:

Resource Constraints: Edge devices often lack the memory and computational power required for state-of-the-art models (e.g., Multi-View Convolutional Neural Networks or MVCNNs).
Heterogeneity: Distributed edge systems consist of devices with varying memory budgets and compute capabilities. Uniform compression strategies fail to account for these differences, leading to either memory overflows on weak devices or suboptimal performance on strong ones.
Informational Asymmetry & Failure: In multi-view inference (where multiple cameras capture different angles of an object), not all views contribute equally to classification accuracy. Furthermore, existing frameworks lack robustness against partial device failures; if a node fails, system performance often degrades significantly, or the system fails outright.

Current solutions typically treat model compression and distributed inference as separate problems or apply uniform pruning, ignoring the specific importance of individual views and the dynamic availability of hardware.

2. Methodology: The SlimEdge Framework

SlimEdge is a unified framework that integrates structured model pruning with multi-objective optimization to generate device-specific models that respect accuracy, memory, and latency constraints, even under device failure.

Core Architecture

Base Model: The framework utilizes an MVCNN architecture (based on VGG11) where 12 distinct 2D views of a 3D object are processed by independent feature extractors on separate edge nodes. Features are aggregated via max-pooling at a central server for classification.
Centralized Orchestration: A central server profiles devices, calculates view importance, and runs the optimization loop to generate tailored pruning configurations for each edge node.
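The max-pooling aggregation described under Base Model can be illustrated with a minimal sketch. The function name `aggregate_views` and the use of `None` to mark a view whose device is offline are assumptions made for illustration; the source only states that per-view features are max-pooled at the central server.

```python
from typing import List, Optional


def aggregate_views(features: List[Optional[List[float]]]) -> List[float]:
    """Element-wise max over per-view feature vectors.

    A None entry stands for a view that did not arrive (hypothetical
    convention, not taken from the source).
    """
    active = [f for f in features if f is not None]
    if not active:
        raise ValueError("no active views to aggregate")
    dim = len(active[0])
    if any(len(f) != dim for f in active):
        raise ValueError("feature vectors must share one length")
    # For each feature position, keep the strongest response across views.
    return [max(f[i] for f in active) for i in range(dim)]


views = [[0.1, 0.9], [0.4, 0.2], None]
print(aggregate_views(views))  # [0.4, 0.9]
```

Because the pooled vector has the same length no matter how many views contribute, the classifier on the server can in principle keep running when some views are missing.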

Key Technical Components

View Importance Quantification:
- Instead of treating all views equally, SlimEdge quantifies the contribution of each view to the final classification accuracy.
- A supervised regression model (LightGBM) is trained on a dataset of 93,000 pruning configurations to map pruning ratios to accuracy.
- Feature importance scores are extracted and normalized ($I_v$) to determine which views are critical (e.g., View 2 has 10.5% importance, while View 1 has 7.2%).
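The normalization step above can be sketched as follows. This assumes the raw scores are non-negative numbers, one per view, taken from the trained regressor; the raw values in the example are invented for illustration and are not the paper's data.

```python
def normalize_importance(raw_scores):
    """Scale raw per-view importance scores so they sum to 1 (the I_v values)."""
    if any(s < 0 for s in raw_scores):
        raise ValueError("importance scores must be non-negative")
    total = sum(raw_scores)
    if total == 0:
        raise ValueError("at least one score must be positive")
    return [s / total for s in raw_scores]


# Hypothetical raw scores for 12 views (made up for this example).
raw = [72, 105, 80, 90, 85, 78, 88, 76, 84, 82, 79, 81]
importance = normalize_importance(raw)
most_important_view = max(range(len(importance)), key=importance.__getitem__) + 1
print(most_important_view)  # 2
```

With normalized scores, a statement such as "View 2 has 10.5% importance" reads directly as that view's share of the total.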
Device-Aware Pruning Allocation:
- Minimum Pruning: The Newton-Raphson method is used to calculate the minimum pruning ratio ($p^{min}_v$) required for each device to fit its specific memory budget ($M_v$).
- Weighted Allocation: Additional pruning is distributed inversely proportional to view importance and directly proportional to device slowness. A weighting term $W_v = (1 - I_v)(1 + D_v)$ is used, where $D_v$ is the device performance factor. This ensures that:
  - High-importance views are pruned less.
  - Slow devices are pruned more aggressively to reduce system latency bottlenecks.
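The two allocation steps above can be sketched under stated assumptions. The memory model `prunable_mb * (1 - p)**2 + fixed_mb` is a hypothetical stand-in (the source does not give the actual size function), and sharing the extra pruning in proportion to $W_v$ is one plausible reading of the weighting rule rather than the paper's exact formula. Here `slowness` plays the role of $D_v$, taken to be larger for slower devices.

```python
def min_pruning_ratio(prunable_mb, fixed_mb, budget_mb, tol=1e-10, max_iter=50):
    """Newton-Raphson root of memory(p) - budget_mb = 0 on [0, 1).

    Assumed memory model (hypothetical, not from the source):
        memory(p) = prunable_mb * (1 - p) ** 2 + fixed_mb
    """
    if budget_mb <= fixed_mb:
        raise ValueError("budget cannot hold even the fixed part of the model")
    p = 0.0
    if prunable_mb + fixed_mb <= budget_mb:
        return p  # the unpruned model already fits
    for _ in range(max_iter):
        f = prunable_mb * (1 - p) ** 2 + fixed_mb - budget_mb
        df = -2.0 * prunable_mb * (1 - p)  # derivative of f with respect to p
        step = f / df
        p -= step
        if abs(step) < tol:
            break
    return p


def allocation_weights(importance, slowness):
    """W_v = (1 - I_v) * (1 + D_v) for every view v."""
    return [(1 - i) * (1 + d) for i, d in zip(importance, slowness)]


def allocate_pruning(p_min, importance, slowness, extra_budget, cap=0.95):
    """Add a W_v-proportional share of extra_budget on top of each minimum.

    The proportional split and the cap are illustrative assumptions.
    """
    weights = allocation_weights(importance, slowness)
    total = sum(weights)
    return [min(cap, p + extra_budget * w / total) for p, w in zip(p_min, weights)]


p_min = [min_pruning_ratio(100.0, 4.0, 29.0), min_pruning_ratio(100.0, 4.0, 29.0)]
ratios = allocate_pruning(p_min, importance=[0.105, 0.072],
                          slowness=[0.0, 1.0], extra_budget=0.3)
print(ratios[1] > ratios[0])  # True
```

In the usage lines, both devices start from the same minimum ratio, and the second one, which is less important and slower, receives the larger share of the extra pruning.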
Multi-Objective Optimization (NSGA-II):
- The framework employs a biased-initialization Non-Dominated Sorting Genetic Algorithm II (NSGA-II) to search for Pareto-optimal pruning vectors.
- Objectives: Minimize latency (bottleneck subnetwork), minimize accuracy deviation, and maximize a composite reward (balancing size and speed).
- Constraints: Hard constraints on minimum accuracy ($A_{min}$) and device memory ($M_v$).
- Initialization: The population is seeded with a "Minimum Pruning Vector" and an "Importance-Aware Vector" using Beta-distribution sampling to accelerate convergence in high-dimensional spaces.
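The biased initialization described above might look roughly like the sketch below. The way the importance-aware vector is built, the `concentration` parameter, and the clamping to the per-device minimum are all assumptions for illustration; the source only says that the population is seeded with the two vectors and Beta-distribution sampling.

```python
import random


def biased_population(p_min, importance, size, concentration=20.0, seed=0):
    """Seed a population of pruning vectors for a genetic algorithm.

    Individual 0 is the minimum pruning vector, individual 1 is an
    importance-aware vector, and the rest are Beta samples centred on
    the importance-aware vector (hypothetical construction).
    """
    rng = random.Random(seed)
    max_imp = max(importance)
    # Less important views are pushed further above their memory minimum.
    aware = [lo + (1 - lo) * 0.5 * (1 - imp / max_imp)
             for lo, imp in zip(p_min, importance)]
    population = [list(p_min), aware]
    while len(population) < size:
        individual = []
        for lo, centre in zip(p_min, aware):
            mean = min(max(centre, 1e-3), 1 - 1e-3)  # keep Beta parameters positive
            alpha = mean * concentration
            beta = (1 - mean) * concentration
            sample = rng.betavariate(alpha, beta)
            individual.append(max(lo, sample))  # never go below the memory minimum
        population.append(individual)
    return population[:size]


pop = biased_population(p_min=[0.2, 0.5, 0.3],
                        importance=[0.5, 0.3, 0.2], size=6)
print(len(pop), pop[0])  # 6 [0.2, 0.5, 0.3]
```

A higher `concentration` keeps samples close to the importance-aware vector, while a lower one spreads them out. The resulting list would then be handed to an NSGA-II implementation as its initial population instead of uniform random vectors.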
Failure Resilience:
- The optimization loop dynamically handles device status. If a device is marked offline (failed), its pruning ratio is fixed to 1.0 (effectively removing it), and the pruning budget is redistributed among the remaining active nodes to maintain global accuracy.
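A minimal sketch of the failure handling described above, assuming the device status arrives as a list of booleans. Fixing offline ratios to 1.0 follows the source; renormalizing importance over the surviving views is an added assumption about how the redistribution could be weighted, and both function names are hypothetical.

```python
def mask_offline(candidate, online):
    """Fix the pruning ratio of every offline device to 1.0 (fully removed)."""
    return [p if up else 1.0 for p, up in zip(candidate, online)]


def active_importance(importance, online):
    """Renormalize view importance over the devices that are still online.

    This renormalization is an illustrative assumption, not the paper's rule.
    """
    kept = [i if up else 0.0 for i, up in zip(importance, online)]
    total = sum(kept)
    if total == 0:
        raise ValueError("all devices are offline")
    return [k / total for k in kept]


online = [True, False, True]
print(mask_offline([0.3, 0.5, 0.4], online))  # [0.3, 1.0, 0.4]
```

Applying the mask to each candidate before evaluation keeps the optimizer from spending effort on devices that cannot contribute, so only the active entries of the pruning vector are actually searched.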

3. Key Contributions

Joint Objective Formulation: Proposes a novel objective function that couples view importance (via Taylor expansion and LightGBM) with real-time hardware latency, moving beyond purely accuracy-centric or hardware-centric metrics.
Biased-Initialization NSGA-II: Introduces a sampling strategy using Beta-distribution to seed the genetic algorithm with high-quality, importance-aware candidates, significantly speeding up convergence compared to standard random initialization.
Dynamic Failure Resilience: Implements logic that automatically re-allocates pruning budgets upon device failure, ensuring the system remains functional without retraining the base model.
Systematic Simulation: Validates the framework across 1,000 simulated heterogeneous configurations, demonstrating robustness under severe failure scenarios (up to 50% device loss).

4. Experimental Results

The framework was evaluated using the ModelNet40 dataset (12,311 CAD models, 40 classes) with a VGG11 backbone.

Performance under Optimal Conditions:
- Achieved a target accuracy of 86.33% with a 2.86× inference speedup while fitting all models within device memory limits.
Performance under 33% Device Failure:
- With 4 of 12 devices offline, SlimEdge maintained a target accuracy of 82.65% and achieved a 4.26× speedup by redistributing the workload to the remaining 8 active views.
Performance under 50% Device Failure:
- Even with 6 devices offline, the system met a minimum accuracy of 75.22% with a 4.70× speedup.
Ablation Study:
- Compared against baselines, the full SlimEdge framework (incorporating view importance and device performance) achieved 80.00% accuracy and a 5.47× speedup, significantly outperforming uniform pruning (32.15% accuracy) and hardware-aware-only pruning (59.96% accuracy).

5. Significance and Conclusion

SlimEdge represents a paradigm shift from static, "one-size-fits-all" model compression to dynamic, system-level optimization. Its significance lies in:

Scalability: It enables the deployment of complex vision models on heterogeneous, low-power edge clusters that would otherwise be infeasible.
Robustness: It provides a viable pathway for mission-critical applications (e.g., traffic monitoring, surveillance) where device failures are common, ensuring the system degrades gracefully rather than failing completely.
Efficiency: By explicitly modeling the informational asymmetry of multi-view data and the heterogeneity of hardware, it eliminates the inefficiencies of uniform compression, reducing inference latency by up to 4.7× while strictly adhering to memory and accuracy constraints.

The paper concludes that treating compression as a dynamic, failure-resilient optimization problem is essential for the future of distributed edge intelligence.