FAuNO: Semi-Asynchronous Federated Reinforcement Learning Framework for Task Offloading in Edge Systems

The paper introduces FAuNO, a semi-asynchronous federated reinforcement learning framework that uses a buffered actor-critic architecture for decentralized task offloading in edge systems. In the authors' experiments, it reduces latency and task loss compared to existing baselines.

Frederico Metelo, Alexandre Oliveira, Stevo Racković, Pedro Ákos Costa, Cláudia Soares

Published 2026-03-03

Imagine a bustling city where everyone has a smartphone that needs to do heavy calculations—like editing a 4K video or running a complex AI model. If everyone sent these tasks to a single, giant "Cloud" server far away, the internet would clog up, and everything would be slow.

Edge Computing is the solution: instead of one giant server, we have many smaller, local servers (like routers, smart hubs, or even other phones) scattered around the city. But here's the problem: Who does what? If a phone sends a task to a local server that is already overloaded, the task fails. If it sends it to a server that is too far away, it takes too long.

This is where FAuNO comes in. Think of FAuNO as a super-smart, decentralized traffic control system for these local servers.

The Core Problem: The "Straggler" and the "Selfish" Neighbor

In a perfect world, all the local servers would talk to each other constantly, share their status, and make a perfect plan together. But in reality:

  1. Some servers are slow (they have old batteries or bad connections). If the system waits for them to catch up, the whole network freezes.
  2. Some servers are selfish. They might try to dump all their heavy work onto their neighbors to save their own energy, causing the neighbors to crash.
  3. Privacy matters. Servers don't want to share their raw data (like user videos) with everyone; they just want to share what they learned about how to handle tasks.

The FAuNO Solution: The "Buffered Asynchronous" Orchestra

The authors created FAuNO (Federated Asynchronous Network Orchestrator) to solve this. Here is how it works using a simple analogy:

1. The "Local Musicians" (The Actors)

Imagine every local server is a musician in an orchestra. Each musician plays their own instrument (processes their own tasks) and knows their own immediate surroundings (their own queue of tasks).

  • The Actor: This is the musician's "instinct." It decides, "Should I play this note myself, or pass the sheet music to the person next to me?"
  • The Catch: Each musician only sees the people sitting right next to them. They don't know what's happening in the back of the hall.
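The "instinct" above can be sketched as a tiny stochastic policy: each agent sees only its own queue and its immediate neighbors' queues, and samples an action (process locally, or offload to a neighbor). This is an illustrative sketch, not the paper's actual network; in FAuNO the actor is a learned neural policy, and the function and variable names here (`choose_action`, `own_queue_len`) are invented for this example.

```python
import math
import random

def softmax(scores):
    """Turn raw preference scores into action probabilities."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def choose_action(own_queue_len, neighbor_queue_lens):
    """Actor's 'instinct': shorter queues get higher preference.
    Action 0 = process the task locally; action i > 0 = offload
    to neighbor i-1. Sampling (rather than argmax) lets the
    policy keep exploring while it learns."""
    scores = [-own_queue_len] + [-q for q in neighbor_queue_lens]
    probs = softmax(scores)
    r, cum = random.random(), 0.0
    for action, p in enumerate(probs):
        cum += p
        if r < cum:
            return action
    return len(probs) - 1

random.seed(0)
# Own queue has 5 tasks; neighbor 0 has 2, neighbor 1 has 9.
action = choose_action(own_queue_len=5, neighbor_queue_lens=[2, 9])
```

Note the key limitation the analogy describes: the scores only cover the agent's one-hop neighborhood, so a purely local policy can still make globally bad choices.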

2. The "Conductor" (The Critic)

In a normal orchestra, the conductor tells everyone what to do. In FAuNO, the "Conductor" is a Federated Critic.

  • Instead of one central conductor, the musicians share their sheet music notes (their learning updates) with a central hub.
  • The hub combines these notes to create a Global Score (a better understanding of the whole room).
  • Crucial Twist: The musicians keep playing while the Conductor is busy updating the score. They don't stop to wait.
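The hub's "Global Score" step can be sketched as a FedAvg-style element-wise average of the critic updates each agent sends. This is a simplified stand-in for the paper's aggregation rule, which may weight or combine updates differently; `federated_average` is a name invented for this example.

```python
def federated_average(critic_updates):
    """Hub-side step: combine critic parameter vectors from
    several agents into one 'Global Score' by element-wise
    averaging. Agents share learned parameters only, never
    their raw task data."""
    n = len(critic_updates)
    dim = len(critic_updates[0])
    return [sum(u[i] for u in critic_updates) / n for i in range(dim)]

# Three agents each send a two-parameter critic update.
global_score = federated_average([[1.0, 2.0], [3.0, 4.0], [2.0, 0.0]])
```

Because only these parameter vectors travel to the hub, each agent's local observations (the "raw data" from point 3 earlier) stay on the device.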

3. The "Buffer" (The Waiting Room)

This is the secret sauce. In many systems, if one musician is slow to send their notes, the whole orchestra waits. FAuNO uses a Buffered Asynchronous approach.

  • The Fast Musicians: If a musician is quick, they send their notes immediately. The Conductor puts them in a "Waiting Room" (Buffer).
  • The Slow Musicians: If a musician is slow, they don't block the others. The Conductor just uses the notes from the fast musicians to update the Global Score right now.
  • The Update: Once the Conductor has enough notes (a "quorum"), they update the Global Score and send it back to everyone. The slow musicians eventually catch up and get the new score, but the fast ones never had to wait.
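The buffer-plus-quorum mechanism above can be sketched as a small aggregator: updates go into a buffer as they arrive, and as soon as a quorum is met the hub averages whatever is buffered and bumps the model version, never blocking on the agents that have not reported yet. The class name, quorum rule, and averaging step are illustrative assumptions, not the paper's exact protocol.

```python
class BufferedAggregator:
    """Conductor-side 'waiting room': collect updates as they
    arrive and aggregate once a quorum is reached, without
    waiting for stragglers."""

    def __init__(self, quorum):
        self.quorum = quorum
        self.buffer = []        # updates waiting to be aggregated
        self.global_score = None
        self.version = 0        # how many times the score was updated

    def receive(self, update):
        """Called whenever any agent (fast or slow) sends an update.
        Returns the new global score if this update completed a
        quorum, else None (the agent just keeps working)."""
        self.buffer.append(update)
        if len(self.buffer) >= self.quorum:
            return self._aggregate()
        return None

    def _aggregate(self):
        n = len(self.buffer)
        dim = len(self.buffer[0])
        self.global_score = [sum(u[i] for u in self.buffer) / n
                             for i in range(dim)]
        self.buffer.clear()     # waiting room empties for the next round
        self.version += 1
        return self.global_score

agg = BufferedAggregator(quorum=2)
first = agg.receive([1.0])       # one fast agent: buffered, no update yet
new_score = agg.receive([3.0])   # quorum reached: aggregate immediately
```

A straggler that reports after the round simply lands in the next buffer and receives the newest version of the score, which is exactly the "catch up later" behavior the analogy describes.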

Why is this better than the old ways?

  • Vs. The "Wait for Everyone" System (Synchronous): Imagine a group project where the teacher says, "We can't grade the project until everyone submits their part." If one student is slow, the whole class waits. FAuNO says, "Grade what we have now, and the slow student can catch up later." This keeps the system moving fast.
  • Vs. The "Selfish" System (Pure Local Learning): If every server only looks out for itself, they might dump all their work on a neighbor, causing a traffic jam. FAuNO's "Global Score" teaches them to cooperate. It's like the Conductor whispering, "Hey, the person next to you is full; try the person on the other side."

The Results: A Smoother Ride

The researchers tested FAuNO in a simulated city with thousands of tasks.

  • Fewer Dropped Tasks: Because the system is so good at balancing the load, fewer tasks get "dropped" (failed) because a server was too busy.
  • Lower Latency: Tasks finish faster because the system doesn't waste time waiting for slow servers to catch up.
  • Adaptability: It works even when the network is messy, with some servers having bad connections and others being super fast.

In a Nutshell

FAuNO is like a smart, patient, and cooperative team leader for a group of edge servers. It lets the fast workers keep working without waiting for the slow ones, while still making sure everyone is working together toward the same goal. It ensures that your video edits, AI requests, and data processing happen quickly and reliably, even when the network is chaotic.
