COHORT: Hybrid RL for Collaborative Large DNN Inference on Multi-Robot Systems Under Real-Time Constraints

Imagine a team of rescue robots sent into a disaster zone, such as a collapsed building or a wildfire area. Their job is to look around, find survivors, identify debris, and answer questions like, "How many people do you see?" or "Where is the safest path?"

To do this, they need to run incredibly smart but very heavy computer programs called Large DNNs (Deep Neural Networks). Think of these programs as giant, hungry brains that need a lot of electricity and processing power to think.

The Problem: The "Battery vs. Brain" Dilemma

Here's the catch: these robots are small, their batteries are limited, and they often can't reach cloud servers because the disaster zone has no working network connection.

If a robot tries to run these "giant brains" entirely on its own, it drains its battery in minutes and might stop working before the mission is done.
If they try to send the data to a central server, there is no server to send it to.
If they just guess who should do the work, they might send a heavy task to a robot that is already weak, causing the whole team to fail.

The Solution: COHORT (The Smart Team Captain)

The researchers created a system called COHORT. Think of COHORT as a super-smart, invisible team captain that lives inside every robot. Its only job is to decide: "Should I do this task myself, or should I ask my teammate to do it?"

But COHORT isn't just a simple rulebook. It's a learning machine that gets smarter every time it works.

How COHORT Learns: The "Practice" and "Game Day" Strategy

The paper describes a clever two-step training process, which we can compare to training a sports team:

1. The "Practice" Phase (Offline Learning)
Before the robots ever go into the real disaster zone, they run thousands of simulations in a computer.

The Analogy: Imagine a coach watching hours of game tape. The coach uses a simple rule (like an auction) to see who should have done what. "Robot A has a full battery, so it should take the heavy lifting."
The Magic: The system records all these decisions. Then, it uses a technique called Advantage-Weighted Regression (AWR). This is like the coach saying, "Look at which decisions worked out better than expected. Let's copy those often, and copy the poor ones only rarely." Every recorded decision still counts, but the good ones count much more. This gives the robots a solid starting strategy without them having to make dangerous mistakes in the real world first.
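To make the "copy the good decisions more often" idea concrete, here is a minimal sketch of the weighting step in AWR. It is a toy under stated assumptions, not the paper's implementation: the state names, the action names, the returns, the baselines, and the helper names awr_weights and fit_tabular_policy are all made up for illustration, and a real system would fit a neural network policy instead of a lookup table.

```python
import math
from collections import defaultdict

def awr_weights(returns, baselines, beta=1.0, max_weight=20.0):
    """Weight each logged decision by exp(advantage / beta).

    advantage = return - baseline, i.e. how much better the decision
    turned out than expected. Weights are capped for numerical safety.
    """
    return [min(math.exp((r - b) / beta), max_weight)
            for r, b in zip(returns, baselines)]

def fit_tabular_policy(states, actions, weights):
    """Weighted imitation: action probabilities are weighted frequencies."""
    totals = defaultdict(lambda: defaultdict(float))
    for s, a, w in zip(states, actions, weights):
        totals[s][a] += w
    policy = {}
    for s, by_action in totals.items():
        z = sum(by_action.values())
        policy[s] = {a: w / z for a, w in by_action.items()}
    return policy

# Hypothetical log of four auction decisions from simulation.
states = ["low_battery", "low_battery", "full_battery", "full_battery"]
actions = ["offload", "run_local", "run_local", "offload"]
returns = [1.0, -1.0, 1.0, 0.0]      # made-up outcomes
baselines = [0.0, 0.0, 0.5, 0.5]     # made-up "expected" outcomes

weights = awr_weights(returns, baselines)
policy = fit_tabular_policy(states, actions, weights)
print(policy)
```

In this toy log, a robot with a low battery ends up preferring to offload, because that choice had the higher advantage. The poor decision is not thrown away; it simply gets a small weight.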

2. The "Game Day" Phase (Online Learning)
Now, the robots are in the real world. They use the strategy they learned in practice, but they don't stop there.

The Analogy: This is like a soccer team playing a real match. They start with the coach's playbook, but as the game goes on, they adapt. If the wind changes, or a player gets tired, they adjust their formation instantly.
The Magic: The robots use Multi-Agent Proximal Policy Optimization (MAPPO). This allows them to talk to each other (very briefly) and say, "Hey, my battery is low, you take this next task," or "I'm fast right now, give me that task." They learn in real time, getting better at balancing the work as the mission progresses.
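For readers curious about what "adapting without overreacting" looks like in code, here is a small sketch of the clipped objective that PPO-style methods, including MAPPO, optimize for each agent. It covers a single decision and assumes a clip range of 0.2, a common default; the summary does not state the paper's settings. The function name and arguments are illustrative.

```python
import math

def ppo_clipped_objective(new_logp, old_logp, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one (state, action) sample.

    new_logp / old_logp: log-probability of the action under the
    updated policy and under the policy that collected the data.
    advantage: how much better the action was than expected.
    """
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to move the policy
    # far away from the one that gathered the experience.
    return min(ratio * advantage, clipped_ratio * advantage)

# A good action whose probability already rose by 50%: the gain is capped.
print(ppo_clipped_objective(math.log(1.5), 0.0, advantage=1.0))
# A bad action whose probability already fell by half: also capped.
print(ppo_clipped_objective(math.log(0.5), 0.0, advantage=-1.0))
```

The clipping is what lets the team keep learning on "game day" in small, safe steps instead of throwing away the playbook after one surprising play.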

The "Auction" System

How do they decide who does what? They use a digital auction.

Every time a task comes up (like "Scan this area"), every robot secretly bids on how much it "costs" them to do it.
The "cost" isn't money; it's battery life, current workload, and speed.
The robot with the lowest "cost" (the one with the most energy and free time) wins the bid and does the task.
COHORT's Superpower: Unlike old systems that just look at the current moment, COHORT learns to predict the future. It might say, "I have energy now, but if I take this task, I'll be too tired for the next one. I'll let Robot B take it."
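The basic lowest-cost auction described above can be sketched in a few lines. This is only an illustration of the idea: the cost formula, the weights, the RobotState fields, and the example numbers are assumptions made up for this sketch, not values from the paper, and the learned "look ahead" behavior is not modeled here.

```python
from dataclasses import dataclass

@dataclass
class RobotState:
    name: str
    battery: float    # remaining battery as a fraction, 0.0 to 1.0
    queue_len: int    # tasks already waiting on this robot
    latency_s: float  # estimated seconds to run the task locally

def bid(state, w_battery=1.0, w_queue=0.5, w_latency=0.2):
    """Hypothetical cost of taking the task; lower is better."""
    return (w_battery * (1.0 - state.battery)
            + w_queue * state.queue_len
            + w_latency * state.latency_s)

def run_auction(robots):
    """The robot with the lowest bid wins the task."""
    return min(robots, key=bid)

# Made-up snapshot of the three-robot team.
team = [
    RobotState("husky", battery=0.9, queue_len=2, latency_s=0.5),
    RobotState("jackal", battery=0.6, queue_len=0, latency_s=1.0),
    RobotState("spot", battery=0.3, queue_len=0, latency_s=2.0),
]

winner = run_auction(team)
print(winner.name)
```

With these made-up numbers the busy robot loses despite having the fullest battery, because its queue makes the task expensive for it. A learned policy like COHORT's would go further and adjust such decisions based on what it expects to happen next.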

The Results: Why It Matters

The researchers tested this on three very different robots:

Husky: A big, strong, wheeled robot with a powerful computer.
Jackal: A medium-sized, agile robot.
Spot: A dog-like robot that can climb stairs but has a smaller computer.

The Outcome:

Battery Life: The team saved about 15% more battery than other methods. In a rescue mission, that extra 15% could mean the difference between finding a survivor or not.
GPU Use: They got 51% better at using their computer chips (GPUs), meaning the chips spent less time sitting idle and wasting energy.
Reliability: They met their speed and timing goals 2.5 times more often than the other methods.

The Bottom Line

COHORT is like giving a team of rescue robots a shared brain that knows exactly how to share the workload. It ensures that the strong robots help the weak ones, the fast robots help the slow ones, and nobody runs out of juice before the job is done. It turns a group of individual robots into a truly collaborative, resilient team that can handle the chaos of a real-world disaster.
