UrbanHuRo: A Two-Layer Human-Robot Collaboration Framework for the Joint Optimization of Heterogeneous Urban Services

Imagine a bustling city as a giant, chaotic dance floor. On one side, you have human couriers (like food delivery drivers) rushing around to drop off pizzas and burgers. On the other side, you have sensing robots (autonomous vehicles) driving around to collect data about traffic, air quality, and road conditions.

For a long time, these two groups danced to their own separate tunes. The humans focused only on getting food to people fast, while the robots focused only on mapping the city. They ignored each other, even though they were often driving down the same streets at the same time.

The Problem:
The city was inefficient.

The robots were driving empty-handed, missing chances to help deliver food.
The humans were driving full of food, missing chances to help the robots gather data.
It was like having a team of chefs who never wash dishes, and a team of dishwashers who never cook. Everyone is working hard, but the restaurant isn't running smoothly.

The Solution: UrbanHuRo
The authors of this paper created a new "dance instructor" called UrbanHuRo. Think of it as a smart, two-layer brain that coordinates the humans and robots so they can help each other without getting in the way.

Here is how it works, using simple analogies:

Layer 1: The "Smart Dispatcher" (The Matchmaker)

Imagine a busy restaurant kitchen. The manager (the Dispatcher) has to decide who takes which order.

Old Way: The manager just looks at who is closest to the customer.
UrbanHuRo Way: The manager looks at the whole picture. "Hey, Driver A is going to the park anyway. Let's give them a pizza order and tell them to take a quick air-quality reading on the way. Meanwhile, Robot B is free; let's send it to help with a rush order so the human driver doesn't get overwhelmed."

To do this mathematically without getting a computer headache, they used a MapReduce system. Think of this like a massive group project in school. Instead of one teacher trying to grade 1,000 papers alone, they split the papers among 50 students (computers), who grade their own piles quickly, and then the teacher combines the results. This allows the system to make thousands of decisions in the blink of an eye, even when the city is crazy busy.

Layer 2: The "Robot Navigator" (The GPS with a Brain)

Once the orders are assigned, the robots need to know where to go next.

The Human Couriers: They are free agents. They know the city best and will naturally take the fastest, most profitable routes. The system trusts them to do their thing.
The Robots: They need instructions. The system uses a deep reinforcement learning algorithm (like a video game AI that learns by playing thousands of times) to tell the robots where to drive.
- If a robot is carrying food, it prioritizes getting the food there on time.
- If a robot is empty, it drives to areas the city hasn't "seen" yet to gather fresh data.
- Crucially, the robot learns to balance these two goals. It knows, "If I stop to take a photo of a pothole, I might be late with the pizza, so I'll skip the photo and deliver first."

The Magic Trick: "Hybrid Rewards"

The hardest part of this dance is that the two goals (delivering food vs. gathering data) often conflict.

The Solution: The system uses a "hybrid score." It doesn't just ask, "Did we get the food there?" It also asks, "Did we get extra value from the trip?"
If a human driver delivers a pizza and happens to pass a smoggy area, the system gives them credit for both the pizza and the data.
If a robot helps deliver a pizza, it gets credit for the delivery and the fact that it cleared the way for more data collection later.

The Results: A Win-Win Dance

When they tested this system using real data from Shanghai (with over 160,000 food orders), the results were amazing:

Fewer Late Pizzas: The number of overdue orders dropped significantly. The robots helped the humans when things got busy, acting like a safety net.
Better City Data: The system covered 29.7% more ground for sensing than previous methods. The robots and humans covered more territory together than they ever could apart.
Happier Couriers: The human drivers earned 39.2% more money on average. Because fewer orders were late, they paid fewer overdue penalties, and orders were distributed among them more effectively.

The Bottom Line

UrbanHuRo is like a conductor for a city orchestra. Instead of the drums (delivery) and the violins (sensing) playing out of sync, it gets them to play a harmonious duet. The humans get paid more and face fewer late-order penalties, the robots get more done, and the city gets better air-quality and traffic data, all because everyone started helping each other out.

Here is a detailed technical summary of the paper "UrbanHuRo: A Two-Layer Human-Robot Collaboration Framework for the Joint Optimization of Heterogeneous Urban Services."

1. Problem Statement

The paper addresses the inefficiency in current smart city systems where heterogeneous urban services (e.g., food delivery and urban environmental sensing) are optimized in isolation.

The Gap: Existing research treats services independently, missing opportunities for resource sharing. For instance, human couriers could collect sensing data while delivering, and autonomous robot vehicles (RVs) could assist with deliveries during peak hours.
The Challenge: Joint optimization is difficult due to:
1. Conflicting Objectives: Maximizing delivery speed/income often conflicts with maximizing sensing coverage.
2. Asynchronous Rewards: The benefit of a delivery decision on sensing coverage cannot be known immediately; it depends on future routing actions.
3. Dynamic Coordination: Real-time coordination of large fleets of humans (with diverse preferences) and robots (strictly following instructions) in dynamic urban environments is computationally demanding.

2. Methodology: The UrbanHuRo Framework

The authors propose UrbanHuRo, a two-layer human-robot collaboration framework modeled as a Markov Decision Process (MDP). The system integrates human couriers and Robot Vehicles (RVs) to jointly optimize order dispatch and route planning.

A. System Architecture

Agents: Human couriers ($c_i$) and RVs ($rv_i$).
State Space: Includes agent location, availability, type, order details (pickup/drop-off, deadline, fee), and sensing history.
Actions:
- Dispatch Layer: Assigns pending orders to available agents.
- Sensing Layer: Determines routing actions for RVs (8-connected grid movement). Human couriers follow their own profit-maximizing routes.
Reward Function: A hybrid reward combining delivery income (discounted by lateness) and sensing rewards (regional value + neighboring value - penalty for missed deadlines).
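The hybrid reward described above can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the linear lateness discount, the `decay`, `penalty`, and `weight` parameters, and all names (`StepOutcome`, `delivery_reward`, `sensing_reward`, `hybrid_reward`) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    fee: float             # delivery fee of the order (0 if nothing was delivered)
    minutes_late: float    # lateness relative to the deadline (<= 0 means on time)
    regional_value: float  # sensing value of the visited region
    neighbor_value: float  # sensing value credited from neighboring regions
    missed_deadline: bool  # whether a delivery deadline was missed


def delivery_reward(fee: float, minutes_late: float, decay: float = 0.02) -> float:
    """Delivery income discounted by lateness (assumed linear, floored at zero)."""
    late = max(0.0, minutes_late)
    return fee * max(0.0, 1.0 - decay * late)


def sensing_reward(regional_value: float, neighbor_value: float,
                   missed_deadline: bool, penalty: float = 5.0) -> float:
    """Regional value + neighboring value - penalty for a missed deadline."""
    return regional_value + neighbor_value - (penalty if missed_deadline else 0.0)


def hybrid_reward(o: StepOutcome, weight: float = 0.5) -> float:
    """Weighted combination of the delivery and sensing terms (weight is assumed)."""
    d = delivery_reward(o.fee, o.minutes_late)
    s = sensing_reward(o.regional_value, o.neighbor_value, o.missed_deadline)
    return weight * d + (1.0 - weight) * s


on_time = StepOutcome(fee=10.0, minutes_late=0.0, regional_value=2.0,
                      neighbor_value=1.0, missed_deadline=False)
late = StepOutcome(fee=10.0, minutes_late=10.0, regional_value=2.0,
                   neighbor_value=1.0, missed_deadline=True)
print(hybrid_reward(on_time), hybrid_reward(late))
```

With these assumed parameters, the late outcome scores lower on both terms: the fee is discounted and the sensing term absorbs the deadline penalty.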

B. Core Components

The framework consists of two coupled layers:

1. Upper Layer: KSubMR (Order Dispatch)

Goal: Assign orders to agents to maximize a weighted sum of immediate delivery rewards and estimated future sensing values.
Algorithm: A scalable, distributed MapReduce-based K-Submodular Maximization algorithm.
- Why K-Submodular? The dispatch function exhibits submodularity (diminishing returns), allowing for efficient approximation guarantees.
- Mechanism: It uses a two-round MapReduce process. Worker machines compute local "Top-N" dispatch pairs based on delivery and sensing quotients. The master machine aggregates these, reconstructs a bipartite graph, and applies a threshold-based dispatch algorithm to generate the final assignment.
- Innovation: It solves the "asynchronous feedback" problem by using estimated sensing values (provided by the lower layer) during the dispatch decision, rather than waiting for actual sensing outcomes.
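To make the map, aggregate, and threshold steps concrete, here is a simplified single-process sketch. It collapses the two MapReduce rounds into one map step and one reduce step, and it replaces the paper's delivery and sensing quotients with a plain weighted score. The names (`Pair`, `score`, `map_top_n`, `reduce_dispatch`, `dispatch`) and the `alpha`, `n`, and `threshold` values are hypothetical.

```python
import heapq
from typing import Dict, List, NamedTuple


class Pair(NamedTuple):
    order: str
    agent: str
    delivery: float  # immediate delivery reward of this order-agent pair
    sensing: float   # estimated future sensing value (from the lower layer)


def score(p: Pair, alpha: float = 0.5) -> float:
    """Weighted sum of delivery reward and estimated sensing value."""
    return alpha * p.delivery + (1.0 - alpha) * p.sensing


def map_top_n(shard: List[Pair], n: int) -> List[Pair]:
    """Worker step: keep only the N best candidate pairs of a local shard."""
    return heapq.nlargest(n, shard, key=score)


def reduce_dispatch(candidates: List[Pair], threshold: float) -> Dict[str, str]:
    """Master step: greedily accept pairs above the threshold, one per order and agent."""
    used_orders, used_agents = set(), set()
    assignment: Dict[str, str] = {}
    for p in sorted(candidates, key=score, reverse=True):
        if score(p) < threshold:
            break  # every remaining pair scores below the threshold
        if p.order in used_orders or p.agent in used_agents:
            continue
        assignment[p.order] = p.agent
        used_orders.add(p.order)
        used_agents.add(p.agent)
    return assignment


def dispatch(shards: List[List[Pair]], n: int = 2, threshold: float = 1.0) -> Dict[str, str]:
    candidates = [p for shard in shards for p in map_top_n(shard, n)]
    return reduce_dispatch(candidates, threshold)


shards = [
    [Pair("o1", "c1", 4.0, 2.0), Pair("o1", "rv1", 3.0, 1.0), Pair("o2", "c1", 2.0, 1.0)],
    [Pair("o2", "rv1", 5.0, 3.0), Pair("o3", "rv1", 0.5, 0.5)],
]
print(dispatch(shards))
```

In a real deployment each shard would live on a separate worker machine; here the shards are just Python lists, so the example shows only the data flow, not the distribution.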

2. Lower Layer: DSRQN (Route Planning)

Goal: Plan routes for RVs to maximize sensing coverage while ensuring on-time delivery.
Algorithm: Deep Submodular Reward Q-Network (DSRQN).
- Mechanism: A Deep Reinforcement Learning (DRL) agent that learns a Q-function to estimate the expected cumulative sensing reward.
- Submodular Aggregation: To handle spatial redundancy, the sensing value is aggregated using a submodular function that penalizes overlapping paths.
- Feedback Loop: DSRQN calculates the sensing value ($v_s$) for specific order-agent pairs and feeds this back to the KSubMR module to inform dispatch decisions.
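The idea of a submodular sensing reward on an 8-connected grid can be illustrated with a small sketch. It does not implement DSRQN: a one-step greedy rule stands in for the learned Q-function, and the square-root coverage function is just one common example of a monotone submodular function, assumed here for illustration. All names (`MOVES_8`, `coverage_value`, `marginal_gain`, `greedy_step`) are hypothetical.

```python
import math
from typing import Dict, Tuple

Cell = Tuple[int, int]

# The 8 neighboring moves of a grid cell (8-connected movement).
MOVES_8 = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def coverage_value(visit_counts: Dict[Cell, int]) -> float:
    """Sum of sqrt(visits) per cell: repeated visits add less and less value."""
    return sum(math.sqrt(c) for c in visit_counts.values())


def marginal_gain(visit_counts: Dict[Cell, int], cell: Cell) -> float:
    """Extra coverage value gained by visiting `cell` one more time."""
    c = visit_counts.get(cell, 0)
    return math.sqrt(c + 1) - math.sqrt(c)


def greedy_step(pos: Cell, visit_counts: Dict[Cell, int], width: int, height: int) -> Cell:
    """Move to the in-bounds neighbor with the largest marginal sensing gain."""
    candidates = [
        (pos[0] + dx, pos[1] + dy)
        for dx, dy in MOVES_8
        if 0 <= pos[0] + dx < width and 0 <= pos[1] + dy < height
    ]
    return max(candidates, key=lambda cell: marginal_gain(visit_counts, cell))


# Every neighbor of (1, 1) has been sensed once, except (2, 2).
counts = {(1 + dx, 1 + dy): 1 for dx, dy in MOVES_8 if (dx, dy) != (1, 1)}
print(greedy_step((1, 1), counts, width=3, height=3))
```

Because the gain of revisiting a cell shrinks with each visit, overlapping paths are penalized implicitly, which is the diminishing-returns behavior the submodular aggregation relies on.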

3. Key Contributions

Conceptual: First framework to jointly optimize heterogeneous services (crowdsourced delivery and urban sensing) via human-robot collaboration, leveraging idle resources across services.
Technical:
- KSubMR: A scalable, distributed algorithm for real-time order dispatch that handles the computational complexity of large-scale matching using MapReduce and submodular maximization.
- DSRQN: A novel DRL algorithm that integrates submodular reward functions to balance sensing coverage with delivery constraints and provides value estimates for the upper layer.
Experimental: Validation on a real-world dataset (160K orders from Shanghai) demonstrating significant improvements over state-of-the-art baselines.

4. Experimental Results

The system was evaluated using a real-world food delivery dataset from Shanghai (160K orders, ~2,200 couriers) with varying numbers of RVs (500–4,000).

Sensing Coverage: UrbanHuRo improved sensing coverage by an average of 29.7% compared to the best baseline (HighS) when using 1,000–4,000 RVs.
Courier Income: Human courier income increased by 39.2% on average. This is attributed to reduced overdue penalties and better order distribution.
Overdue Orders: The system significantly reduced the number of overdue orders, performing comparably to "Fastest-Delivery" baselines (which ignore sensing) while vastly outperforming baselines that prioritize sensing at the cost of delivery speed.
Ablation Study: Removing the timeout penalty ($r^{pen}$) led to a massive spike in overdue orders, confirming the necessity of balancing sensing with delivery deadlines.

5. Significance

Scalability: The MapReduce-based approach lets the system handle large fleets and high order volumes in real time, which traditional centralized optimization methods struggle to do.
Sustainability: By increasing courier income and reducing overdue orders, the framework promotes a more sustainable economic model for gig workers.
Smart City Synergy: It demonstrates a practical "win-win" paradigm where human labor and autonomous robots complement each other, turning routine delivery tasks into opportunities for high-value urban data collection without compromising service quality.
