Learning When to Cooperate Under Heterogeneous Goals

This paper addresses the challenge of agents with heterogeneous goals deciding when to cooperate and when to act alone. It introduces a hierarchical learning framework that combines imitation and reinforcement learning, demonstrates superior performance over baselines, and finds that modeling teammates is most beneficial when their goals are less observable.

Max Taylor-Davies, Neil Bramley, Christopher G. Lucas

Published Tue, 10 Ma

Imagine you are at a busy coffee shop. You have a list of things you need to do: grab a coffee, pick up a package, and maybe meet a friend.

In most computer science experiments about "teamwork," the assumption is that everyone in the coffee shop wants to do the exact same thing at the exact same time. If you want a coffee, your teammate wants a coffee. If you want to meet a friend, they want to meet that friend too. The computer just learns how to coordinate to get the coffee faster.

This paper asks a different, more human question:
What if your teammate wants a coffee, but you want to pick up a package? Or what if you both want to meet a friend, but you're going to different houses?

The authors argue that true "smart" teamwork isn't just about knowing how to work together; it's about knowing when to work together and when to go it alone.

The Core Problem: The "To Team or Not to Team" Dilemma

The researchers created a new game for AI agents (computer programs) where they have to figure out if their goals overlap with their partner's.

  • Scenario A (Full Overlap): You and your partner both want to go to the park. Best move: Walk together.
  • Scenario B (No Overlap): You want to go to the park; your partner wants to go to the gym. Best move: Split up. If you try to walk together, you'll just waste time and get nowhere.
  • Scenario C (Partial Overlap): You both want to go to the park, but you also have different errands. Best move: Walk together to the park, then split up for the errands.
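The three scenarios boil down to comparing goal sets. Here is a minimal sketch of that decision in Python; the goal names and the `choose_mode` helper are my own illustration, not code from the paper:

```python
def choose_mode(my_goals: set, partner_goals: set) -> tuple:
    """Decide which goals to pursue together vs. alone."""
    shared = my_goals & partner_goals       # goals we both hold
    solo = my_goals - partner_goals         # goals only I hold
    if shared and solo:
        return ("partial", shared, solo)    # Scenario C: team up, then split
    if shared:
        return ("together", shared, set())  # Scenario A: full overlap
    return ("solo", set(), solo)            # Scenario B: no overlap

mode, shared, solo = choose_mode({"park", "pharmacy"}, {"park", "gym"})
# mode == "partial": walk to the park together, then run the errand alone
```

The interesting part of the paper is that the agent must *learn* this decision from experience rather than being handed the partner's goal set.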

Most existing AI methods are like a dog that only knows how to fetch a ball. If you throw a ball, it fetches. If you throw a stick, it still tries to fetch the stick. It doesn't know that sometimes you just want to sit on the bench. This paper teaches the AI to look at the situation and decide: "Do we have a shared goal? If yes, let's collaborate. If no, I'll do my own thing."

The Solution: GRILL (The "Manager and the Worker")

To solve this, the authors built a system called GRILL (Goal selection by RL with Imitation for Low-Level control).

Think of GRILL as a company with two distinct roles:

  1. The Manager (High-Level Policy): This part of the AI looks at the big picture. It asks, "What is the goal right now? Should we try to work together, or should I go solo?" It doesn't worry about how to move; it just picks the destination.
  2. The Worker (Low-Level Policy): This part of the AI is the muscle. Once the Manager says, "Let's go get that apple," the Worker knows exactly how to walk, jump, and grab it.

The Magic Trick:
The "Worker" is trained using Imitation Learning. The AI watches a bunch of examples of how to do specific tasks (like grabbing an apple) and learns to copy them perfectly. This is like a new employee watching a training video.

The "Manager" is trained using Reinforcement Learning. It tries different strategies (team up vs. solo) and gets points for success. Over time, it learns the rule: "If my partner is also going for apples, team up! If they are going for oranges, go get the apples yourself."
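Put together, the two-level loop looks roughly like the sketch below. The class and method names are my own illustration; in GRILL both levels are learned neural policies, and here the manager is stubbed out with a random choice just to show the control flow:

```python
import random

class GrillSketch:
    """Toy sketch of a two-level policy: an RL-trained manager picks a goal,
    and an imitation-trained worker emits primitive actions toward it."""

    def __init__(self, goals, worker_skills, manager_interval=5):
        self.goals = goals
        self.worker_skills = worker_skills    # goal -> low-level policy fn
        self.manager_interval = manager_interval
        self.current_goal = None

    def manager_pick_goal(self, observation):
        # Placeholder for the RL-trained high-level policy: in GRILL this
        # choice is learned from reward; here it is random for illustration.
        return random.choice(self.goals)

    def act(self, observation, t):
        # The manager re-decides only every few steps (temporal abstraction);
        # between decisions, the worker keeps pursuing the current goal.
        if t % self.manager_interval == 0 or self.current_goal is None:
            self.current_goal = self.manager_pick_goal(observation)
        # The worker, trained by imitation, handles low-level control.
        return self.worker_skills[self.current_goal](observation)
```

The split matters for training: the worker can be trained once from demonstrations and reused, so the manager's reinforcement learning problem shrinks to "pick a goal every few steps" instead of "pick a joystick action every frame."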

The Experiments: Two Games

The researchers tested this in two video-game-like worlds:

  1. Cooperative Reaching: Two dots on a grid trying to reach a corner. Sometimes the corners have different values for each dot.
  2. Level-Based Foraging: A more complex game where agents collect fruits (apples, oranges, plums). Some fruits are heavy and need two people to carry; others are light and can be carried alone.
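In Level-Based Foraging, the "heavy vs. light" distinction is typically expressed through levels: a fruit can be collected only if the combined levels of the agents grabbing it meet or exceed the fruit's level. A minimal sketch of that rule (my paraphrase of the standard environment mechanic, not the paper's exact implementation):

```python
def can_collect(fruit_level: int, agent_levels: list) -> bool:
    """A fruit is collected only if the agents loading it are,
    combined, at least as strong as the fruit."""
    return sum(agent_levels) >= fruit_level

can_collect(2, [1])     # heavy fruit, one level-1 agent: fails
can_collect(2, [1, 1])  # two agents lift it together: succeeds
can_collect(1, [1])     # light fruit, solo pickup works
```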

The Results:

  • The Old Way (Baselines): The standard AI methods got confused. In the "No Overlap" scenario (where you should go solo), they kept trying to hold hands and walk together, wasting time and getting low scores. They were "over-cooperative."
  • The New Way (GRILL): The AI learned to switch gears. When goals matched, it collaborated efficiently. When goals clashed, it confidently walked away to do its own task. It got much higher scores.

The "Sidekick" (GRILL-M)

The authors also added a little "Sidekick" feature (called GRILL-M). This is an extra module that tries to guess what the teammate is thinking or planning based on their movements.

  • When it helps: If the teammate is acting very strangely or the environment is chaotic (like the fruit game), the Sidekick helps the Manager make better guesses.
  • When it hurts: If the teammate's intentions are obvious (like in the simple grid game), the Sidekick just adds noise and confusion. It's like having a co-pilot who keeps shouting instructions when the road is perfectly straight and clear.
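One way to think about GRILL-M's teammate model is as Bayesian goal inference: watch the partner's moves and update a belief over which goal they are pursuing. A toy sketch under that interpretation (the likelihood numbers are invented for illustration):

```python
def update_belief(belief: dict, likelihoods: dict) -> dict:
    """One Bayesian update: P(goal | move) is proportional to
    P(move | goal) * P(goal), renormalized over all goals."""
    posterior = {g: belief[g] * likelihoods[g] for g in belief}
    total = sum(posterior.values())
    return {g: p / total for g, p in posterior.items()}

# Start unsure whether the partner wants apples or oranges.
belief = {"apples": 0.5, "oranges": 0.5}
# The partner steps toward the apple tree: that move is much more
# likely if their goal is apples (0.9) than oranges (0.2).
belief = update_belief(belief, {"apples": 0.9, "oranges": 0.2})
# belief["apples"] is now about 0.82 — mostly convinced they want apples.
```

This also makes the "when it hurts" finding intuitive: if one move already pins down the partner's goal, the extra inference machinery has nothing left to add and its noise can only hurt.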

Why This Matters

This research is a step toward making AI that feels more "human." Real humans don't just blindly follow orders to work together. We constantly scan our environment to see if collaboration is actually useful.

  • In the real world: This could help robots in a warehouse decide whether to help a human pick up a box or just move out of the way.
  • In the future: It could help self-driving cars decide when to merge with traffic and when to stay in their lane, or help virtual assistants know when to help you and when to let you handle a task yourself.

In short: The paper teaches machines that sometimes the smartest thing to do is to stop trying to be a team player and just be a solo player. And that's a very human lesson to learn.