Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People

This paper introduces the Collaborative Battleship task to evaluate language models' information-seeking abilities and proposes Monte Carlo inference strategies inspired by Bayesian Experimental Design that significantly improve both question asking and answer accuracy, enabling weaker models to outperform humans and frontier models in strategic decision-making tasks.

Gabriel Grand, Valerio Pepe, Jacob Andreas, Joshua B. Tenenbaum

Published 2026-03-09

Imagine you are playing a game of Battleship, but with a twist. You have a partner who can see the entire ocean map, while you can see only a small, partially revealed patch of it. Working together, your goal is to find and sink the hidden ships.

To do this, you have two choices every turn:

  1. Shoot: Guess where a ship is and fire a cannon.
  2. Ask: Ask your partner a "Yes" or "No" question to get a clue (e.g., "Is there a ship in the top-left corner?").

The problem? Most current AI models are terrible at this game. They either shoot wildly without thinking, or they ask silly questions that don't help them find the ships. They act like a person who "shoots first and asks questions later," often missing the target.

This paper, titled "Shoot First, Ask Questions Later? Building Rational Agents That Explore and Act Like People," introduces a new way to teach AI to play this game (and similar information-seeking tasks) much smarter.

Here is the breakdown of their discovery, using simple analogies:

1. The Problem: The "Guessing Game" AI

The researchers set up a digital version of Battleship where an AI (the Captain) has to talk to another AI (the Spotter) who sees the whole board.

  • The Captain's Job: Decide whether to ask a question or take a shot.
  • The Spotter's Job: Answer "Yes" or "No" accurately based on what they see.

They found that even smart AI models were struggling. They asked redundant questions (like asking "Is there a ship?" when they already knew the answer) or made shots that were pure guesses. They weren't acting like "rational" agents who use logic to save resources.

2. The Solution: The "Detective's Toolkit"

The authors realized that to be good at this, an AI needs to act like a detective or a scientist running an experiment. They borrowed a concept from statistics called Bayesian Experimental Design.

Think of it like this:

  • The Old Way: The AI just picks a question that sounds interesting.
  • The New Way (The "Bayesian" Way): The AI runs a mental simulation. It asks itself: "If I ask this question, how much will it narrow down the list of possibilities?"

They gave the AI three specific tools (strategies) to use:

A. The "Best Question" Filter (Bayes-Q)

Imagine someone has secretly picked one of the four Aces, and you want to figure out which one.

  • Bad Question: "Is it the Ace of Spades?" (This is too specific: the answer is probably "No," which rules out only one card.)
  • Good Question: "Is the Ace red?" (Either answer cuts the remaining possibilities in half.)
  • The AI's New Strategy: The AI generates 100 possible questions, simulates the possible answers to each, and picks the one with the highest expected information gain, which for a yes/no question is the one that comes closest to splitting the remaining possibilities evenly. It's like sweeping a metal detector over the beach before digging, rather than digging randomly.
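The question-picking idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a Spotter who never answers incorrectly (so a question's expected information gain is just the entropy of the predicted yes/no answer), and the board representation, the function names, and the toy 2x2 grid are all hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy (in bits) of a yes/no answer that is "yes" with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def expected_information_gain(question, boards):
    """Assuming a noiseless Spotter, EIG equals the entropy of the predicted answer."""
    p_yes = sum(1 for board in boards if question(board)) / len(boards)
    return binary_entropy(p_yes)

def pick_question(questions, boards):
    """Return the text of the candidate question with the highest EIG."""
    return max(questions, key=lambda text: expected_information_gain(questions[text], boards))

# Hypothetical toy setup: one single-tile ship hidden somewhere on a 2x2 grid.
# Each "board" is one hypothesis, stored as the set of (row, col) ship tiles.
BOARDS = [frozenset({(r, c)}) for r in range(2) for c in range(2)]

# Candidate questions, each paired with a function that answers it on a board.
QUESTIONS = {
    "Is the ship in the top row?": lambda board: any(r == 0 for r, _ in board),
    "Is the ship at (0, 0)?": lambda board: (0, 0) in board,
}

best = pick_question(QUESTIONS, BOARDS)
```

Here the "top row" question splits the four hypotheses two against two, so it scores a full bit of information, while the overly specific question scores less and is passed over.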

B. The "Best Shot" Calculator (Bayes-M)

When it's time to shoot, the AI doesn't just guess. It considers many possible ship layouts that fit the clues so far, estimates the probability of a hit at each square, and fires at the most likely one. It's like a sniper who accounts for wind speed, distance, and target movement before pulling the trigger.
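As a rough sketch of the shot calculation, suppose the AI already holds a handful of sampled boards that are consistent with everything it has seen. The hit probability of a square can then be approximated by the fraction of samples in which that square holds a ship. The function names and the three sample boards below are hypothetical, and how the samples are drawn is left out entirely.

```python
from collections import Counter

def hit_probabilities(boards, already_shot=frozenset()):
    """Estimate P(hit) per tile as the share of sampled boards with a ship there."""
    counts = Counter(
        tile for board in boards for tile in board if tile not in already_shot
    )
    return {tile: n / len(boards) for tile, n in counts.items()}

def best_shot(boards, already_shot=frozenset()):
    """Return the untried tile with the highest estimated hit probability."""
    probs = hit_probabilities(boards, already_shot)
    if not probs:
        return None  # no sampled board has a ship on an untried tile
    return max(probs, key=probs.get)

# Hypothetical samples: three boards that each place a two-tile ship differently.
SAMPLED_BOARDS = [
    frozenset({(0, 0), (0, 1)}),
    frozenset({(0, 1), (0, 2)}),
    frozenset({(0, 1), (1, 1)}),
]

target = best_shot(SAMPLED_BOARDS)
```

Every sample agrees that (0, 1) holds a ship, so that square is chosen; the other squares each appear in only one of the three samples.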

C. The "Timing" Coach (Bayes-D)

This is the most human-like part. The AI learns when to ask and when to shoot.

  • Weak AI: Asks all 15 allowed questions at the very start, then shoots blindly. (Like reading the whole instruction manual before turning on the machine).
  • Smart AI: Asks a few questions, takes a shot, sees the result, asks another question, and takes another shot. It balances gathering info with taking action, just like a human expert would.
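One way to picture the ask-or-shoot decision is a one-step lookahead: compare how good the best shot is right now with how good it is expected to be after hearing the answer to a question. The sketch below is an assumption-laden illustration rather than the paper's exact rule; in particular, the `min_gain` threshold, the function names, and the toy boards are hypothetical.

```python
from collections import Counter

def max_hit_prob(boards, already_shot=frozenset()):
    """Best estimated hit probability over all untried tiles."""
    counts = Counter(t for b in boards for t in b if t not in already_shot)
    return max(counts.values(), default=0) / len(boards)

def expected_max_hit_prob_after(question, boards, already_shot=frozenset()):
    """Average the best hit probability over the 'yes' and 'no' outcomes."""
    yes = [b for b in boards if question(b)]
    no = [b for b in boards if not question(b)]
    return sum(
        len(group) / len(boards) * max_hit_prob(group, already_shot)
        for group in (yes, no)
        if group
    )

def should_ask(question, boards, questions_left, already_shot=frozenset(), min_gain=0.05):
    """Ask only if a question remains and the answer should sharpen the next shot."""
    if questions_left <= 0:
        return False
    gain = expected_max_hit_prob_after(question, boards, already_shot) - max_hit_prob(
        boards, already_shot
    )
    return gain > min_gain

def top_row(board):
    """Hypothetical question: is any ship tile in the top row?"""
    return any(r == 0 for r, _ in board)

# Uncertain case: a single-tile ship could be on any of four squares.
UNCERTAIN = [frozenset({(r, c)}) for r in range(2) for c in range(2)]
# Confident case: every sampled board already agrees on tile (0, 0).
CONFIDENT = [frozenset({(0, 0), (0, 1)}), frozenset({(0, 0), (1, 0)})]

ask_when_uncertain = should_ask(top_row, UNCERTAIN, questions_left=15)
ask_when_confident = should_ask(top_row, CONFIDENT, questions_left=15)
```

In the uncertain case the question doubles the best hit probability from 0.25 to 0.5, so asking is worthwhile; in the confident case a sure hit is already available, so the sketch shoots instead.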

3. The Results: Superhuman Performance

The results were surprising and impressive:

  • Weak AI becomes a Grandmaster: They took a smaller, weaker AI model (Llama-4-Scout) and gave it this "Detective's Toolkit." Suddenly, it didn't just play well; it beat human players 82% of the time and even beat a frontier model (GPT-5) 67% of the time.
  • Cost Efficiency: The small AI did this at about 1% of the cost of the giant AI. It's like teaching a smart kid to solve a math problem using a clever trick, rather than hiring a team of expensive professors to do the math for them.
  • Accuracy: For the "Spotter" (the one answering), writing a small piece of code to compute each answer made it far more reliable (94% accuracy) than answering directly in plain language, which led to more mistakes.
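To make the "code to generate answers" idea concrete, here is a minimal sketch of what such a Spotter could look like: instead of replying in words, the model writes a tiny program, and the program is run against the true board. The program text, the `run_answer_program` helper, and the board format are hypothetical stand-ins, and a real system would need to sandbox any model-written code before running it.

```python
def run_answer_program(source, board):
    """Run a generated program that defines answer(board) and return its verdict."""
    namespace = {}
    # NOTE: exec on untrusted, model-written code is unsafe outside a sandbox.
    exec(source, namespace)
    return bool(namespace["answer"](board))

# Hypothetical program a Spotter model might write for the question
# "Is any part of a ship in the top row?"
GENERATED_PROGRAM = '''
def answer(board):
    return any(row == 0 for row, col in board)
'''

# The true board, stored as the set of (row, col) tiles occupied by ships.
TRUE_BOARD = frozenset({(0, 2), (0, 3), (4, 1)})

spotter_says_yes = run_answer_program(GENERATED_PROGRAM, TRUE_BOARD)
```

The appeal of this design is that once the question has been translated into code correctly, the answer is computed from the board itself rather than recalled or eyeballed by the model.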

4. Why This Matters

This isn't just about a board game. The authors tested this on another game called Guess Who? (where you guess a person's identity by asking yes/no questions) and saw similar improvements.

The Big Picture:
In the real world, AI is being used for things like medical diagnosis (asking the right questions to a patient to find a disease) or scientific discovery (designing experiments to find new drugs).

  • Currently, AI often asks the wrong questions or wastes resources.
  • This paper shows that if we teach AI to think probabilistically—to simulate outcomes and choose the path that gives the most information with the least effort—we can build agents that are not just "chatbots," but rational partners that can solve complex problems efficiently.

Summary Analogy

Imagine you are looking for a lost key in a messy room.

  • Normal AI: Starts picking up random objects and checking them, or asks, "Is the key under the sofa?" without checking if the sofa is even in the room.
  • This New "Rational" AI: First, it looks at the room and thinks, "The key is most likely near the door." It checks there first. If it's not there, it asks, "Did I leave the key in the kitchen?" based on a logical deduction of where it could be. It doesn't waste time checking the ceiling fan.

By giving AI this "logical brain," the researchers turned a clumsy guesser into a master strategist.