InfoGatherer: Principled Information Seeking via Evidence Retrieval and Strategic Questioning

InfoGatherer is a principled framework for high-stakes, document-grounded QA that combines retrieved evidence with strategic user questioning. It uses Dempster-Shafer belief assignments to model uncertainty, improving decision reliability while reducing interaction turns.

Maksym Taranukhin, Shuyue Stella Li, Evangelos Milios, Geoff Pleiss, Yulia Tsvetkov, Vered Shwartz

Published 2026-03-09

Imagine you are a detective trying to solve a mystery, but instead of a crime scene, you are dealing with a patient who has a sore throat or a client with a legal problem. The person comes to you with a vague story: "I have a cough and a sore throat."

In the past, if you asked an AI (a Large Language Model) to solve this, it would often act like a confident but overzealous detective. It would look at the limited clues, guess the culprit (maybe "Flu"), and give you a very loud, very sure answer, even if it was wrong. These models hate saying, "I don't know enough yet."

INFOGATHERER is a new, smarter way to build these AI detectives. It changes the game from "guessing fast" to "investigating carefully."

Here is how it works, using some simple analogies:

1. The Two-Pronged Investigation

Most AI detectives rely only on what they memorized in school (their internal training data). If that data is old or missing a detail, they fail.

INFOGATHERER has two tools:

  • The Library (Retrieved Documents): Before asking the user anything, it runs to a library of trusted, up-to-date rulebooks (like medical textbooks or legal codes) to see what the rules say about the symptoms.
  • The Interviewer (Strategic Questioning): It doesn't just guess; it asks the user specific questions to fill in the blanks.
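The two-pronged loop above can be sketched in a few lines of Python. Everything here is a toy stand-in for illustration, not the paper's implementation: the "library" is a hard-coded dict of symptom rules, and the questioning is a simple candidate filter (with the toy assumption that a disease always shows its linked symptoms).

```python
# Toy "library": symptom -> diseases it points to (invented for illustration)
RULEBOOK = {
    "cough": {"flu", "cold"},
    "sore throat": {"flu", "cold", "strep"},
    "fever": {"flu", "strep"},
}

def investigate(reported_symptoms, questions, ask_user):
    # Prong 1: consult the "library" before asking the user anything
    candidates = set.intersection(*(RULEBOOK[s] for s in reported_symptoms))
    # Prong 2: the "interviewer" asks targeted questions to fill the blanks
    for symptom in questions:
        if len(candidates) <= 1:
            break                      # case closed, stop asking
        linked = RULEBOOK[symptom]
        if ask_user(f"Do you have a {symptom}?"):
            candidates &= linked       # keep diseases consistent with "yes"
        else:
            candidates -= linked       # toy assumption: "no" rules them out
    return candidates

# A "yes" to the fever question closes the case as flu; a "no" points to cold
print(investigate(["cough", "sore throat"], ["fever"], lambda q: True))
```

The point of the sketch is the ordering: retrieval narrows the suspect list first, so every question the interviewer does ask has fewer possibilities left to split.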

2. The "Fuzzy Map" vs. The "Sharp Guess"

This is the most important part.

  • Old AI: Uses a Sharp Guess. It says, "There is a 90% chance it's the Flu." If it's wrong, it's still 90% sure it's right. This is dangerous in medicine or law.
  • INFOGATHERER: Uses a Fuzzy Map (based on something called Dempster-Shafer Theory). Instead of forcing a single percentage, it admits, "I have some evidence for Flu, some for Allergies, and a big chunk of 'I don't know yet'."

The Analogy: Imagine you are trying to find a lost dog.

  • Old AI points at a bush and says, "It's definitely in that bush!" (Even if the dog is actually in the garage).
  • INFOGATHERER draws a circle around the neighborhood and says, "The dog is likely in the neighborhood, but I'm not sure which house. I need to check the garage next." It keeps the "I don't know" part visible until it finds the dog.

3. The Detective's Strategy (The "Stop" Button)

How does the AI know when to stop asking questions?

  • Old AI often stops too early because it feels "confident" (even if that confidence is fake).
  • INFOGATHERER uses a Confidence Meter that only moves when it finds new, solid evidence. It keeps asking questions until the "I don't know" part of its Fuzzy Map shrinks enough that it can point to one specific answer with high certainty.
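That stopping rule can be sketched as a simple check on the belief masses: stop once the explicit "I don't know" mass is small *and* one specific answer dominates. The threshold values below are invented for illustration, not taken from the paper:

```python
def should_stop(masses, theta, ignorance_cap=0.2, belief_floor=0.7):
    """Stop asking once ignorance is low and one singleton answer dominates."""
    ignorance = masses.get(theta, 0.0)          # mass still on "I don't know"
    best_singleton = max(
        (w for s, w in masses.items() if len(s) == 1), default=0.0
    )
    return ignorance <= ignorance_cap and best_singleton >= belief_floor

theta = frozenset({"flu", "allergies", "cold"})
early = {theta: 0.6, frozenset({"flu"}): 0.4}   # still mostly "don't know"
late = {theta: 0.1, frozenset({"flu"}): 0.8, frozenset({"cold"}): 0.1}
print(should_stop(early, theta))  # False: keep asking questions
print(should_stop(late, theta))   # True: confident enough to answer
```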

It's like a game of 20 Questions, but the AI is playing perfectly. It doesn't ask, "Is it an animal?" if it already knows it's a disease. It asks, "Do you have a fever?" because that specific question will cut the list of possibilities in half.
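The "cut the list in half" intuition is the standard idea of expected information gain: score each candidate question by how much it is expected to shrink the entropy over hypotheses. The diseases and answer table below are invented for illustration, and the paper's actual question-scoring may differ:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Which answer each disease would give to each question (toy, hand-made data)
diseases = ["flu", "allergies", "cold"]
answers = {
    "Do you have a fever?": {"flu": "yes", "allergies": "no", "cold": "no"},
    "Is it winter?":        {"flu": "yes", "allergies": "yes", "cold": "yes"},
}

def expected_gain(question):
    """Entropy drop expected from asking, under a uniform prior."""
    prior_h = entropy([1 / len(diseases)] * len(diseases))
    cond_h = 0.0
    for ans in ("yes", "no"):
        consistent = [d for d in diseases if answers[question][d] == ans]
        if consistent:
            p_ans = len(consistent) / len(diseases)
            cond_h += p_ans * entropy([1 / len(consistent)] * len(consistent))
    return prior_h - cond_h

best = max(answers, key=expected_gain)
print(best)  # the fever question: it splits the suspects, "winter" does not
```

The winter question scores zero because every suspect answers it the same way; the fever question actually divides the hypothesis space, which is exactly why a perfect 20 Questions player would ask it first.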

4. Why This Matters

The paper tested this on Medical (diagnosing diseases) and Legal (solving court cases) problems.

  • The Result: INFOGATHERER got the right answer more often than other AI methods, while also asking fewer questions.
  • The Reason: It didn't waste time asking useless questions, and it didn't guess until it was sure.

The Big Picture

Think of INFOGATHERER as a wise, cautious expert rather than a fast, confident guesser.

  • It reads the rulebook first.
  • It admits when it is confused.
  • It asks the right questions to clear up the confusion.
  • It only gives an answer when it has built a solid case.

In high-stakes fields like healthcare and law, where a wrong guess can hurt someone, this shift from "confident guessing" to "principled investigation" is a huge step forward for making AI trustworthy.