IntelliAsk: Learning to Ask High-Quality Research Questions via RLVR

This paper introduces IntelliAsk, a question-generation model trained via RLVR (reinforcement learning with verifiable rewards), using a novel reward model (IntelliReward) and DAPO optimization. In expert evaluations, IntelliAsk produces high-quality, evidence-based research questions that outperform those of human reviewers and strong baselines, and the training also enhances the model's broader reasoning and writing capabilities.

Karun Sharma, Vidushee Vats, Shengzhi Li, Yuxiang Wang, Zhongtian Sun, Prayag Tiwari

Published 2026-03-09

Imagine you are a chef who just finished cooking a magnificent, complex dish. You invite a food critic to taste it.

The Problem:
In the world of academic research, the "critics" are peer reviewers. Their job is to taste the "dish" (the research paper) and ask tough, thoughtful questions to make the chef (the author) improve the recipe.

However, there's a crisis. Because there are too many papers and not enough time, many reviewers are using AI (Large Language Models) to write their questions. The problem? These AI questions are like a food critic saying, "This soup is salty." It's true, but it's shallow. It doesn't ask why the salt was added, if the recipe called for it, or if the saltiness hides a bad ingredient. The AI is just repeating what it sees on the first page of the menu, rather than actually tasting the whole meal.

The Solution: IntelliAsk
The authors of this paper set out to build a new kind of AI critic, which they call IntelliAsk. They didn't just want an AI that sounds polite; they wanted one that asks deep, evidence-based questions that actually help the author improve.

Here is how they did it, broken down into simple steps:

1. The "Taste Test" (The Data)

First, the team went to a giant library of past reviews (from a conference called ICLR). They didn't just grab any question; they hired expert human "tasters" to grade thousands of questions based on three specific flavors:

  • Effort: Did the critic actually think hard about this, or did they just copy-paste a generic complaint?
  • Evidence: Did the critic point to a specific ingredient (a specific chart, equation, or paragraph) in the paper?
  • Grounding: Is the question actually about this specific dish, or is it a generic question that could apply to any soup in the world?
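The three criteria above amount to a scoring rubric. Here is a minimal sketch of how such a rubric could be combined into one number; the dimension names come from the paper, but the 0-to-1 scale and the equal weighting are purely illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class QuestionScores:
    """Per-question ratings on the three rubric dimensions (0.0 to 1.0).
    The scale and weights below are illustrative, not from the paper."""
    effort: float     # did the reviewer engage deeply with the paper?
    evidence: float   # does the question point at a specific chart/equation/paragraph?
    grounding: float  # is the question specific to *this* paper?

def overall_reward(s: QuestionScores, weights=(1/3, 1/3, 1/3)) -> float:
    """Collapse the three dimensions into one scalar score (equal
    weighting is an assumption; the paper's aggregation may differ)."""
    return weights[0] * s.effort + weights[1] * s.evidence + weights[2] * s.grounding

# A generic complaint scores low; a pointed, paper-specific question scores high.
shallow = QuestionScores(effort=0.2, evidence=0.0, grounding=0.1)
deep = QuestionScores(effort=0.9, evidence=1.0, grounding=0.8)
```

With these toy numbers, the deep question scores 0.9 and the shallow one 0.1, which is exactly the gap the human graders were asked to capture.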

2. The "Smart Judge" (IntelliReward)

The team realized that asking humans to grade every single AI question is too slow and expensive. So, they built a "Smart Judge" robot called IntelliReward.

Think of IntelliReward as a taste-test robot that was trained by the human experts. It can look at a question and a paper and instantly say, "This question is shallow and lazy (Low Score)" or "This question digs deep into the chemistry of the recipe (High Score)." It learned to spot the difference between a question that sounds smart and one that is smart.
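To make the "smart judge" idea concrete, here is a toy stand-in. To be clear about assumptions: the real IntelliReward is a neural model trained on expert ratings, not a keyword check; this heuristic only illustrates the *kind* of distinction it learns to make:

```python
import re

def toy_judge(question: str) -> float:
    """Toy heuristic stand-in for a learned reward model.
    Rewards questions that point at concrete artifacts (tables, figures,
    equations) and penalizes boilerplate that could apply to any paper.
    The scoring rules and constants here are invented for illustration."""
    score = 0.5
    # Evidence: does the question cite a specific place in the paper?
    if re.search(r"(Table|Figure|Equation|Section)\s*\d+", question):
        score += 0.4
    # Grounding: generic complaints that fit any paper lose points.
    generic = ["more experiments", "add more baselines", "improve clarity"]
    if any(phrase in question.lower() for phrase in generic):
        score -= 0.3
    return max(0.0, min(1.0, score))
```

A question like "Why does Table 3 show accuracy dropping when k > 8?" outscores "The paper needs more experiments." The trained judge captures much subtler signals, but the direction of the preference is the same.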

3. The "Training Camp" (Reinforcement Learning)

This is the most important part.

  • Old Way (SFT): Usually, you teach an AI by showing it examples of good questions and saying, "Copy this style." This is like teaching a student to memorize a textbook. The student learns to sound like a critic, but they don't actually understand the food. They just mimic the tone.
  • New Way (IntelliAsk): The authors used a technique called Reinforcement Learning. Imagine a dog trainer.
    1. The AI (the dog) tries to ask a question.
    2. The Smart Judge (IntelliReward) gives it a treat (a high score) if the question is deep and evidence-based.
    3. If the question is shallow, the judge gives a "no treat" (a low score).
    4. The AI learns through trial and error: "Oh, if I look at the data tables and ask about the numbers, I get a treat! If I just look at the title, I don't."
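The trial-and-error loop above can be sketched as a tiny gradient-bandit experiment. Everything here is an illustrative simplification: the two "strategies", their rewards, and the update rule are toy stand-ins for the paper's actual DAPO training against IntelliReward:

```python
import math
import random

random.seed(0)

# Two "strategies" the model can try, and the reward the judge gives each.
# The rewards are invented; in the paper they come from IntelliReward.
REWARDS = {"ask_about_data_tables": 1.0, "restate_the_title": 0.1}

# Start with no preference between strategies (a uniform "policy").
prefs = {"ask_about_data_tables": 0.0, "restate_the_title": 0.0}
LEARNING_RATE = 0.1

def sample(prefs):
    """Softmax-sample a strategy in proportion to its current preference."""
    weights = {k: math.exp(v) for k, v in prefs.items()}
    total = sum(weights.values())
    r = random.random() * total
    for k, w in weights.items():
        r -= w
        if r <= 0:
            return k
    return k

baseline = 0.0  # running average reward: the "no treat" reference point
for step in range(500):
    action = sample(prefs)
    reward = REWARDS[action]
    # Treat = reward above baseline; reinforce actions that beat it.
    prefs[action] += LEARNING_RATE * (reward - baseline)
    baseline += 0.05 * (reward - baseline)
```

After a few hundred trials the preference for the deep, evidence-based strategy dominates: the "dog" has learned which behavior earns the treat, without ever being shown an example question to copy.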

Over time, the AI stops just mimicking the style of a critic and starts actually thinking like one.

The Results: A New Super-Critic

When they tested IntelliAsk against other top AI models, and even against human reviewers, three things stood out:

  • Depth: IntelliAsk asked questions that required real thinking, not just surface-level observations.
  • Focus: It didn't just look at the first page of the paper (where the summary is); it read the whole thing, including the complex data in the back.
  • Bonus: Surprisingly, by learning to ask better questions, the AI also got better at writing and reasoning in general. It's like a chef who, by learning to critique food deeply, also becomes a better cook themselves.

The Big Picture

This paper is a wake-up call. It shows that simply telling an AI to "write like a human" isn't enough. If you want an AI to be truly helpful, you have to teach it to think critically, look for evidence, and put in the effort.

IntelliAsk is the first AI that doesn't just pretend to be a smart reviewer; it actually becomes one, helping researchers improve their work with questions that matter.