GoodPoint: Learning Constructive Scientific Paper Feedback from Author Responses

The paper introduces GoodPoint, a training framework that leverages author responses to curate a dataset of valid and actionable feedback, enabling a fine-tuned LLM to significantly outperform existing models in generating constructive scientific paper feedback that authors find practically valuable.

Jimin Mun, Chani Jung, Xuhui Zhou, Hyunwoo Kim, Maarten Sap

Published 2026-04-15

The Big Idea: A Coach, Not a Robot

Imagine you are writing a masterpiece novel. You send it to a publisher, and they send it back with a stack of notes. Some notes are helpful ("The plot twist in Chapter 3 is confusing, try clarifying it"), while others are useless ("I didn't like the font" or "This book is bad because I hate your genre").

Today's AI tools that try to act as editors often produce a lot of useless noise. They might sound confident, but they give generic advice that doesn't actually help you improve your story.

The authors of this paper say: "Let's not replace human editors with robots. Let's give authors a super-smart AI coach that knows exactly what kind of feedback actually works."

They built a system called GOODPOINT. Its goal isn't to write the paper for you; it's to give you feedback that you will actually listen to and act on.


How They Taught the AI: The "Author's Reply" Secret Sauce

How do you teach an AI to give good advice? You can't just ask it to "be nice." You need to know what "good" looks like.

The researchers realized that the best signal for good feedback is what the author does next.

  • The Analogy: Imagine a student turning in a homework assignment.
    • Scenario A: The teacher writes, "Your essay is boring." The student ignores it and submits the same essay next time. (Bad feedback).
    • Scenario B: The teacher writes, "Your second paragraph contradicts your thesis; try adding a transition sentence here." The student fixes it and gets an A. (Good feedback).

The researchers looked at thousands of real scientific papers and the conversations between authors and reviewers. They only kept the feedback where the author said, "You're right, and I will fix this" or "That's a good point, I'll add that experiment."

They called this "Valid and Actionable" feedback.

  • Valid: The author agrees the problem is real.
  • Actionable: The author knows exactly what to do to fix it.

They used these "winning" comments to train their AI model (based on a model called Qwen3-8B).
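The filtering idea above can be sketched in a few lines. This is only an illustration of the "keep feedback the author agreed with and acted on" principle; the cue phrases, helper names (`is_valid`, `is_actionable`, `curate`), and the use of simple regex matching are my assumptions, not the paper's actual pipeline, which would likely use a trained classifier rather than keyword patterns.

```python
# Illustrative sketch of curating "Valid and Actionable" feedback from
# review-rebuttal threads. Cue lists and helpers are hypothetical.
import re

AGREEMENT_CUES = [          # "Valid": the author agrees the problem is real
    r"\byou('re| are) right\b",
    r"\bgood point\b",
    r"\bwe (have )?(fixed|revised|added|clarified)\b",
]

ACTION_CUES = [             # "Actionable": the author commits to a concrete fix
    r"\b(added|will add|revised|will revise|clarified|fixed|included)\b",
]

def is_valid(author_reply: str) -> bool:
    reply = author_reply.lower()
    return any(re.search(p, reply) for p in AGREEMENT_CUES)

def is_actionable(author_reply: str) -> bool:
    reply = author_reply.lower()
    return any(re.search(p, reply) for p in ACTION_CUES)

def curate(threads):
    """Keep only reviewer comments whose author reply is valid AND actionable."""
    return [comment for comment, reply in threads
            if is_valid(reply) and is_actionable(reply)]

threads = [
    ("Figure 2 lacks error bars.", "You're right, we have added error bars."),
    ("I dislike this research area.", "We respectfully disagree."),
]
print(curate(threads))  # → ['Figure 2 lacks error bars.']
```

Only the first comment survives: the author both agreed ("you're right") and named a concrete fix ("added error bars"), while the second reply signals neither agreement nor action.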


The Training Recipe: SFT and DPO

The team used a two-step cooking recipe to train their AI chef, GOODPOINT:

  1. SFT (Supervised Fine-Tuning) - "Learning the Recipe":
    They fed the AI thousands of examples of "winning" feedback. It's like showing a cooking student the best recipes from a Michelin-star chef so they learn the basic techniques. The AI learned, "Okay, when I see a paper, I should sound like these successful reviewers."

  2. DPO (Direct Preference Optimization) - "The Taste Test":
This is the polishing step: they took the AI's output and deliberately created "bad" versions of it.

    • Good Version: "The data in Figure 2 is unclear; please add error bars."
    • Bad Version (Corrupted): "Your data is bad." (Too vague) OR "Your data is wrong because I hate math." (Rude/Invalid).

    They showed the AI both versions and said, "Pick the one that is actually helpful." This taught the AI to avoid being vague, rude, or factually wrong. It learned to be precise and constructive.
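The DPO step trains on preference pairs: for each prompt, a "chosen" response (the curated feedback) and a "rejected" response (a corrupted version). The sketch below shows how such pairs might be assembled; the corruption functions, field names, and prompt template are assumptions for illustration, not the paper's exact method.

```python
# Illustrative construction of DPO preference pairs: each pair keeps the
# curated feedback as "chosen" and a deliberate corruption as "rejected".
# Corruption strategies here (vague / invalid) are hypothetical stand-ins.

def make_vague(feedback: str) -> str:
    """Strip the specifics, leaving only a generic complaint."""
    return "This part of the paper is weak and should be improved."

def make_invalid(feedback: str) -> str:
    """Replace a grounded critique with an unfounded dismissal."""
    return "This is wrong, though I cannot point to any specific issue."

def build_preference_pairs(paper_excerpt: str, good_feedback: str):
    pairs = []
    for corrupt in (make_vague, make_invalid):
        pairs.append({
            "prompt": f"Review this excerpt:\n{paper_excerpt}",
            "chosen": good_feedback,           # specific, constructive
            "rejected": corrupt(good_feedback) # vague or invalid
        })
    return pairs

pairs = build_preference_pairs(
    "We report accuracy in Figure 2.",
    "The data in Figure 2 is unclear; please add error bars.",
)
print(len(pairs))  # → 2, one pair per corruption type
```

Pairs in this (prompt, chosen, rejected) shape are exactly what standard DPO training loops consume: the model is pushed to assign higher likelihood to the specific, constructive response than to its corrupted twin.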


The Results: Small Model, Big Impact

Usually, to get the best AI, you need a massive, expensive super-computer brain (like the giant models from Google or OpenAI).

But GOODPOINT proved that a smaller, smarter brain can beat a bigger, dumber one.

  • The Test: They tested their AI on 1,200 real scientific papers.
  • The Result:
    • The base AI (Qwen3-8B) gave feedback that authors ignored 92% of the time.
    • The GOODPOINT AI gave feedback that authors accepted and acted on 83% more often than the base model.
    • Even better: GOODPOINT was more precise than massive, expensive models like Gemini-3-flash. It didn't waste time giving generic advice; it gave specific, high-value critiques.

Why This Matters

In the past, people worried that AI would take over science, replacing human judgment with robotic, soulless reviews.

This paper argues for a different future: AI as a Power-Up.

  • It helps junior researchers or non-native English speakers get the same high-quality feedback as experts.
  • It doesn't replace the human reviewer; it acts as a "pre-reviewer" that filters out the noise and highlights the constructive points.

The Bottom Line

GOODPOINT is like a personal writing coach that has studied thousands of successful editor-author conversations. It doesn't just tell you what is wrong; it tells you how to fix it in a way that makes you want to listen. By focusing on feedback that authors actually use, the team created a tool that makes science better, faster, and more collaborative.
