RetroReasoner: A Reasoning LLM for Strategic Retrosynthesis Prediction

Imagine you are a master chef trying to figure out exactly how a famous, complex dish (like a multi-layered chocolate cake with gold leaf) was made. You have the finished cake in front of you, but the recipe is lost. Your goal is to work backward to discover the raw ingredients and the specific steps the original chef took to create it.

In the world of chemistry, this is called Retrosynthesis. It's the art of taking a finished molecule and figuring out what simpler chemicals were mixed together to build it.

For a long time, computers trying to solve this puzzle were like students who just memorized the answer key. They could guess the ingredients, but they didn't understand why those ingredients were chosen. They lacked the "strategic thinking" of a real chemist.

Enter RetroReasoner, a new AI model that doesn't just guess; it thinks like a chemist. Here is how it works, broken down into simple concepts:

1. The Problem: The "Black Box" Guessers

Previous AI models were like a magic 8-ball. You asked, "What are the ingredients?" and it gave an answer. Sometimes it was right, but often the answer was a lucky or generic guess based on patterns it had seen before. It couldn't explain its logic, and if the recipe was unusual (like a dish with a rare spice), it would often fail completely.

2. The Solution: Teaching the AI to "Think Aloud"

The researchers behind RetroReasoner realized that, to solve this, the AI needed to follow a specific mental checklist, just like a human chemist does. They created a data-generation framework called SyntheticRetro to produce training examples that follow this checklist.

Think of SyntheticRetro as a "ghostwriter" for the AI. It takes a large collection of real chemical recipes (recorded reactions) and rewrites each one as a step-by-step story. Instead of just saying "Mix A and B," it teaches the AI to say:

Step 1 (Product Analysis): "Look at this cake. It has a chocolate layer and a strawberry layer."
Step 2 (Finding the Weak Spot): "I see a seam where the chocolate meets the strawberry. That's the easiest place to pull them apart."
Step 3 (The Cut): "If I cut here, I get two separate pieces: a chocolate block and a strawberry block."
Step 4 (The Ingredients): "The chocolate block was likely made from cocoa and sugar. The strawberry block came from fresh strawberries and gelatin."

RetroReasoner learns this "thinking aloud" process. It doesn't just output the answer; it outputs the reasoning that leads to the answer.

3. The Training: The "Taste Test" (Round-Trip Accuracy)

How do you know if the AI's guess is actually good? In chemistry, you can't just check if the answer matches a textbook list, because there are often many different ways to make the same cake.

The researchers used a clever trick called Round-Trip Accuracy.

The Forward Trip: The AI guesses the ingredients (e.g., "Flour and Eggs").
The Return Trip: They feed those guessed ingredients into a different AI that acts like a forward-cooking simulator. It tries to "cook" the ingredients to see what dish it produces.
The Reward: If the simulated dish turns out to be the exact same cake you started with, the AI gets a high score (a reward). If it makes a mess or a different cake, it gets a low score.

This is like a game of "Telephone" where the message must come back to you perfectly. It pushes the AI toward ingredients that the cooking simulator confirms will actually produce the target dish, rather than ingredients that merely look plausible on paper.

4. The Results: Why It Matters

When tested, RetroReasoner was like a seasoned master chef compared to the previous "guessing" models.

It handles the weird stuff: When given recipes with rare, strange ingredients (rare atoms or unusual reaction types), RetroReasoner didn't panic. Because it reasons about where to cut bonds instead of only matching patterns, it held up noticeably better than earlier models on these unusual molecules.
It offers more options: Instead of giving one single answer, it could suggest several different valid ways to make the molecule, giving human chemists more choices.
It's explainable: Because it writes out its reasoning steps, a human chemist can look at its work, say, "Ah, I see why it chose that cut," and trust the result.

The Big Picture

RetroReasoner is a bridge between raw data and human intuition. It teaches AI to stop memorizing answers and start understanding the logic of chemistry. By mimicking the strategic thinking of human experts and using a "cooking simulation" to verify its work, it promises to speed up the discovery of new medicines, materials, and chemicals, turning the complex puzzle of molecular building into a solvable game.

1. Problem Statement

Retrosynthesis prediction is a fundamental task in organic synthesis where the goal is to identify the reactants required to synthesize a target product molecule.

Current Limitations: Traditional methods rely on chemists' expertise to perform "strategic bond disconnection," a process that is time-consuming and requires deep domain knowledge.
LLM Shortcomings: While recent Large Language Models (LLMs) applied to chemistry (Molecular LLMs) have shown promise, they suffer from two main issues:
1. Lack of Reasoning: Many models predict reactants directly from products without an explicit intermediate reasoning process, leading to a "black box" approach.
2. Generic Analysis: Existing "reasoning" models often perform only a generic analysis of the product's functional groups without logically connecting these observations to specific bond disconnection strategies or reactant selection. This results in a logical disconnect where the reasoning does not actually lead to the correct reactants.
The Challenge: The task has many valid answers; a single product can often be synthesized from several different reactant sets. Standard training objectives that reward only an exact match with the single labeled reactant set penalize these valid alternatives and therefore fail to capture this diversity and feasibility.

2. Methodology

The authors propose RetroReasoner, a model designed to mimic the step-by-step strategic thinking of a human chemist. The approach consists of three core components:

A. SyntheticRetro: A Data Generation Framework

To train the model on strategic reasoning, the authors developed SyntheticRetro, a framework that generates structured training data from reaction SMILES (RXN SMILES).
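SyntheticRetro's "directly usable" information comes straight from the reaction SMILES string. A minimal sketch, assuming only the standard `reactants>agents>products` layout and Python's standard library (the helper `parse_rxn_smiles` is hypothetical, not the authors' code): it splits a reaction into its three parts and collects any atom-map numbers already present on the product. In the pipeline described here, those map numbers would come from an atom-mapping model rather than from this parser.

```python
import re
from dataclasses import dataclass, field

# Atom-map numbers sit at the end of bracket atoms, e.g. [CH3:1] or [OH:12].
ATOM_MAP = re.compile(r"\[[^\]]*?:(\d+)\]")


@dataclass
class ParsedReaction:
    reactants: list[str]
    agents: list[str]
    products: list[str]
    product_map_numbers: set[int] = field(default_factory=set)


def _split_molecules(part: str) -> list[str]:
    # A '.' separates disconnected molecules within one side of the reaction.
    return [m for m in part.split(".") if m]


def parse_rxn_smiles(rxn: str) -> ParsedReaction:
    """Split 'reactants>agents>products' and collect atom maps on the product side."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected 'reactants>agents>products', got {rxn!r}")
    reactants, agents, products = (_split_molecules(p) for p in parts)
    maps = {int(n) for p in products for n in ATOM_MAP.findall(p)}
    return ParsedReaction(reactants, agents, products, maps)
```

Splitting on `.` is enough for a sketch; production code would validate each molecule with a cheminformatics toolkit such as RDKit.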

Process: It extracts directly usable information (from the SMILES strings), model-predicted information (atom mapping), and rule-derived information (reaction templates, functional groups).
Structure: It uses a general-purpose LLM (GPT-oss-20B) to generate a four-step reasoning chain linked by natural language:
1. Product Analysis ($R_1$): Identifying functional groups and atom mappings.
2. Candidate Substructure Identification ($R_2$): Narrowing down key substructures formed during the reaction.
3. Strategic Bond Disconnection ($R_3$): Selecting the specific bond to cleave to generate synthons (conceptual fragments).
4. Synthetic Equivalent Mapping ($R_4$): Mapping synthons to real-world, purchasable reactants.
Diversity: To prevent overfitting to a single reasoning path, the framework generates 15 different "linking texts" (transitions between steps) for each reaction instance.
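Step $R_3$ is, at its core, a graph operation: removing one bond from the product's molecular graph leaves the synthon fragments. The sketch below shows only that operation, on a hypothetical toy representation (heavy-atom element symbols plus index-pair bonds); the helper `disconnect` is illustrative and does not model how RetroReasoner decides which bond to cut.

```python
from collections import defaultdict


def disconnect(atoms: list[str], bonds: list[tuple[int, int]],
               cut: tuple[int, int]) -> list[list[int]]:
    """Remove one bond and return the resulting fragments as sorted atom-index lists."""
    cut = tuple(sorted(cut))
    if cut not in {tuple(sorted(b)) for b in bonds}:
        raise ValueError(f"bond {cut} is not in the molecule")
    # Build adjacency for every bond except the one being cleaved.
    adj = defaultdict(set)
    for i, j in bonds:
        if tuple(sorted((i, j))) != cut:
            adj[i].add(j)
            adj[j].add(i)
    # Depth-first search collects each connected component (synthon).
    seen, fragments = set(), []
    for start in range(len(atoms)):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            a = stack.pop()
            comp.append(a)
            for nb in adj[a]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        fragments.append(sorted(comp))
    return fragments


# Methyl acetate, heavy atoms only: CH3-C(=O)-O-CH3
atoms = ["C", "C", "O", "O", "C"]
bonds = [(0, 1), (1, 2), (1, 3), (3, 4)]
print(disconnect(atoms, bonds, (1, 3)))  # [[0, 1, 2], [3, 4]]
```

Here methyl acetate is split at its acyl C-O bond into an acyl synthon (atoms 0, 1, 2) and a methoxy synthon (atoms 3, 4); step $R_4$ would then map these to purchasable equivalents such as acetic acid or acetyl chloride, and methanol. Cutting a ring bond leaves a single fragment, which the function also handles.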

B. Two-Stage Training Pipeline

RetroReasoner is trained using a combination of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL).

Supervised Fine-Tuning (SFT): The model is trained on the structured reasoning text and reactant SMILES generated by SyntheticRetro. The model learns to output the reasoning steps followed by the predicted reactants.
Reinforcement Learning (RL): The model is further optimized using Group Relative Policy Optimization (GRPO).
- Reward Mechanism: Instead of relying solely on exact match with labeled reactants, the authors use Round-Trip Accuracy as the reward signal.
- Process: The model predicts reactants $\rightarrow$ a separate forward synthesis model predicts the product from these reactants $\rightarrow$ if the predicted product matches the original input product, the model receives a positive reward.
- Benefit: This encourages the model to explore a broader space of feasible reactants, even if they differ from the specific labeled reactants in the dataset, provided the forward model maps them back to the target product.
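The reward loop in these bullets can be sketched in a few lines. Two assumptions to flag: the reward is taken to be binary (1.0 on a match, 0.0 otherwise), and the forward model and canonicalizer are placeholders passed in as callables, not a specific published model. The second function shows the group-relative normalization that gives GRPO its name: several completions are sampled for the same product, and each reward is compared with the group's mean and standard deviation (population standard deviation here; implementations differ on this detail).

```python
from statistics import mean, pstdev
from typing import Callable, Optional


def round_trip_reward(product: str,
                      predicted_reactants: str,
                      forward_model: Callable[[str], Optional[str]],
                      canonicalize: Callable[[str], str] = lambda s: s) -> float:
    """Binary reward: 1.0 if the forward model regenerates the product, else 0.0."""
    regenerated = forward_model(predicted_reactants)
    if regenerated is None:  # forward model produced nothing usable
        return 0.0
    return 1.0 if canonicalize(regenerated) == canonicalize(product) else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each reward relative to its group's mean and std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With a toy lookup table standing in for the forward model, both `CC(=O)O.CO` and `CC(=O)Cl.CO` earn full reward for methyl acetate even if only one of them is the labeled answer, which is exactly the behavior the Benefit bullet describes. In practice `canonicalize` would map equivalent SMILES to one string, for example with RDKit's `Chem.MolToSmiles(Chem.MolFromSmiles(s))`.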

C. Model Architecture

Base Model: Qwen3-8B.
Input: Product SMILES.
Output: Structured reasoning text (XML-tagged steps) followed by the predicted reactant SMILES.
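To make this input/output contract concrete, here is a small writer/reader pair. It is a sketch under stated assumptions: the tag names, the `<answer>` tag holding dot-separated reactant SMILES, and the helpers `build_target` and `parse_output` are all hypothetical, and drawing one linking sentence per step transition from a pool is only one plausible way to use the 15 linking texts that SyntheticRetro generates per reaction.

```python
import random
import re

# Hypothetical tag names for the reasoning steps R1..R4.
STEP_TAGS = ["product_analysis", "candidate_substructure",
             "bond_disconnection", "synthetic_equivalent"]
# Matches <tag>...</tag>; the backreference \1 requires a matching closing tag.
TAGGED = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def build_target(steps: list[str], linking_pools: list[list[str]],
                 reactants: list[str], rng: random.Random) -> str:
    """Interleave tagged steps with one sampled linking sentence per transition."""
    if len(steps) != len(STEP_TAGS):
        raise ValueError("expected exactly four reasoning steps")
    if len(linking_pools) != len(STEP_TAGS) - 1:
        raise ValueError("expected one pool of linking sentences per transition")
    parts = []
    for k, (tag, text) in enumerate(zip(STEP_TAGS, steps)):
        parts.append(f"<{tag}>{text}</{tag}>")
        if k < len(linking_pools):
            # A different draw gives a differently worded target for the same reaction.
            parts.append(rng.choice(linking_pools[k]))
    parts.append(f"<answer>{'.'.join(reactants)}</answer>")
    return "\n".join(parts)


def parse_output(text: str) -> tuple[dict[str, str], list[str]]:
    """Return ({tag: step text}, [reactant SMILES]) from one completion."""
    sections = {tag: body.strip() for tag, body in TAGGED.findall(text)}
    answer = sections.pop("answer", "")
    reactants = [s for s in answer.split(".") if s]
    return sections, reactants
```

The reader ignores the untagged linking sentences, so any wording variant parses to the same steps and reactants; a completion with no `<answer>` tag yields an empty reactant list, which a reward function can score as zero.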

3. Key Contributions

Strategic Reasoning Framework: Introduction of a stepwise reasoning process that explicitly mimics the chemist's strategy (Analysis $\rightarrow$ Disconnection $\rightarrow$ Synthon $\rightarrow$ Equivalent), bridging the gap between generic product analysis and specific reactant prediction.
SyntheticRetro Data Framework: A novel pipeline to convert raw reaction data into high-quality, structured reasoning datasets with diverse linking paths, enabling the training of reasoning capabilities in chemical LLMs.
Round-Trip RL Optimization: The application of round-trip accuracy as a verifiable reward in RL, which effectively guides the model toward chemically feasible solutions and mitigates the bias of single-reference reactant datasets.
Empirical Validation: Comprehensive evaluation showing that strategic reasoning significantly outperforms both non-reasoning baselines and other reasoning models, particularly in handling rare and complex reaction instances.

4. Experimental Results

The model was evaluated on the ORDerly benchmark and compared against Molecular Prediction LLMs, Molecular Reasoning LLMs, and General Purpose LLMs.

In-Distribution Performance:
- RetroReasoner (RL) achieved the highest Exact@1 (0.526) and Round-trip@1 (0.826) scores, outperforming all baselines.
- It demonstrated a significant improvement in Template Diversity (3.186 vs. 2.562 for Prediction-Only), indicating it can propose a wider variety of valid reaction pathways.
Hard Instance Performance (Rare Templates/Atoms):
- On datasets containing rare reaction templates and rare atoms/tokens, RetroReasoner showed superior robustness.
- For example, in the "Rare Atom/Token" subset, RetroReasoner (RL) achieved a Feasible Ratio of 0.557 compared to 0.478 for the non-reasoning baseline, suggesting better generalization to out-of-distribution cases.
Ablation Studies:
- Reasoning Strategy: Removing the full strategic chain (using only Product Analysis) significantly dropped performance, confirming the necessity of the full disconnection logic.
- Linking Text: Excluding the natural language linking text between steps reduced diversity and accuracy, highlighting the importance of logical flow in training.
- Reward Function: Using the Round-Trip reward instead of the Exact Match reward increased the Feasible Ratio and Template Diversity, indicating that it helps the model explore valid but unlabeled solutions.
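The summary reports Exact@1 and Round-trip@1 without spelling them out. Under the common reading of these metrics (an assumption here, not the paper's stated definition), Exact@1 is the fraction of test products whose top-ranked prediction equals the recorded reactant set, and Round-trip@1 is the fraction whose top-ranked prediction a forward model turns back into the product. A minimal sketch, with the forward model and canonicalizer as placeholder callables and `top1_metrics` a hypothetical helper:

```python
from typing import Callable, Optional

Example = tuple[str, str, list[str]]  # (product, recorded reactants, ranked predictions)


def top1_metrics(examples: list[Example],
                 forward_model: Callable[[str], Optional[str]],
                 canonicalize: Callable[[str], str] = lambda s: s) -> tuple[float, float]:
    """Return (exact@1, round_trip@1) as fractions of all examples."""
    if not examples:
        return 0.0, 0.0

    def as_set(smiles: str) -> frozenset[str]:
        # Molecule order inside a reactant set does not matter.
        return frozenset(canonicalize(m) for m in smiles.split(".") if m)

    exact = round_trip = 0
    for product, recorded, predictions in examples:
        if not predictions:
            continue  # an empty ranking counts as a miss on both metrics
        top1 = predictions[0]
        exact += int(as_set(top1) == as_set(recorded))
        regenerated = forward_model(top1)
        ok = regenerated is not None and canonicalize(regenerated) == canonicalize(product)
        round_trip += int(ok)
    n = len(examples)
    return exact / n, round_trip / n
```

Because a valid but unlabeled route counts toward Round-trip@1 and not toward Exact@1, the round-trip score can exceed the exact-match score, consistent with the gap between the two in-distribution numbers above.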

5. Significance

Paradigm Shift: The paper moves the field from "black-box" reactant prediction to "explainable" strategic reasoning, making AI predictions more interpretable and trustworthy for chemists.
Feasibility over Exactness: By prioritizing round-trip feasibility over strict dataset matching, the model aligns better with the reality of chemical synthesis, where multiple valid routes exist.
Foundation for Agents: The stepwise reasoning capability positions RetroReasoner as a strong foundation for future autonomous chemical agents capable of multi-step retrosynthetic planning, potentially accelerating drug discovery and materials science.
Handling Complexity: The model's success on rare and complex instances suggests it can tackle the "long tail" of chemical reactions that current data-driven models often fail to predict.

In summary, RetroReasoner demonstrates that equipping LLMs with chemist-like strategic reasoning and optimizing them for chemical feasibility (via round-trip rewards) yields a significant leap in the accuracy, diversity, and robustness of automated retrosynthesis prediction.
