Original authors: Yuhang Zhang, Keyan Ding, Peilin Chen, Han Liu, Can Lin, Ruixi Chen, Shiqi Wang, Qi Song

Published 2026-05-26

Original paper dedicated to the public domain under CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are a librarian trying to match two very different types of books: Enzymes (which are like tiny, complex biological machines made of proteins) and Reactions (which are like chemical recipes describing what those machines do).

For a long time, scientists have tried to build a computer system that can look at a protein and guess its recipe, or look at a recipe and guess which protein made it. But existing systems have been like clumsy librarians:

They are biased: They are great at finding the recipe if you give them the protein, but terrible at finding the protein if you give them the recipe.
They are fragile: If you change the way you organize the books (the data), the librarian suddenly forgets everything.
They only look at the spine of the book (the raw sequence of letters) and ignore the summary on the back cover (the text description of what the machine actually does).

Enter TIGER (Text-Informed Generalized Enzyme-Reaction Retrieval). Think of TIGER as a super-smart, bilingual librarian who has learned to read both the "spine" and the "summary" to make much better matches.

Here is how TIGER works, broken down into simple parts:

1. The "Translator" (Protein-to-Text)

Traditional systems only read the raw code of the enzyme (a long string of amino-acid letters like M-K-T-A-Y...). It's like trying to understand a machine just by looking at its serial number.
TIGER uses a special AI tool to translate that serial number into a plain English summary. It reads the protein and writes a sentence like: "This machine grabs a specific molecule and turns it into something else."

Why this helps: It adds "common sense" and context that the raw code misses, making it easier to match the machine to its recipe.

2. The "Quality Control Manager" (Dynamic Gating Network)

Here is the catch: The AI writing the English summaries isn't perfect. Sometimes it hallucinates or gets things slightly wrong (like a well-prepared student who still makes a few mistakes on a test).
TIGER has a built-in Quality Control Manager (the Dynamic Gating Network).

When the AI generates a summary, this manager checks: "Does this summary make sense compared to the raw protein data?"
If the summary is good, the manager says, "Use this!" and boosts its importance.
If the summary is nonsense or noisy, the manager says, "Ignore that," and turns down the volume.
Result: The system learns to trust the good text and ignore the bad text, making it much more reliable.

3. The "Universal Translator" (Structure-Shared Feature Projector)

Even with good summaries, the "Protein Language" and the "Chemical Recipe Language" are still different dialects.
TIGER uses a Universal Translator (the Structure-Shared Feature Projector). It takes the protein's data and the reaction's data and forces them to speak the same language in a shared "meeting room" (a unified space).

This ensures that when the system looks for a match, it's comparing apples to apples, not apples to oranges. This helps fix the "bias" problem, making the system far more balanced: nearly as good at finding recipes from proteins as it is at finding proteins from recipes.

4. The "Double-Check" (Bidirectional Training)

Most systems train themselves to only go one way (Protein $\to$ Recipe). TIGER trains itself to go both ways simultaneously. It constantly practices:

"Given this protein, find the recipe."
"Given this recipe, find the protein."
This double-checking makes the system more robust. Because it learns the relationship between proteins and recipes rather than a memorized list, it copes much better when you throw a new, unusual protein or an unfamiliar recipe at it.

The Results: A Super-Librarian

The authors tested TIGER on a massive dataset called ReactZyme (a giant library of enzyme-reaction pairs). They challenged it with three difficult scenarios:

Time-based: Newer data the system had never seen before.
Similarity-based: Proteins that look very different from anything in the training set.
Reaction-based: Chemical reactions that were completely new.

The Outcome:
TIGER crushed the competition. While other systems stumbled and failed when the data changed, TIGER kept performing at a high level.

It improved accuracy by large margins: about 1.5 to 1.7 times the best previous top-1 success rate on the time-based test, and nearly four times on the reaction-based test when searching from recipe to protein.
It largely fixed the "bias" problem, performing far more evenly in both directions.
It suggests that adding text descriptions (and filtering out the unreliable ones) is a key ingredient for understanding how biological machines work.

In short, TIGER is a system that doesn't just memorize data; it reads the "story" behind the data, filters out the lies, and learns the true connection between biological machines and their chemical recipes.

Technical Summary: TIGER

Problem Statement

Enzyme–Reaction Retrieval is a fundamental task in computational biology aimed at establishing bidirectional mappings between enzymes and the biochemical reactions they catalyze. This task underpins enzyme characterization, reaction mechanism elucidation, and the rational design of metabolic pathways. However, existing computational approaches, which predominantly rely on contrastive learning paradigms to align enzyme sequences with chemical reactions, face three critical limitations:

Bidirectional Asymmetry: Retrieval accuracy varies significantly between the enzyme-to-reaction (E→R) and reaction-to-enzyme (R→E) directions, indicating a lack of representational coherence.
Distributional Sensitivity: Performance fluctuates drastically under different dataset split strategies (e.g., time-based, enzyme similarity-based, reaction similarity-based), revealing poor generalization to unseen distributions.
Semantic Gap: Pre-trained protein models are optimized for structural and evolutionary signals rather than the specific chemical transformation features required for catalysis, leading to a disconnect between sequence data and reaction semantics.

Methodology

The authors propose TIGER (Text-Informed Generalized Enzyme-Reaction Retrieval), a framework designed to bridge the gap between enzyme sequences and biochemical reactions by leveraging textual semantic knowledge. The architecture consists of three core components:

1. Multimodal Enzyme Representation Learning

Instead of relying solely on amino acid sequences, TIGER integrates automatically generated textual descriptions of enzyme functions.

Feature Extraction: Enzyme sequences ($s_e$) are encoded using the pre-trained protein language model ESM2. Simultaneously, a protein-to-text generation model (e.g., ESM2Text or ProtT3) produces a textual description ($t_e$), which is embedded using PubMedBERT.
Dynamic Gating Network (DGN): Recognizing that AI-generated text can contain semantic noise or "hallucinations," the DGN adaptively fuses sequence and text features. It employs bidirectional multi-head attention to refine features across modalities and calculates a gating coefficient ($\alpha$) via a sigmoid function. This coefficient dynamically weights the contribution of the textual features based on their estimated reliability relative to the sequence features, suppressing noisy signals while preserving complementary functional semantics.
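The gating step can be illustrated with a minimal sketch. It assumes pooled sequence and text vectors of equal width, a scalar gate $\alpha = \sigma(\mathbf{w}^\top[\mathbf{h}_{seq};\mathbf{h}_{text}] + b)$, and a convex blend $(1-\alpha)\,\mathbf{h}_{seq} + \alpha\,\mathbf{h}_{text}$. The blend rule and the names `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical, and the bidirectional multi-head attention that precedes the gate is omitted. In a trained model the gate parameters would be learned; here they are set by hand.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(
    seq_feat: Sequence[float],
    text_feat: Sequence[float],
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> Tuple[List[float], float]:
    """Blend sequence and text features with a scalar reliability gate (hypothetical form).

    alpha = sigmoid(w . [seq; text] + b); fused = (1 - alpha) * seq + alpha * text.
    """
    if len(seq_feat) != len(text_feat):
        raise ValueError("sequence and text features must share a dimension")
    concat = list(seq_feat) + list(text_feat)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated feature length")
    logit = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    alpha = sigmoid(logit)
    fused = [(1.0 - alpha) * s + alpha * t for s, t in zip(seq_feat, text_feat)]
    return fused, alpha


# Hand-set (untrained) gate parameters for a 2-dim toy example.
seq = [1.0, 2.0]
text = [5.0, 5.0]
fused_trusted, a_hi = gated_fusion(seq, text, [0.0] * 4, gate_bias=50.0)   # alpha near 1: text dominates
fused_ignored, a_lo = gated_fusion(seq, text, [0.0] * 4, gate_bias=-50.0)  # alpha near 0: sequence only
```

With a strongly negative gate logit, $\alpha$ approaches 0 and the fused vector falls back to the sequence features; this is the "turn down the volume" behavior in the plain-language description.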

2. Reaction Representation Learning

For biochemical reactions, TIGER utilizes UniMol-3D, a state-of-the-art 3D molecular encoder.

Reactions are decomposed into substrates and products.
Each molecule is encoded independently to capture graph-level and 3D conformational information.
The final reaction embedding is derived by averaging the embeddings of all constituent molecules, preserving stereochemical and geometric cues critical for catalysis.
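The pooling step amounts to an element-wise mean over molecule vectors. A minimal sketch, assuming each substrate and product has already been encoded (for example by UniMol-3D) into vectors of equal width; the function name `reaction_embedding` is hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def reaction_embedding(substrates: Sequence[Vector], products: Sequence[Vector]) -> Vector:
    """Average the per-molecule embeddings of all substrates and products."""
    molecules = list(substrates) + list(products)
    if not molecules:
        raise ValueError("a reaction needs at least one molecule")
    dim = len(molecules[0])
    if any(len(m) != dim for m in molecules):
        raise ValueError("all molecule embeddings must have the same width")
    return [sum(m[i] for m in molecules) / len(molecules) for i in range(dim)]


# Toy 2-dim vectors standing in for encoder outputs.
r = reaction_embedding([[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0]])
r_swapped = reaction_embedding([[2.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Averaging handles reactions with any number of molecules and is independent of their order. In this sketch substrates and products are pooled together, so swapping the two sides yields the same vector.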

3. Structure-Shared Feature Projector (SSFP)

To ensure that enzyme and reaction representations exist in a unified latent space, TIGER employs a Structure-Shared Feature Projector. This module maps heterogeneous inputs (enzyme embeddings and reaction embeddings) into a shared space using a symmetric pipeline involving layer normalization, multi-head self-attention, and residual connections. This design enforces semantic proximity and facilitates cross-modal alignment.
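One way to picture the projector: each modality gets its own input map to a shared width, then passes through an identical pipeline of layer normalization, self-attention, and a residual connection. Everything beyond those three named operations is an assumption of this sketch: token-level inputs, a single attention head with identity query/key/value projections, mean pooling, final L2 normalization (so that a dot product is a cosine similarity), and the class name `SharedStructureProjector`. The summary does not say whether weights are shared across modalities or only the architecture; this sketch shares only the architecture.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major; len(matrix) equals the input width


def matvec(x: Vector, w: Matrix) -> Vector:
    """Row vector times matrix: returns x @ w."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with Q = K = V = tokens (simplification)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


class SharedStructureProjector:
    """Hypothetical projector: one instance per modality, identical pipeline structure."""

    def __init__(self, w_in: Matrix):
        self.w_in = w_in  # modality-specific map from input width to the shared width

    def __call__(self, tokens: List[Vector]) -> Vector:
        h = [matvec(t, self.w_in) for t in tokens]                        # to shared width
        attended = self_attention([layer_norm(t) for t in h])             # pre-norm attention
        h = [[a + b for a, b in zip(x, y)] for x, y in zip(h, attended)]  # residual connection
        pooled = [sum(col) / len(h) for col in zip(*h)]                   # mean over tokens
        norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
        return [v / norm for v in pooled]                                 # unit length


# Toy inputs: enzyme tokens are 3-dim, reaction tokens are 2-dim; both map to width 2.
enzyme_proj = SharedStructureProjector([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
reaction_proj = SharedStructureProjector([[0.5, 0.5], [1.0, -1.0]])
z_e = enzyme_proj([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
z_r = reaction_proj([[1.0, 0.0]])
cosine = sum(a * b for a, b in zip(z_e, z_r))
```

Because both outputs live in the same width and are unit-normalized, a single dot product scores any enzyme against any reaction, whichever direction the query runs.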

Training Objective

The framework is trained using a bidirectional contrastive learning objective. It minimizes a weighted sum of losses for both E→R and R→E directions, ensuring that semantically associated enzyme-reaction pairs are pulled closer together in the latent space while unrelated pairs are pushed apart.
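The bidirectional objective can be written as $\mathcal{L} = \lambda\,\mathcal{L}_{E\to R} + (1-\lambda)\,\mathcal{L}_{R\to E}$, where each term is a cross-entropy over a batch of paired embeddings and the other items in the batch act as negatives. The sketch below assumes an InfoNCE-style form for each term, a temperature $\tau$ (default 0.07), and equal weighting $\lambda = 0.5$; the exact loss, temperature, and weights used by TIGER may differ, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _diag_log_softmax(rows: List[List[float]]) -> List[float]:
    """For each row i, return log(exp(rows[i][i]) / sum_j exp(rows[i][j]))."""
    out = []
    for i, row in enumerate(rows):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        out.append(row[i] - log_z)
    return out


def bidirectional_contrastive_loss(
    enzymes: Sequence[Sequence[float]],
    reactions: Sequence[Sequence[float]],
    tau: float = 0.07,
    lam: float = 0.5,
) -> Tuple[float, float, float]:
    """Return (total, E->R, R->E); enzymes[i] and reactions[i] form the i-th positive pair.

    Inputs are assumed to be L2-normalized, so dot products are cosine similarities.
    """
    if len(enzymes) != len(reactions) or not enzymes:
        raise ValueError("need equally sized, non-empty batches of paired embeddings")
    sim = [[sum(a * b for a, b in zip(e, r)) / tau for r in reactions] for e in enzymes]
    sim_t = [list(col) for col in zip(*sim)]  # rows indexed by reaction
    n = len(sim)
    loss_e2r = -sum(_diag_log_softmax(sim)) / n    # each enzyme must pick its reaction
    loss_r2e = -sum(_diag_log_softmax(sim_t)) / n  # each reaction must pick its enzyme
    return lam * loss_e2r + (1.0 - lam) * loss_r2e, loss_e2r, loss_r2e


eye = [[1.0, 0.0], [0.0, 1.0]]
aligned, _, _ = bidirectional_contrastive_loss(eye, eye)
swapped, _, _ = bidirectional_contrastive_loss(eye, [[0.0, 1.0], [1.0, 0.0]])
```

Correctly paired embeddings give a loss near zero, while mismatched pairs are penalized heavily; training on both terms pushes the model to rank well in both retrieval directions.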

Key Contributions

Text-Informed Framework: The introduction of TIGER, which leverages protein-to-text generation models to distill functional and mechanistic knowledge into enzyme representations, creating a more symmetric and semantically coherent embedding space.
Dynamic Gating Network: A novel mechanism to robustly balance sequence and text signals, specifically addressing the reliability issues of AI-generated textual descriptions by adaptively down-weighting noisy inputs.
Structure-Shared Feature Projector: A unified projection architecture that aligns enzyme and reaction modalities, enhancing cross-modal generalization.
Comprehensive Evaluation: Extensive experiments on the ReactZyme benchmark, demonstrating state-of-the-art performance across diverse and challenging evaluation splits.

Experimental Results

Evaluated on the ReactZyme dataset (containing ~178K enzyme-reaction associations), TIGER was tested under three split strategies: time-based, enzyme similarity-based, and reaction similarity-based.
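The split strategies can be sketched as simple partitioning rules. This assumes each enzyme-reaction pair carries a date and that similarity clusters have been precomputed (for example by sequence or reaction clustering); the `Pair` fields, `time_split`, and `cluster_split` are hypothetical and do not reproduce ReactZyme's actual split procedure.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Pair:
    enzyme_id: str
    reaction_id: str
    added: date  # hypothetical: when the association entered the source database


def time_split(pairs: List[Pair], cutoff: date) -> Tuple[List[Pair], List[Pair]]:
    """Train on associations before the cutoff, test on those on or after it."""
    train = [p for p in pairs if p.added < cutoff]
    test = [p for p in pairs if p.added >= cutoff]
    return train, test


def cluster_split(
    pairs: List[Pair], cluster_of: Callable[[Pair], str], held_out: Set[str]
) -> Tuple[List[Pair], List[Pair]]:
    """Send every pair in a held-out cluster to the test set, so no cluster spans both sets."""
    train = [p for p in pairs if cluster_of(p) not in held_out]
    test = [p for p in pairs if cluster_of(p) in held_out]
    return train, test


pairs = [
    Pair("E1", "R1", date(2018, 1, 1)),
    Pair("E2", "R2", date(2021, 6, 1)),
    Pair("E3", "R1", date(2022, 3, 1)),
]
train_t, test_t = time_split(pairs, date(2020, 1, 1))
# Hypothetical precomputed reaction clusters; enzyme clusters would work the same way.
reaction_cluster: Dict[str, str] = {"R1": "c1", "R2": "c2"}
train_r, test_r = cluster_split(pairs, lambda p: reaction_cluster[p.reaction_id], {"c1"})
```

Holding out whole clusters, rather than random pairs, keeps close relatives of the test items out of training, which is what makes the similarity-based splits harder.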

Performance Gains: TIGER consistently outperformed strong baselines (including CLIPZyme, UniMol-3D variants, and Bi-RNNs).
- On the time-based split, TIGER improved Hit@1 for E→R from 0.391 (best baseline) to 0.583 and for R→E from 0.265 to 0.454.
- On the enzyme similarity-based split, TIGER achieved Hit@1 scores of 0.931 (E→R) and 0.792 (R→E).
- On the most challenging reaction similarity-based split (requiring extrapolation to unseen reactions), TIGER achieved Hit@1 scores of 0.416 (E→R) and 0.430 (R→E), representing a nearly fourfold improvement over the strongest baseline in the R→E direction.
Robustness and Symmetry: Unlike baselines that suffer from severe performance collapse under distribution shifts or exhibit high directional asymmetry, TIGER maintained balanced performance across both retrieval directions and demonstrated superior generalization to temporally unseen enzymes and structurally dissimilar reactions.
Ablation Studies: Removing the Dynamic Gating Network resulted in significant performance drops, particularly when using noisy AI-generated text, confirming the necessity of the gating mechanism. Similarly, the Structure-Shared Feature Projector proved essential for handling distribution shifts compared to standard MLP projections.
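Hit@1 is the fraction of queries whose correct match is ranked first. A minimal sketch, assuming one correct candidate per query (the real benchmark may have several) and a precomputed score matrix with enzymes as rows: E→R ranks each row, and R→E ranks each column of the same matrix. The toy scores are invented purely to show that the two directions can disagree; the ratios at the end are computed from the time-based Hit@1 numbers reported above.

```python
from typing import List, Sequence


def hit_at_k(scores: Sequence[Sequence[float]], gold: Sequence[int], k: int = 1) -> float:
    """Fraction of query rows whose gold column index is among the k highest scores."""
    if len(scores) != len(gold) or not scores:
        raise ValueError("need one gold index per query row")
    hits = 0
    for row, g in zip(scores, gold):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if g in ranked[:k]:
            hits += 1
    return hits / len(scores)


def transpose(m: Sequence[Sequence[float]]) -> List[List[float]]:
    return [list(col) for col in zip(*m)]


# Toy scores: rows are enzymes, columns are reactions; pair i matches pair i.
scores = [[0.9, 0.2],
          [0.6, 0.5]]
e2r = hit_at_k(scores, [0, 1])             # enzyme 1 ranks reaction 0 first, so it misses
r2e = hit_at_k(transpose(scores), [0, 1])  # both reactions rank their own enzyme first

# Relative Hit@1 gains on the time-based split (TIGER / best baseline).
gain_e2r = 0.583 / 0.391
gain_r2e = 0.454 / 0.265
print(f"E->R {e2r:.2f}, R->E {r2e:.2f}; gains {gain_e2r:.2f}x, {gain_r2e:.2f}x")
```

The same similarity matrix can score well in one direction and poorly in the other, which is the directional asymmetry TIGER is designed to reduce.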

Significance

The paper claims that TIGER addresses the fundamental challenges of directional asymmetry and distributional sensitivity in enzyme-reaction retrieval. By integrating knowledge-rich textual descriptions with sequence data and employing a reliability-aware gating mechanism, the framework establishes a more robust and semantically aligned joint embedding space. The results suggest that text-informed paradigms offer a viable path forward for advancing biochemical retrieval tasks, moving beyond the limitations of purely sequence- or structure-based models. The authors note that while the current text modeling is coarse-grained, the approach highlights the potential of leveraging natural language supervision for functional annotation and pathway reconstruction.

TIGER: Text-Informed Generalized Enzyme-Reaction Retrieval