KG-Orchestra: An Open-Source Multi-Agent Framework for… — Plain-Language Explanation

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to solve a massive, complex mystery, but your clues are scattered across thousands of dusty, unread books in a giant library. You have a small, hand-drawn map (a "seed" map) showing a few connections, but it's missing huge chunks of the story.

This is exactly the problem scientists face with Biomedical Knowledge Graphs. These are digital maps that show how drugs, genes, diseases, and proteins interact. Currently, making these maps is either:

Too slow: Humans read every book and draw the connections by hand (accurate but impossible to scale).
Too messy: Computers read everything quickly but often make up connections or miss the deep "why" behind them (fast but unreliable).

Enter KG-Orchestra. Think of it not as a single robot, but as a highly organized, multi-person detective team working together to fill in the missing pieces of your map.

The Detective Team (The Multi-Agent System)

Instead of one AI trying to do everything at once (which often leads to mistakes), KG-Orchestra uses a team of specialized AI agents, each with a specific job. It works like a symphony orchestra, where every musician plays a different instrument and together they produce one coherent piece.

Here is how the team works:

The Librarian (Retrieval Agent):
- The Job: You ask, "How does Drug A affect Disease B?" The Librarian doesn't just guess; they dive into a massive digital library of millions of scientific papers.
- The Trick: They don't just read word-for-word; they understand the context. They use a special "hybrid" search that combines looking for exact keywords (like a traditional index) with understanding the meaning of the sentences (like a human reader). This ensures they find the right paragraph, even if the author used different words.
The Architect (Path Builder):
- The Job: Once the Librarian finds the clues, the Architect tries to build a bridge between the start and the end.
- The Analogy: Imagine you have a puzzle piece for "Drug A" and one for "Disease B," but they don't touch. The Architect finds the missing middle pieces (like "Drug A stops Protein X, which causes Stress, which leads to Disease B") and connects them into a straight, logical line.
The Editor (Schema Aligner):
- The Job: Scientific names can be messy. One paper might call a protein "P53," another "TP53," and a third "Tumor Protein 53."
- The Analogy: The Editor is the strict copy editor who says, "We only use the official name 'TP53' in our map." They make sure everyone speaks the same language so the map doesn't get cluttered with duplicates.
The Fact-Checker (Triplet Validator):
- The Job: This is the most important agent. Before any new connection is added to the map, the Fact-Checker reads the original scientific paper again.
- The Analogy: They ask, "Does the paper actually say this? Is the direction right? (Did A cause B, or did B cause A?)" If the evidence is weak, they reject the connection or flag it for a human to check later. This prevents the AI from "hallucinating" (making things up).

Why This Matters (The Results)

The researchers tested this team on two real-world mysteries:

Mystery 1: How could a depression drug (Nelivaptan) be connected to Alzheimer's?
Mystery 2: How do probiotics (good bacteria) talk to our brains?

The Results were impressive:

Growth: The system didn't just add a few dots. In the probiotics map, the number of entities grew by about 140% and the number of connections by about 180%, adding thousands of new, evidence-backed connections.
Accuracy: Even though it was working fast, about 93% of the new connections in the probiotics map were judged biologically valid.
Consistency: When the team was asked to solve the same mystery three times, it came back with almost the same map every time. This suggests the system is reliable, not just lucky.

The Big Picture

Think of KG-Orchestra as a smart, automated research assistant that never sleeps. It takes a small, rough sketch of a biological relationship and turns it into a high-definition, evidence-backed roadmap.

For Drug Companies: It helps find new uses for old drugs (drug repurposing) by spotting hidden connections.
For Doctors: It helps understand why a treatment works, not just that it works.
For Scientists: It saves them years of reading papers, allowing them to focus on the big discoveries rather than the data entry.

In short, KG-Orchestra is the bridge between the overwhelming amount of scientific data we have and the clear, actionable knowledge we need to cure diseases. It turns a chaotic library into a perfectly organized, living map of human biology.

1. Problem Statement

Biomedical Knowledge Graphs (BKGs) are essential for integrating complex biological data, but their construction faces a critical scalability-quality trade-off:

Manual Curation: Offers high fidelity and mechanistic granularity but is unscalable and slow.
Automated Methods (Traditional NLP/LLMs): Offer scalability but often produce broad networks lacking causal depth, suffer from hallucinations, lack evidence traceability, and rely on static, potentially biased corpora.
The Gap: There is a lack of frameworks that can autonomously enrich existing "seed" graphs with high-resolution, causal, and evidence-backed knowledge from dynamic literature sources without sacrificing biological validity or provenance.

2. Methodology: KG-Orchestra Framework

KG-Orchestra is an open-source, multi-agent framework designed to autonomously enrich seed BKGs. It transforms sparse, manually curated graphs into dense, high-resolution discovery engines by acquiring, validating, and integrating evidence from scientific literature.

Core Architecture

The system employs a Multi-Agent System (MAS) where specialized agents collaborate to decompose complex extraction tasks, reducing hallucinations and improving reasoning through iterative cross-checking.

Key Workflow Stages:

Evidence Retrieval Pipeline:
- Query Formulation: Generates directional queries (e.g., "What connects Entity A to Entity B?").
- Chunking Strategy: Uses hybrid chunking bounded at 512 tokens per chunk, rather than simple sentence splitting, to preserve mechanistic context while remaining computationally tractable.
- Embedding & Retrieval: Employs a Hybrid Search strategy combining:
  - Dense Embeddings: Nomic-V2-MOE (475M parameters), selected for semantic similarity.
  - Sparse Embeddings: SPLADE-v3 for lexical matching and rare entity recognition.
  - Fusion: Dense and sparse scores are fused in a Qdrant vector database; retrieval relevance is measured with NDCG@10.
- Fallback: If initial retrieval fails, a PubMed Web Fetcher queries PubMed directly for abstracts/full texts.
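The retrieval steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the framework's code: tokens are approximated by whitespace-separated words, the helper names `chunk_by_token_budget` and `reciprocal_rank_fusion` are hypothetical, and reciprocal rank fusion (RRF) is used as one common way to combine dense and sparse rankings, since the summary does not say which fusion rule KG-Orchestra applies.

```python
from collections import defaultdict


def chunk_by_token_budget(sentences, max_tokens=512):
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    Assumption: tokens are approximated by whitespace-separated words. A real
    pipeline would count tokens with the embedding model's own tokenizer.
    A single sentence longer than max_tokens becomes its own oversized chunk.
    """
    chunks, current, used = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids; each list adds 1 / (k + rank) per id."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["p1", "p2", "p3"]   # hypothetical ids ranked by dense similarity
sparse_hits = ["p3", "p1", "p4"]  # hypothetical ids ranked by sparse score
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

In this toy run, paragraphs returned by both retrievers (p1 and p3) rise above those returned by only one, which is the behavior a hybrid search is meant to provide.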
Multi-Agent Path Construction:
- Paragraph Evaluator: Labels retrieved paragraphs as "strongly relevant," "partially relevant," or "irrelevant."
- Path Builder: Composes directed, evidence-backed chains of triplets (Head $\to$ Relation $\to$ Tail) connecting the query source to the target.
- Schema Aligner: Maps extracted entities and relations to the seed graph's schema, extending types only when necessary to prevent ontology explosion.
- Entity Matcher: Resolves entities using exact matching or UMLS-based normalization (using CUIs) to avoid redundancy.
- Triplet Validation Team: A critical agent group that performs:
  - Evidence Augmentation: Retrieves additional supporting paragraphs.
  - Evaluation: Assesses biological validity, directionality, causality, and evidence compatibility.
  - Repair: Attempts to fix invalid triplets; flags persistent errors as "need-review" for human curation.
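The evaluate, repair, and flag cycle of the validation team can be sketched as a small loop. This is an assumption-laden illustration: in the framework the checks are performed by LLM agents reading evidence, whereas here `toy_is_valid` and `toy_swap_direction` are hypothetical stand-ins, and the `Triplet` class, its `status` field, and the `max_repairs` limit are invented for the example.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Triplet:
    head: str
    relation: str
    tail: str
    status: str = "unchecked"


def validate_with_repair(
    triplet: Triplet,
    is_valid: Callable[[Triplet], bool],
    repair: Callable[[Triplet], Optional[Triplet]],
    max_repairs: int = 2,
) -> Triplet:
    """Return the triplet marked 'valid', or 'need-review' if repairs fail."""
    candidate = triplet
    for attempt in range(max_repairs + 1):
        if is_valid(candidate):
            return replace(candidate, status="valid")
        if attempt == max_repairs:
            break  # repair budget exhausted
        repaired = repair(candidate)
        if repaired is None:
            break  # the repair step has nothing to propose
        candidate = repaired
    return replace(candidate, status="need-review")


# Toy stand-in for "the evidence supports this statement".
SUPPORTED = {("Drug A", "inhibits", "Protein X")}


def toy_is_valid(t: Triplet) -> bool:
    return (t.head, t.relation, t.tail) in SUPPORTED


def toy_swap_direction(t: Triplet) -> Triplet:
    # Hypothetical repair: try the opposite direction of the relation.
    return replace(t, head=t.tail, tail=t.head)


fixed = validate_with_repair(
    Triplet("Protein X", "inhibits", "Drug A"), toy_is_valid, toy_swap_direction
)
flagged = validate_with_repair(
    Triplet("Drug A", "causes", "Disease B"), toy_is_valid, toy_swap_direction
)
print(fixed.status, flagged.status)
```

The first triplet has its direction corrected and passes; the second cannot be repaired within the budget and is flagged for human curation instead of being silently added.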
Backbone LLM Selection:
- The authors benchmarked several LLMs as the backbone (DeepSeek-R1, Magistral, Gemma 3, Qwen 3).
- Qwen 3 (32B) was selected as the optimal backbone, offering the best balance of reasoning capability, biological validity, and evidence compatibility.

3. Key Contributions

Open-Source Framework: Provides a reproducible, open-source pipeline (code available on GitHub) for evidence-based KG enrichment.
Hybrid Retrieval & Chunking: Demonstrates that token-length-bounded hybrid chunking and dense+sparse retrieval significantly outperform sentence-level or dense-only approaches in biomedical contexts.
Multi-Agent Orchestration: Shows that dividing tasks among specialized agents (evaluator, builder, aligner, validator) reduces hallucination rates and improves triplet integrity compared to monolithic models.
Evidence-Centric Design: Every generated triplet is accompanied by traceable provenance (DOI/PMID) and specific text excerpts, aligning with the Evidence and Conclusion Ontology (ECO).
Scalability: The framework is computationally flexible, deployable from single-laptop GPUs to high-performance clusters by adjusting model sizes (14B to 235B parameters).
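To make the evidence-centric design concrete, the sketch below stores each assertion together with its supporting excerpts and merges synonymous entity names before insertion. It is only an illustration: the `Evidence` class, the `CANONICAL` synonym table, and the placeholder source ids are hypothetical, and a lookup table stands in for the UMLS-based normalization the framework uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source_id: str  # a PMID or DOI string; placeholders are used below
    excerpt: str    # the text passage that supports the assertion


# Hypothetical synonym table standing in for UMLS concept normalization.
CANONICAL = {"p53": "TP53", "tp53": "TP53", "tumor protein 53": "TP53"}


def canonical(name):
    """Map a surface form to its canonical name, or keep it unchanged."""
    cleaned = name.strip()
    return CANONICAL.get(cleaned.lower(), cleaned)


def add_assertion(graph, head, relation, tail, evidence):
    """Store the triplet under canonical names and append its evidence."""
    key = (canonical(head), relation, canonical(tail))
    graph.setdefault(key, []).append(evidence)
    return key


graph = {}
add_assertion(graph, "P53", "regulates", "Gene Y",
              Evidence("PMID:PLACEHOLDER-1", "placeholder excerpt one"))
add_assertion(graph, "Tumor Protein 53", "regulates", "Gene Y",
              Evidence("PMID:PLACEHOLDER-2", "placeholder excerpt two"))

for triplet, support in graph.items():
    print(triplet, [e.source_id for e in support])
```

Both mentions collapse into a single TP53 triplet that carries two evidence records, so the graph avoids duplicates while every assertion remains traceable to its sources.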

4. Results & Evaluation

The framework was evaluated on two specialized use cases:

NADKG: Linking Nelivaptan (a drug) to Alzheimer's Disease.
ProPreSyn-GBA: Mapping probiotic/prebiotic interactions within the gut-brain axis.

Key Findings:

Retrieval Performance: Hybrid retrieval (Nomic-V2-MOE + SPLADE) improved NDCG@10 by up to 0.082 for smaller models, narrowing the gap with proprietary models like OpenAI's text-embedding-3-large.
LLM Performance: Qwen 3 (32B) achieved superior triplet-level quality (Biological Validity: 0.89, Evidence Compatibility: 0.75) compared to other models. Larger models (235B) offered better coverage but required significant resources.
Enrichment Scale:
- ProPreSyn-GBA: Expanded from 731 nodes to 1,768 nodes (+141%) and 1,362 to 3,835 relations (+182%).
- NADKG: Expanded from 1,685 nodes to 4,283 nodes.
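The growth percentages follow from the node and relation counts by simple arithmetic. The short check below recomputes them from the numbers quoted above; the function name `percent_growth` is just an illustrative helper.

```python
def percent_growth(before, after):
    """Relative increase from before to after, in percent."""
    return (after - before) / before * 100.0


counts = {
    "ProPreSyn-GBA nodes": (731, 1768),
    "ProPreSyn-GBA relations": (1362, 3835),
    "NADKG nodes": (1685, 4283),
}

for label, (before, after) in counts.items():
    print(f"{label}: +{percent_growth(before, after):.1f}%")
```

Recomputed this way, the ProPreSyn-GBA figures come out at roughly 142% and 182%, so the quoted +141% appears to be truncated rather than rounded, and the NADKG node count corresponds to growth of about 154%.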
Quality Metrics: Automated and manual evaluations confirmed high quality:
- Biological Validity: ~93% (ProPreSyn-GBA).
- Causality: ~77% of relations were mechanistic/causal.
- Reproducibility: Three independent runs showed 0.97–0.98 semantic similarity, indicating high stability despite LLM stochasticity.
Seed Size Impact: Larger seed graphs led to richer coverage (more nodes and relations) without compromising the accuracy of individual triplets, suggesting that the system is robust across data scales.
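For readers unfamiliar with the retrieval metric quoted above, NDCG@10 can be computed as follows. This is a minimal sketch of the standard linear-gain definition, not the evaluation code used in the study; it assumes the ideal ranking is obtained by re-sorting the relevance grades of the retrieved list itself, and the example grades are made up.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance grades."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the best possible ordering of the same grades."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Made-up relevance grades for ten retrieved paragraphs, in ranked order
# (2 = strongly relevant, 1 = partially relevant, 0 = irrelevant).
ranked_grades = [2, 0, 1, 2, 0, 0, 1, 0, 0, 0]
print(round(ndcg_at_k(ranked_grades), 3))
```

A score of 1.0 means the most relevant paragraphs are ranked first. An absolute improvement of 0.082 is therefore a shift on a 0-to-1 scale.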

5. Significance

KG-Orchestra addresses the critical bottleneck in biomedical research where manual curation cannot keep pace with literature volume.

Mechanistic Discovery: It moves beyond simple correlation to uncover causal, multi-hop pathways (e.g., Nelivaptan $\to$ AVPR1B $\to$ Stress Response $\to$ Cortisol $\to$ Alzheimer's), enabling hypothesis generation for drug repurposing.
Auditability: By attaching specific evidence excerpts to every assertion, it enables transparent auditing and validation, a crucial requirement for clinical decision support.
Future Applications: The framework is positioned to automate ontology enrichment (e.g., Gene Ontology, GO-CAM), reducing the manual burden of encoding causal activity models.
Limitations & Future Work: Current limitations include reliance on public literature (missing paywalled data) and a ~10-15% "need-review" rate. Future iterations aim to integrate reasoning-aware retrieval (LOTUS) and Biological Expression Language (BEL) translation for higher precision.
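The multi-hop pathway mentioned under Mechanistic Discovery can be read as a directed path through the enriched graph. The sketch below shows how such a chain could be recovered from stored triplets with a breadth-first search. It is not how the Path Builder agent works (that agent composes chains from literature evidence using an LLM), and the relation label `related_to` is a hypothetical placeholder because the summary does not name the relations.

```python
from collections import deque


def find_path(triplets, source, target):
    """Breadth-first search for the shortest directed chain of triplets."""
    outgoing = {}
    for head, relation, tail in triplets:
        outgoing.setdefault(head, []).append((relation, tail))

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for relation, tail in outgoing.get(node, []):
            if tail not in visited:
                visited.add(tail)
                queue.append((tail, path + [(node, relation, tail)]))
    return None  # no directed path exists


# Entities from the example pathway; the relation label is a placeholder.
triplets = [
    ("Nelivaptan", "related_to", "AVPR1B"),
    ("AVPR1B", "related_to", "Stress Response"),
    ("Stress Response", "related_to", "Cortisol"),
    ("Cortisol", "related_to", "Alzheimer's"),
]

path = find_path(triplets, "Nelivaptan", "Alzheimer's")
for head, relation, tail in path:
    print(f"{head} -[{relation}]-> {tail}")
```

Because the edges are directed, searching from Alzheimer's back to Nelivaptan finds no path, which mirrors the emphasis on directionality in the validation step.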

In summary, KG-Orchestra represents a significant step forward in Evidence-Based AI, successfully bridging the gap between scalable automation and the high-fidelity, causal reasoning required for advanced biomedical discovery.

KG-Orchestra: An Open-Source Multi-Agent Framework for Evidence-Based Biomedical Knowledge Graphs Enrichment.