Imagine you are trying to merge two massive, chaotic libraries. One library is organized by a strict librarian who uses Latin names for everything, and the other is run by a poet who uses colorful metaphors. Your goal is to find the exact same book in both libraries, even though they are called different things and shelved in different ways. This is Ontology Matching (OM): the task of connecting different "worlds of knowledge" so computers can talk to each other.
For years, we've tried to solve this with two main tools:
- The Rulebook: A rigid system of pre-written rules (like a strict librarian). It's accurate but needs a human expert to write every single rule, which takes forever.
- The Student: A machine learning model trained on thousands of examples. It's fast but needs a massive library of "training books" to learn, and if it hasn't seen a specific type of book before, it gets confused.
Enter Agent-OM: The Super-Smart Librarian Assistant
This paper introduces a new way to solve the problem using LLM Agents (think of them as AI assistants powered by Large Language Models like the one you are talking to right now). But instead of just asking the AI a simple question like "Are these two books the same?", the authors built a team of AI agents that work together like a professional research team.
Here is how Agent-OM works, explained through a simple analogy:
The Problem with Just Asking an AI
If you just ask a standard AI, "Is 'Program Committee Chair' the same as 'Chair_PC'?", it might guess correctly. But if you ask it about 10,000 pairs of terms, it will eventually lose focus, make up facts (hallucinate), or run out of working memory (its context window). It's like asking a genius student to memorize an entire encyclopedia and then quizzing them on every single page without letting them look anything up.
The Agent-OM Solution: The "Siamese" Research Team
The authors created a system with two main AI agents (called Siamese Agents because they are twins that share a brain/memory). They don't just guess; they follow a strict workflow:
1. The Researcher (Retrieval Agent)
Imagine a detective who doesn't just look at the book title.
- The Job: This agent goes out and gathers everything about a term. It looks at the title, reads the description, checks the context (is this about a conference or a medical procedure?), and even looks at the logical relationships (e.g., "This is a type of Chair").
- The Trick: Instead of dumping all this info into the AI's brain (which would be too much), the Researcher files it neatly into a Hybrid Database. Think of this as a super-organized filing cabinet where the AI can instantly search for similar concepts using "vectors" (mathematical fingerprints of meaning).
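The "filing cabinet" idea can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the paper's implementation: it uses character-trigram counts as a crude stand-in for the real embedding vectors an LLM would produce, and the stored terms are made-up examples.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'mathematical fingerprint': counts of character trigrams.
    (A real system would call an LLM embedding model here.)"""
    cleaned = re.sub(r"[^a-z]", " ", text.lower())
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))

def cosine(a, b):
    """Similarity between two fingerprints: 1.0 = identical direction."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# The filing cabinet: each stored term keeps its fingerprint.
database = {term: embed(term) for term in
            ["Program Committee Chair", "Conference Hall", "Paper Author"]}

def retrieve(query, top_k=2):
    """Return the stored terms most similar to the query."""
    q = embed(query)
    ranked = sorted(database, key=lambda t: cosine(q, database[t]), reverse=True)
    return ranked[:top_k]

print(retrieve("Chair_PC"))
```

Even with this toy fingerprint, "Chair_PC" lands closest to "Program Committee Chair" because they share the letters of "chair"; a real embedding model would also catch purely semantic matches where no letters overlap.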
2. The Judge (Matching Agent)
Once the Researcher has the files, the Judge steps in.
- The Job: The Judge looks at the candidate matches found in the filing cabinet. It doesn't just say "Yes" or "No." It uses a Chain of Thought (a step-by-step reasoning process) to ask: "Does this make sense? Let me check the context again."
- The Safety Net: To stop the AI from making up answers (hallucinations), the Judge has a Validator. It asks itself, "Are you sure? Let me double-check." If the AI is unsure, it rejects the match.
- The Double-Check: The system checks the match in both directions (Source to Target, and Target to Source). It's like two people shaking hands; if Person A reaches out to Person B, but Person B doesn't reach back, the handshake isn't valid.
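The handshake rule is simple to express in code. Below is a minimal sketch, assuming the matching agent has already produced a best-candidate dictionary for each direction; the function name and example dictionaries are hypothetical, not taken from the paper.

```python
def bidirectional_matches(source_to_target, target_to_source):
    """Keep a match only if both directions agree: the 'handshake' test."""
    confirmed = []
    for src, tgt in source_to_target.items():
        # Person A reached for Person B; did B reach back for A?
        if target_to_source.get(tgt) == src:
            confirmed.append((src, tgt))
    return confirmed

# Hypothetical best candidates from each matching direction:
s2t = {"Program Committee Chair": "Chair_PC", "Paper": "Document"}
t2s = {"Chair_PC": "Program Committee Chair", "Document": "Abstract"}

# Only the match confirmed in both directions survives.
print(bidirectional_matches(s2t, t2s))
```

Here "Paper" picked "Document", but "Document" pointed back at "Abstract", so that handshake fails and the pair is rejected.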
Why is this a Big Deal?
- It's Cheaper and Smarter: Instead of training a new AI model from scratch (which costs a fortune and takes years), Agent-OM uses the existing "brain" of the AI but gives it tools (like a search engine and a notepad). It's like giving a smart person a calculator and a library card instead of forcing them to memorize math formulas.
- It Handles the Weird Stuff: Standard systems struggle when there are very few examples to learn from (the "few-shot" problem). Agent-OM shines here because it can use its general knowledge to figure out that "Gold" and "Au" are the same, even if it's never seen them paired before.
- It's Honest: By using a "Validator" step, the system catches its own mistakes. It's like a writer who writes a draft, then reads it aloud to catch typos before publishing.
The Results: How did it do?
The authors tested their system against the best existing tools on three different "exams" (datasets):
- Simple Tasks: It performed just as well as the best long-standing systems (like a top student getting an A).
- Complex Tasks: It significantly outperformed everyone else. When the task was hard and required deep reasoning, Agent-OM was the clear winner.
The Catch (Limitations)
The authors admit that while the system is great, it's not magic.
- The "Easy" vs. "Hard" Paradox: Surprisingly, the AI was sometimes better at solving complex, weird problems than simple, boring ones. It's like a genius who can solve a physics equation but struggles to tie their shoelaces.
- Hallucinations: The AI can still make things up, but the Validator step catches most of these fabrications.
- Cost: Using the smartest AI models (like GPT-4) costs money, though the system is designed to be efficient enough to be affordable.
The Bottom Line
Agent-OM is a new way of using AI to connect different knowledge bases. Instead of treating the AI as a crystal ball that guesses answers, it treats the AI as a worker with a toolkit: a researcher to find info, a judge to make decisions, and a checker to ensure accuracy. It's a step toward fully automated, intelligent systems that can understand and connect the world's data without needing a human to hold their hand every step of the way.