A Benchmark for Gap and Overlap Analysis as a Test of KG Task Readiness

This paper introduces an executable, auditable benchmark for evaluating knowledge graph readiness in policy-like document analysis. It aligns natural language contracts with a formal ontology so that text-only LLMs can be systematically compared against ontology-driven pipelines on gap and overlap analysis.

Original authors: Maruf Ahmed Mridul, Rohit Kapa, Oshani Seneviratne

Published 2026-04-14

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are shopping for life insurance. You have ten different policies in front of you, written in dense, confusing legal language. You have a specific situation in mind: "What happens if I die in a car accident while intoxicated, or if I commit suicide 13 months after buying the policy?"

You need to know two things:

  1. Overlap: Which of these ten policies will actually pay out to my family?
  2. Gap: Which policies will say "No, we don't cover that," and why?

Doing this manually is a nightmare. Lawyers have to read every word, cross-reference every clause, and hope they didn't miss a tiny detail.

This paper introduces a new way to solve that problem using Knowledge Graphs (KGs) and a "test drive" called a Benchmark. Here is the breakdown in simple terms.

1. The Problem: The "Black Box" of Legal Text

The authors argue that just having a lot of data (like a massive library of insurance contracts) isn't enough. You need a way to prove that your computer system understands the rules exactly the way a human expert does.

Usually, we test AI by asking it to read a contract and guess the answer. But an AI system like a Large Language Model (LLM) is a very smart student who reads the book but sometimes makes up rules based on what sounds right, rather than what is actually written. It might say, "Oh, suicide is usually bad, so I'll say this policy denies it," even if the specific policy says it covers it after 12 months.

2. The Solution: Building a "Digital Twin" of the Contracts

Instead of just feeding the AI the raw text, the authors built a Knowledge Graph. Think of this as turning the messy, 50-page PDF of an insurance contract into a clean, organized LEGO set.

  • The TBox (The Blueprint): This is the rulebook. It defines what a "Policy," "Suicide Clause," or "Grace Period" actually means in the world of insurance. It's the instruction manual for the LEGO set.
  • The ABox (The Built Model): This is the actual LEGO structure built for each of the 10 specific contracts. Every single fact (e.g., "Contract C1 has a 24-month suicide waiting period") is a specific LEGO brick snapped into place.
  • The Traceability: Crucially, every single LEGO brick has a tiny tag attached to it that says, "I came from Page 4, Paragraph 2 of Contract C1." If the computer makes a decision, you can instantly look at the tag and see the exact source text.
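To make the TBox/ABox/traceability split concrete, here is a minimal Python sketch of facts that carry their own source tags. The class name, field names, and the second contract's data are illustrative assumptions, not the paper's actual RDF schema; only the C1 example fact comes from the text above.

```python
# Minimal sketch of ABox facts with traceability tags.
# The Fact class and field names are illustrative, not the authors' schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    subject: str      # e.g. a contract ID ("the LEGO structure")
    predicate: str    # a property the TBox "blueprint" defines
    value: object     # the asserted value
    source: str       # the tag pointing back at the source text

abox = [
    # The example fact from the text: C1's 24-month suicide waiting period,
    # tagged with its origin ("Page 4, Paragraph 2 of Contract C1").
    Fact("C1", "suicideWaitingPeriodMonths", 24, "Contract C1, page 4, paragraph 2"),
    # A second, purely hypothetical fact for illustration:
    Fact("C1", "gracePeriodDays", 30, "Contract C1, page 7, paragraph 1"),
]

# Every stored fact can instantly show its evidence:
for f in abox:
    print(f"{f.subject}.{f.predicate} = {f.value}  (source: {f.source})")
```

The point of the design is that the answer and its evidence travel together: no decision can be produced without a tag leading back to the contract text.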

3. The Test: The "Scenario Challenge"

To see if this LEGO system works, the authors created 58 "What-If" scenarios (Competency Questions).

  • Example Scenario: "The insured dies by suicide 13 months after the policy started."

They ran this scenario against their LEGO model using a precise query language (SPARQL). Because the LEGO bricks are snapped together perfectly according to the blueprint, the computer can instantly say:

  • "Contracts 1-5 and 7-10 say COVERED (because their waiting period is 12 months)."
  • "Contract 6 says DENIED (because its waiting period is 24 months)."

And because of the tags, it can show you the exact sentence in the contract that proves it.
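The authors run this check as a SPARQL query over an RDF graph. The following plain-Python sketch only imitates the idea with a dictionary lookup; the contract data mirrors the example results above (a 12-month waiting period versus Contract 6's 24-month period), while the source tags are hypothetical.

```python
# Illustrative imitation of the scenario query. The real system uses SPARQL
# over RDF; the source strings here are hypothetical placeholders.
policies = {
    "C2": {"suicide_waiting_months": 12, "source": "Contract C2, suicide clause"},
    "C6": {"suicide_waiting_months": 24, "source": "Contract C6, suicide clause"},
}

def evaluate_suicide_scenario(months_since_start: int) -> dict:
    """Return COVERED/DENIED per contract, each with its evidence tag."""
    results = {}
    for cid, p in policies.items():
        covered = months_since_start > p["suicide_waiting_months"]
        results[cid] = ("COVERED" if covered else "DENIED", p["source"])
    return results

# Scenario: death by suicide 13 months after the policy started.
for cid, (verdict, source) in evaluate_suicide_scenario(13).items():
    print(f"{cid}: {verdict}  (evidence: {source})")
```

Because the waiting periods are explicit facts rather than prose, the 13-month scenario resolves deterministically: 13 > 12 means covered, 13 < 24 means denied, and each verdict carries its evidence tag.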

4. The Showdown: LEGO vs. The "Smart Student"

The authors then asked three top-tier AI models (ChatGPT, Gemini, Claude) to read the raw text and answer the same 58 scenarios.

The Results:

  • The LEGO System (Ontology): Was 100% consistent. It never got tired, never guessed, and always pointed to the exact evidence.
  • The "Smart Student" (LLMs): They were okay at simple questions but started failing on complex ones.
    • The "Missing Clause" Trap: If a contract didn't mention a specific rule (like "no drinking"), the AI often assumed the claim was denied. The LEGO system correctly said, "This scenario doesn't apply to this contract," because the contract simply didn't have that rule.
    • The "Complex Structure" Trap: When the scenario involved a complex joint policy (two people on one plan), the AI got confused and gave different answers for different contracts, even when the logic was the same.

The Metaphor:
Imagine you are trying to find a specific ingredient in a kitchen.

  • The LLM is like a chef who looks at the pantry, smells the air, and guesses, "I bet there's no salt here," because they don't see a salt shaker. They might be right, or they might be wrong.
  • The Knowledge Graph is like a robot that has a digital map of every single jar on every shelf. It checks the map, sees the jar is missing, and says, "Confirmed: No salt." If you ask why, it points to the map coordinates.

5. Why This Matters

This paper shows that for high-stakes domains (like insurance, law, or healthcare), you can't just rely on AI that "guesses" based on patterns. You need a system that models the rules explicitly.

The "Benchmark" they created is like a standardized driving test for these AI systems. It shows that:

  1. Structure wins: Turning text into a structured map (the LEGO set) makes the AI much more reliable.
  2. Evidence is key: You can't just get an answer; you need to see the "receipt" (the source text) proving why that answer was chosen.
  3. It's reusable: While they used insurance, this same "Blueprint + LEGO + Test Drive" method can be used for healthcare laws, bank regulations, or any complex rulebook.

In short: The authors built a "truth machine" for insurance contracts. They showed that while AI is great at chatting, it needs a structured, rule-based backbone to be trusted with serious decisions where getting the answer wrong costs people their money or lives.
