Legal interpretation and AI: from expert systems to argumentation and LLMs

This paper traces the evolution of AI and Law research on legal interpretation, highlighting the shift from expert systems focused on knowledge engineering, to argumentation frameworks analyzing dialectical interactions, and finally to machine learning models generating interpretive suggestions.

Václav Janeček, Giovanni Sartor

Published 2026-03-06

The Big Picture: Who Decides What the Law Means?

Imagine the law is a giant, ancient rulebook written in a language that is sometimes vague, sometimes contradictory, and often open to debate. The central question of this paper is: How can we teach computers to figure out what these rules actually mean?

The authors, Václav Janeček and Giovanni Sartor, take us on a journey through three different eras of trying to solve this puzzle, using the classic example of a park sign that says: "No vehicles in the park."


Era 1: The "Rule-Maker" (Expert Systems)

The Analogy: The Strict Librarian.

From the 1970s through the 1990s, researchers tried to build AI that worked like a super-strict librarian. They believed that if humans could write down every single rule perfectly, the computer could just follow them like a recipe.

  • How it worked: A human expert (the librarian) had to decide beforehand exactly what a "vehicle" is. They would write a rule in the computer: "If it has wheels and an engine, it's a vehicle. If it's a bicycle, it's a vehicle."
  • The Problem: This is like trying to pack a suitcase for a trip to every possible climate in the world before you leave. What if a child rides a tiny tricycle? What if it's a toy car?
  • The Flaw: The computer couldn't handle the "gray areas." If the human expert made a mistake in the rules (e.g., deciding all bikes are vehicles), the computer would blindly ban a child's bike, even if the park guard would have let it slide. The computer had no common sense; it just followed the manual.
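The Era-1 approach can be caricatured in a few lines of code. This is a minimal illustrative sketch, not a reconstruction of any actual system from the paper: the predicates (`has_engine`, `kind`, and so on) and the rules themselves are invented for the example.

```python
# Era-1 sketch: hand-written rules, blindly applied.
# All predicates and rules here are illustrative, not from any real system.

def is_vehicle(item: dict) -> bool:
    """Hard-coded definition written by a human expert in advance."""
    if item.get("has_engine") and item.get("has_wheels"):
        return True
    if item.get("kind") == "bicycle":  # the expert decided all bikes count
        return True
    return False

def park_decision(item: dict) -> str:
    # The system follows the rule book with no sense of context or purpose.
    return "banned" if is_vehicle(item) else "allowed"

print(park_decision({"kind": "bicycle"}))                       # banned
print(park_decision({"kind": "tricycle", "has_wheels": True}))  # allowed
```

Note the failure mode: the child's bicycle is mechanically "banned" because the expert said so, while the tricycle slips through only because nobody wrote a rule for it. The gray areas are exactly the cases the rule book never anticipated.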

Era 2: The "Debate Club" (Argumentation)

The Analogy: The Courtroom Drama.

Researchers realized that law isn't just about following a list; it's about arguing. So, they changed the AI from a rule-follower to a debate moderator.

  • How it works: Instead of hard-coding the answer, the AI builds two teams of arguments.
    • Team A (The Guard): "Bikes are vehicles! The sign says 'No vehicles,' so the kid can't ride." (Argument based on Ordinary Language).
    • Team B (The Kid): "But the point of the rule is to stop loud, dangerous motorbikes. A quiet kid's bike doesn't hurt anyone. So, 'vehicle' shouldn't include this." (Argument based on Purpose/Teleology).
  • The Magic: The AI doesn't just pick one; it weighs the arguments. It asks: "Which argument is stronger?" In this case, the "Purpose" argument might win, allowing the kid to ride.
  • The Result: This approach mimics how real lawyers think. It acknowledges that the law is a conversation, not a computer program.
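The debate-club idea can also be sketched in code. Again, this is a toy illustration of the general shape of argumentation frameworks, not the formalism used in the paper: the argument labels and the numeric strengths are invented for the example, standing in for a human decider's weighing of interpretive canons.

```python
# Era-2 sketch: competing interpretive arguments compared by weight,
# instead of a single hard-coded rule. Arguments and weights are invented.

from dataclasses import dataclass

@dataclass
class Argument:
    claim: str
    basis: str       # the interpretive canon the argument rests on
    strength: float  # how much weight the decider gives that canon here

def resolve(pro: Argument, con: Argument) -> Argument:
    """Return the prevailing argument; ties favour the restriction (con)."""
    return pro if pro.strength > con.strength else con

guard = Argument(
    claim="The bike is banned",
    basis="ordinary language: a bike is a 'vehicle'",
    strength=0.4,
)
kid = Argument(
    claim="The bike is allowed",
    basis="purpose: the rule targets loud, dangerous traffic",
    strength=0.7,
)

winner = resolve(pro=kid, con=guard)
print(winner.claim)  # The bike is allowed
```

The key design shift from Era 1 is that nothing here decides in advance what "vehicle" means: the answer falls out of which argument prevails, and a different judge could assign different weights and reach the opposite result.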

Era 3: The "Super-Reader" (Large Language Models / LLMs)

The Analogy: The Over-Confident Intern.

Today, we have Generative AI (like ChatGPT). These aren't rule-followers or debate moderators; they are pattern matchers. They have read almost every book, law, and article on the internet.

  • How it works: You ask the AI, "Can a kid ride a bike in the park?" The AI doesn't "know" the answer. Instead, it looks at billions of similar sentences it has read before and predicts what a human would say. It's incredibly good at sounding smart.
  • The Good News: It can explain the rule in plain English, list different ways to interpret it, and even find the "purpose" of the law just like a lawyer. It's like having a brilliant intern who has read the entire library in 5 seconds.
  • The Bad News (The "Hallucination"): Because the AI is guessing based on patterns, it can lie confidently. It might invent a fake law or cite a court case that doesn't exist. It has no understanding of truth, only of what looks like truth.
    • The Danger: If you let this "Intern" make the final decision, it might fine the kid because it "thinks" bikes are vehicles, even if the real human judge would say no. It lacks the moral compass and the ability to understand the spirit of the law.
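The pattern-matching behaviour described above can be caricatured too. This sketch is a deliberate oversimplification of what a real LLM does: the candidate answers and their scores are invented, standing in for the model's learned probabilities over continuations.

```python
# Era-3 caricature: the model ranks continuations by how likely they looked
# in training text; no rule base, no argument weighing, no truth check.
# The candidate answers and scores below are invented for illustration.

continuations = {
    "Yes, bikes are generally allowed in parks": 0.55,
    "No, a bike is a vehicle, so it is banned": 0.35,
    "Per an old park-bylaw precedent, bikes are banned": 0.10,  # fluent but unverified
}

# The model outputs whatever pattern scores highest -- confident, not checked.
answer = max(continuations, key=continuations.get)
print(answer)
```

The point of the caricature is that every candidate, including the fabricated one, is generated the same way: by plausibility, not by verification. That is why the output needs a human to check it against the actual law.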

The Verdict: What Should We Do?

The authors conclude with a very important warning: Don't let the AI be the Judge.

  • AI is a Tool, Not a Boss: Think of the AI as a "research assistant." It can find the arguments, summarize the laws, and suggest ideas.
  • The Human is the Captain: A human lawyer or judge must look at what the AI says, check if it's true, and make the final call. The human brings the "soul" of the law—the ability to understand fairness, context, and human values.
  • The Future: The best path forward is to combine the strengths. Use the AI's speed to find information and the human's wisdom to weigh the arguments.

Summary in One Sentence

We moved from trying to teach computers rigid rules (which failed), to teaching them how to argue (which worked better), to giving them a super-memory (which is powerful but dangerous), and the lesson is: Let the computer do the research, but let the human make the judgment.
