Coupling Local Context and Global Semantic Prototypes via a Hierarchical Architecture for Rhetorical Role Labeling

This paper addresses the limitations of hierarchical models in Rhetorical Role Labeling by proposing prototype-based methods that integrate local context with global semantic representations, introducing the new SCOTUS-Law dataset, and demonstrating consistent performance improvements across legal, medical, and scientific domains.

Anas Belfathi, Nicolas Hernandez, Laura Monceaux, Warren Bonnard, Mary Catherine Lavissiere, Christine Jacquin, Richard Dufour

Published 2026-03-05

Imagine you are reading a very long, complex legal document, like a Supreme Court opinion. It's not just a random list of sentences; it's a carefully constructed story with a specific structure. Some sentences are setting the scene, some are arguing a point, some are quoting old laws, and others are delivering the final verdict.

Rhetorical Role Labeling (RRL) is the job of a computer trying to figure out: "What is this specific sentence actually doing in the story?"

For a long time, computers were good at looking at a sentence and its immediate neighbors (like reading a paragraph) to guess its role. But they often got confused because they missed the "big picture." They didn't realize that a sentence saying "We quote the law" usually appears in a specific part of the document, or that "The Court's reasoning" follows a very specific global pattern across thousands of documents.

This paper introduces a clever new way to teach computers to see both the local details and the global pattern at the same time.

Here is the breakdown of their solution, using simple analogies:

1. The Problem: The "Myopic" Reader

Imagine a student reading a legal document. They are so focused on the sentence right in front of them that they forget what the whole chapter is about.

  • The Old Way: The computer looks at a sentence and asks, "What words are around you?" It's like trying to identify a character in a movie by only looking at their face, without knowing the plot.
  • The Issue: This works okay for easy sentences, but for tricky ones (like a sentence that looks like an argument but is actually just recalling a fact), the computer gets lost. It lacks the "corpus-level" knowledge—the big picture of how legal documents usually flow.

2. The Solution: The "Mental Anchors" (Prototypes)

The authors propose using Semantic Prototypes. Think of these as mental anchors or archetypes.

Instead of just memorizing individual sentences, the computer learns a "perfect example" (a prototype) for each type of role.

  • The "Recalling" Prototype: A mental summary of what a "Recalling" sentence usually looks like across the entire library of legal documents.
  • The "Verdict" Prototype: A mental summary of what a "Verdict" sentence looks like.

The computer then asks: "Does this new sentence look more like the 'Recalling' anchor or the 'Verdict' anchor?" This helps it make better guesses even when the local context is confusing.
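The "anchor" idea above boils down to nearest-prototype classification: average the embeddings of all sentences with a given role to get that role's prototype, then label a new sentence by whichever prototype it most resembles. Here is a minimal sketch of that mechanism (the function names and the use of plain cosine similarity are illustrative choices, not the paper's exact implementation):

```python
import numpy as np

def build_prototypes(embeddings, labels):
    """Average the embeddings of each role to get one 'anchor' per role.

    embeddings: (n_sentences, dim) array of sentence vectors
    labels:     list of role names, one per sentence
    Returns a dict mapping role -> prototype vector.
    """
    prototypes = {}
    for role in set(labels):
        mask = np.array([lab == role for lab in labels])
        prototypes[role] = embeddings[mask].mean(axis=0)
    return prototypes

def nearest_prototype(sentence_vec, prototypes):
    """Label a sentence by cosine similarity to each role's anchor."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(prototypes, key=lambda role: cos(sentence_vec, prototypes[role]))
```

Because prototypes are computed over the whole training corpus, they carry exactly the "corpus-level" knowledge that a purely local reader lacks.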

3. The Two New Tools

The paper introduces two specific ways to use these anchors:

A. Prototype-Based Regularization (PBR) – "The Compass"

  • How it works: Imagine the computer's internal brain (its "latent space") is a messy room where all sentence meanings are scattered. PBR acts like a compass.
  • The Analogy: It gently pulls sentences that mean the same thing closer to their specific "anchor" (prototype) and pushes different meanings apart. It doesn't change the computer's brain structure; it just adds a rule: "Hey, if you think this is a 'Recalling' sentence, make sure it feels like other 'Recalling' sentences we've seen before."
  • Result: The computer organizes its thoughts better, reducing confusion between similar roles.

B. Prototype-Conditioned Modulation (PCM) – "The GPS Injection"

  • How it works: This is more aggressive. Instead of just a compass, it's like injecting a GPS signal directly into the computer's brain while it's reading.
  • The Analogy: As the computer reads a sentence, it pauses and asks, "Based on the whole document I've seen so far, what is the 'global vibe' right now?" It then mixes that global vibe directly into its understanding of the current sentence.
  • Result: The computer gets a "boost" of context. It knows, "I am in the 'Analysis' section of the document, so this sentence is likely an argument, not a fact."
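The "GPS injection" can be sketched as a soft blend: take the model's current beliefs about which role it is in, form an expected prototype vector under those beliefs (the "global vibe"), and mix it into the local sentence representation. The fixed gate and the probability-weighted average below are illustrative assumptions, not the paper's exact mechanism:

```python
import numpy as np

def pcm_modulate(sentence_vec, role_probs, prototypes, gate=0.3):
    """Sketch of prototype-conditioned modulation.

    sentence_vec: local sentence representation
    role_probs:   dict mapping role -> model's current probability for it
    prototypes:   dict mapping role -> global prototype vector
    """
    # Global context: expected prototype under the model's role beliefs.
    global_vibe = sum(role_probs[r] * prototypes[r] for r in prototypes)
    # Inject it into the local representation with a fixed gate.
    return (1.0 - gate) * sentence_vec + gate * global_vibe
```

The key contrast with PBR: PBR only shapes the loss, while PCM changes the representation itself while the model reads, which is why the post calls it "more aggressive".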

4. The New Dataset: SCOTUS-Law

To prove this works, the authors couldn't just use old data. They built a brand new library called SCOTUS-Law.

  • What is it? They took 180 U.S. Supreme Court opinions and manually labeled every single sentence with three layers of detail:
    1. Category: The big section (e.g., "Analysis," "Resolution").
    2. Function: The specific job (e.g., "Quoting," "Recalling").
    3. Step: The tiny nuance (e.g., "Recalling a specific court case").
  • Why it matters: It's like upgrading from a map with just "Cities" to a map with "Streets, House Numbers, and Room Numbers." This high level of detail makes it a perfect test for their new tools.
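Concretely, each sentence carries three stacked labels, from coarse to fine. The record below illustrates that shape; the field names and label strings are assumptions for illustration, not the dataset's actual schema:

```python
# Illustrative shape of one annotated sentence in a SCOTUS-Law-style corpus.
# Keys and values are hypothetical, not the dataset's real schema.
example = {
    "opinion_id": "scotus-0001",
    "sentence": "We granted certiorari to resolve the conflict.",
    "category": "Resolution",                   # the big section
    "function": "Recalling",                    # the specific job
    "step": "Recalling a specific court case",  # the tiny nuance
}
```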

5. The Results: Small Model, Big Brain

The authors tested their method on legal, medical, and scientific texts.

  • The Win: Their method consistently beat the previous best models, especially on the "hard" sentences that are rare or confusing.
  • The Efficiency Surprise: They compared their method to massive AI models (like LLMs) that require huge computers to run.
    • The Analogy: The massive LLMs are like a giant supercomputer trying to solve a puzzle. Their method is like a smart, lightweight toolkit.
    • The Outcome: Their "lightweight toolkit" actually performed better than the giant supercomputer in many cases, while using a fraction of the energy and money.

Summary

This paper teaches computers to read legal documents not just by looking at the words on the page, but by understanding the global rhythm of the document. By creating "mental anchors" for different types of sentences, the computer becomes a much better lawyer's assistant, able to distinguish between a subtle argument and a simple fact recall, all while running on a standard computer rather than a supercomputer.