Coupling Local Context and Global Semantic Prototypes via a Hierarchical Architecture for Rhetorical Role Labeling

This paper addresses the limitations of hierarchical models in Rhetorical Role Labeling by proposing prototype-based methods that integrate local context with global semantic representations, introducing the new SCOTUS-Law dataset, and demonstrating consistent performance improvements across legal, medical, and scientific domains.

Anas Belfathi, Nicolas Hernandez, Laura Monceaux, Warren Bonnard, Mary Catherine Lavissiere, Christine Jacquin, Richard Dufour

Published 2026-03-05

Imagine you are reading a very long, complex legal document, like a Supreme Court opinion. It's not just a random list of sentences; it's a carefully constructed story with a specific structure. Some sentences are setting the scene, some are arguing a point, some are quoting old laws, and others are delivering the final verdict.

Rhetorical Role Labeling (RRL) is the job of a computer trying to figure out: "What is this specific sentence actually doing in the story?"

For a long time, computers were good at looking at a sentence and its immediate neighbors (like reading a paragraph) to guess its role. But they often got confused because they missed the "big picture." They didn't realize that a sentence saying "We quote the law" usually appears in a specific part of the document, or that "The Court's reasoning" follows a very specific global pattern across thousands of documents.

This paper introduces a clever new way to teach computers to see both the local details and the global pattern at the same time.

Here is the breakdown of their solution, using simple analogies:

1. The Problem: The "Myopic" Reader

Imagine a student reading a legal document. They are so focused on the sentence right in front of them that they forget what the whole chapter is about.

  • The Old Way: The computer looks at a sentence and asks, "What words are around you?" It's like trying to identify a character in a movie by only looking at their face, without knowing the plot.
  • The Issue: This works okay for easy sentences, but for tricky ones (like a sentence that looks like an argument but is actually just recalling a fact), the computer gets lost. It lacks the "corpus-level" knowledge—the big picture of how legal documents usually flow.

2. The Solution: The "Mental Anchors" (Prototypes)

The authors propose using Semantic Prototypes. Think of these as mental anchors or archetypes.

Instead of just memorizing individual sentences, the computer learns a "perfect example" (a prototype) for each type of role.

  • The "Recalling" Prototype: A mental summary of what a "Recalling" sentence usually looks like across the entire library of legal documents.
  • The "Verdict" Prototype: A mental summary of what a "Verdict" sentence looks like.

The computer then asks: "Does this new sentence look more like the 'Recalling' anchor or the 'Verdict' anchor?" This helps it make better guesses even when the local context is confusing.
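The "anchor" idea above boils down to nearest-prototype classification: average the embeddings of all sentences with a given role to get that role's prototype, then label a new sentence by whichever prototype it most resembles. Here is a minimal sketch of that mechanism (the function names and the use of plain cosine similarity are illustrative choices, not the paper's exact implementation):

```python
import numpy as np

def build_prototypes(embeddings, labels):
    """Average the embeddings of each role to get one 'anchor' per role.

    embeddings: (n_sentences, dim) array of sentence vectors
    labels:     list of role names, one per sentence
    Returns a dict mapping role -> prototype vector.
    """
    prototypes = {}
    for role in set(labels):
        mask = np.array([lab == role for lab in labels])
        prototypes[role] = embeddings[mask].mean(axis=0)
    return prototypes

def nearest_prototype(sentence_vec, prototypes):
    """Label a sentence by cosine similarity to each role's anchor."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(prototypes, key=lambda role: cos(sentence_vec, prototypes[role]))
```

Because prototypes are computed over the whole training corpus, they carry exactly the "corpus-level" knowledge that a purely local reader lacks.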

3. The Two New Tools

The paper introduces two specific ways to use these anchors:

A. Prototype-Based Regularization (PBR) – "The Compass"

  • How it works: Imagine the computer's internal brain (its "latent space") is a messy room where all sentence meanings are scattered. PBR acts like a compass.
  • The Analogy: It gently pulls sentences that mean the same thing closer to their specific "anchor" (prototype) and pushes different meanings apart. It doesn't change the computer's brain structure; it just adds a rule: "Hey, if you think this is a 'Recalling' sentence, make sure it feels like other 'Recalling' sentences we've seen before."
  • Result: The computer organizes its thoughts better, reducing confusion between similar roles.

B. Prototype-Conditioned Modulation (PCM) – "The GPS Injection"

  • How it works: This is more aggressive. Instead of just a compass, it's like injecting a GPS signal directly into the computer's brain while it's reading.
  • The Analogy: As the computer reads a sentence, it pauses and asks, "Based on the whole document I've seen so far, what is the 'global vibe' right now?" It then mixes that global vibe directly into its understanding of the current sentence.
  • Result: The computer gets a "boost" of context. It knows, "I am in the 'Analysis' section of the document, so this sentence is likely an argument, not a fact."
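The "GPS injection" can be sketched as a soft blend: take the model's current beliefs about which role it is in, form an expected prototype vector under those beliefs (the "global vibe"), and mix it into the local sentence representation. The fixed gate and the probability-weighted average below are illustrative assumptions, not the paper's exact mechanism:

```python
import numpy as np

def pcm_modulate(sentence_vec, role_probs, prototypes, gate=0.3):
    """Sketch of prototype-conditioned modulation.

    sentence_vec: local sentence representation
    role_probs:   dict mapping role -> model's current probability for it
    prototypes:   dict mapping role -> global prototype vector
    """
    # Global context: expected prototype under the model's role beliefs.
    global_vibe = sum(role_probs[r] * prototypes[r] for r in prototypes)
    # Inject it into the local representation with a fixed gate.
    return (1.0 - gate) * sentence_vec + gate * global_vibe
```

The key contrast with PBR: PBR only shapes the loss, while PCM changes the representation itself while the model reads, which is why the post calls it "more aggressive".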

4. The New Dataset: SCOTUS-Law

To prove this works, the authors couldn't just use old data. They built a brand new library called SCOTUS-Law.

  • What is it? They took 180 U.S. Supreme Court opinions and manually labeled every single sentence with three layers of detail:
    1. Category: The big section (e.g., "Analysis," "Resolution").
    2. Function: The specific job (e.g., "Quoting," "Recalling").
    3. Step: The tiny nuance (e.g., "Recalling a specific court case").
  • Why it matters: It's like upgrading from a map with just "Cities" to a map with "Streets, House Numbers, and Room Numbers." This high level of detail makes it a perfect test for their new tools.
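Concretely, each sentence carries three stacked labels, from coarse to fine. The record below illustrates that shape; the field names and label strings are assumptions for illustration, not the dataset's actual schema:

```python
# Illustrative shape of one annotated sentence in a SCOTUS-Law-style corpus.
# Keys and values are hypothetical, not the dataset's real schema.
example = {
    "opinion_id": "scotus-0001",
    "sentence": "We granted certiorari to resolve the conflict.",
    "category": "Resolution",                   # the big section
    "function": "Recalling",                    # the specific job
    "step": "Recalling a specific court case",  # the tiny nuance
}
```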

5. The Results: Small Model, Big Brain

The authors tested their method on legal, medical, and scientific texts.

  • The Win: Their method consistently beat the previous best models, especially on the "hard" sentences that are rare or confusing.
  • The Efficiency Surprise: They compared their method to massive AI models (like LLMs) that require huge computers to run.
    • The Analogy: The massive LLMs are like a giant supercomputer trying to solve a puzzle. Their method is like a smart, lightweight toolkit.
    • The Outcome: Their "lightweight toolkit" actually performed better than the giant supercomputer in many cases, while using a fraction of the energy and money.

Summary

This paper teaches computers to read legal documents not just by looking at the words on the page, but by understanding the global rhythm of the document. By creating "mental anchors" for different types of sentences, the computer becomes a much better lawyer's assistant, able to distinguish between a subtle argument and a simple fact recall, all while running on a standard computer rather than a supercomputer.