Utilizing Pre-trained and Large Language Models for 10-K Items Segmentation

This study introduces and evaluates two advanced methods, BERT4ItemSeg and GPT4ItemSeg, for segmenting items in 10-K reports, demonstrating that the hierarchical BERT-based model achieves superior accuracy while the LLM-based approach offers greater adaptability to regulatory changes.

Hsin-Min Lu, Yu-Tai Chien, Huan-Hsun Yen, Yen-Hsiu Chen

Published 2026-04-09

Imagine you are a detective trying to solve a mystery, but the evidence is hidden inside a massive, chaotic library. This library contains thousands of books called 10-K reports. These are annual reports that public companies in the US must file with the government.

Inside these books, the information is organized into specific chapters, or "Items." For example, Item 7 is the "Management's Discussion and Analysis" (where management explains how the company did), and Item 1A is the "Risk Factors" (where they admit what could go wrong).

The Problem:
The library is a mess. Sometimes a chapter is labeled "Item 7," sometimes "Management's Discussion," and sometimes the formatting is just strange. In the past, researchers tried to find these chapters using Rule-Based Methods. Think of this like using a rigid metal ruler to measure a wiggly snake: if the snake changes shape even a little, the ruler fails and you miss the measurement. These old methods were brittle; they broke whenever companies changed their formatting, and they often mislabeled or missed items entirely.

The Solution:
The authors of this paper built two new, super-smart "detectives" to find these chapters automatically. They used the latest Artificial Intelligence (AI) technology.

Detective #1: The Super-Reader (BERT4ItemSeg)

This detective is like a highly trained librarian who has read millions of books before.

  • How it works: It uses a technology called BERT (a pre-trained language model). Imagine this librarian has a super-power: they can read a single sentence and instantly understand the context of the whole paragraph.
  • The Trick: Since 10-K reports are huge (sometimes longer than a novel), the librarian can't read the whole book at once. So, the authors gave the librarian a special strategy: read the book line by line.
    • The librarian reads one line, decides if it's the start of a new chapter, and passes that thought to a "manager" (a Bi-LSTM model).
    • The manager looks at the flow of thoughts from the librarian and says, "Yes, that line is the start of the Risk Factors chapter!"
  • The Result: This detective is incredibly accurate (98%+ success rate). It's like having a librarian who never misses a page. However, you have to train this librarian yourself, and they need a powerful computer (a GPU) to work fast.
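The librarian-plus-manager idea above is a two-stage pipeline: score each line on its own, then look at the sequence of scores to pick the real chapter starts. Here is a minimal, runnable sketch of that shape in Python. The function names and the scoring logic are my own illustration: a regex stands in for BERT's per-line judgment, and a simple scan with de-duplication stands in for the Bi-LSTM "manager" (the real model uses learned context, e.g. to tell a table-of-contents entry from the actual section start).

```python
import re
from typing import List, Tuple

# Illustrative sketch only -- not the authors' code.
# Stage 1 (stand-in for BERT): score each line independently.
# Stage 2 (stand-in for the Bi-LSTM): scan the sequence of scores
# and decide which lines actually begin an item.

ITEM_HEADING = re.compile(r"^\s*item\s+(\d+[a-z]?)\b", re.IGNORECASE)

def line_score(line: str) -> float:
    """Per-line 'does this look like an item heading?' score."""
    return 1.0 if ITEM_HEADING.match(line) else 0.0

def segment(lines: List[str]) -> List[Tuple[int, str]]:
    """Turn per-line scores into (line index, item id) boundaries.
    This toy keeps only the first match per item id; the real
    sequence model uses surrounding context instead."""
    boundaries, seen = [], set()
    for i, line in enumerate(lines):
        if line_score(line) >= 0.5:
            item_id = ITEM_HEADING.match(line).group(1).upper()
            if item_id not in seen:
                seen.add(item_id)
                boundaries.append((i, item_id))
    return boundaries

doc = [
    "PART I",
    "Item 1. Business",
    "We make widgets...",
    "Item 1A. Risk Factors",
    "Competition may hurt us...",
]
print(segment(doc))  # → [(1, '1'), (3, '1A')]
```

The point of the split is that stage 1 can be swapped for a much smarter scorer (BERT embeddings) without changing stage 2, which is exactly why the hierarchical design handles long filings that no single model pass could read at once.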

Detective #2: The Chatbot Genius (GPT4ItemSeg)

This detective is like a genius intern who has never seen a 10-K report before but is incredibly smart and can learn anything just by talking to you.

  • How it works: This uses a Large Language Model (LLM) like ChatGPT. Instead of training it for months, you just give it a few examples (like showing it three pages of a book and saying, "See? This is where the Risk Factors start. Now find it in this new book"). This is called "few-shot prompting."
  • The Problem: Geniuses sometimes "hallucinate." They might make up a story that sounds true but isn't. If you ask a chatbot to "extract the text," it might rewrite the text slightly, which is bad for legal documents where exact words matter.
  • The Trick: The authors invented a clever game called Line-ID Prompting.
    • Instead of asking the chatbot to "write out the chapter," they put a number (an ID) next to every line in the document.
    • They ask the chatbot: "Just tell me the numbers of the lines where the chapters start."
    • Once the chatbot gives the numbers, the computer automatically grabs the exact text from those lines.
  • The Result: This detective is very good at adapting. If the government changes the rules tomorrow and adds a new chapter, you just tell the chatbot the new rule, and it figures it out instantly. It's flexible and doesn't need a super-computer, but it costs a little bit of money to use the chat service.
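The Line-ID trick is easy to see in code: number every line, ask the model only for numbers, then slice the exact original text yourself. The sketch below shows that flow; the prompt wording, function names, and the stubbed-out model reply are my own illustration, not the paper's exact implementation.

```python
from typing import List

# Illustrative sketch of Line-ID prompting -- the LLM call itself
# is left out; only the deterministic before/after steps are shown.

def number_lines(lines: List[str]) -> str:
    """Prefix every line with an ID so the model can point, not copy."""
    return "\n".join(f"[{i}] {line}" for i, line in enumerate(lines))

def build_prompt(lines: List[str]) -> str:
    """Ask for line numbers only, never for the text itself."""
    return (
        "Below is a numbered 10-K filing. Reply ONLY with the line IDs "
        "where each item begins, e.g. 'Item 1A: 3'.\n\n"
        + number_lines(lines)
    )

def extract(lines: List[str], start_id: int, end_id: int) -> str:
    """Deterministic step: slice the exact original text by the IDs
    the model returned, so nothing can be paraphrased or invented."""
    return "\n".join(lines[start_id:end_id])

doc = ["PART I", "Item 1. Business", "We make widgets...",
       "Item 1A. Risk Factors", "Competition may hurt us..."]
# Suppose the model replied "Item 1A: 3" -- the text itself comes
# straight from the source document, not from the model:
print(extract(doc, 3, 5))
```

Because the model only emits IDs, a hallucinated answer can at worst point at the wrong line; it can never rewrite the filing's wording, which is what matters for legal text.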

The Showdown

The authors tested both detectives on 3,737 real 10-K reports.

  • The Old Way (Rules): Got it right about 90% of the time.
  • The Chatbot (GPT4ItemSeg): Got it right about 95% of the time. It's great for new, weird situations.
  • The Super-Reader (BERT4ItemSeg): Got it right about 98% of the time. It's the most accurate and reliable for standard work.

Why Does This Matter?

Think of financial research like building a house. If your foundation (the data extraction) is shaky, the whole house will fall down.

  • Before this paper, researchers were building houses on shaky, rule-based foundations.
  • Now, they have a solid, automated foundation.
  • This means investors, auditors, and researchers can trust the data they are analyzing. They can find risks, compare companies, and understand market trends much faster and more accurately.

In a nutshell: The authors replaced a broken, rigid ruler with two smart, flexible AI detectives. One is a precise, trained librarian for the heavy lifting, and the other is a quick-learning genius for new challenges. Together, they make reading financial reports easier, faster, and much more accurate.
