Generalized Stock Price Prediction for Multiple Stocks… — Plain-Language Explanation

Imagine you are trying to predict the weather for next week. You could just look at the temperature from the last 20 days, but that's not enough. You also need to know about the wind, the humidity, and maybe even read the local news to see if a storm is brewing.

This paper is about building a super-smart stock market weather forecaster. Instead of just looking at past stock prices, the authors built a system that reads thousands of daily news articles to guess what a stock price will do tomorrow.

Here is the breakdown of their invention, explained with simple analogies:

1. The Problem: Too Much Noise

Imagine you are trying to listen to a specific friend's voice at a loud, chaotic party.

The Old Way: You ask a bouncer to find only the people talking about your friend. If the bouncer is bad, you miss important info. If the bouncer is too strict, you get no info.
The New Problem: If you just listen to everyone at the party, you get overwhelmed by noise. You can't tell if someone is talking about your friend or just about the food.
The Authors' Solution: They built a "smart filter" that sits right at the party. It knows exactly who your friend is and instantly tunes out everyone talking about anything else.

2. The Secret Weapon: "Stock Name" as a Magnet

The core trick in this paper is using the Stock Name (like "Apple" or "TSMC") as a magnet to pull out the right news.

The Setup: Every day, there are hundreds of news articles. Most are irrelevant.
The Magic: The system takes the name of the stock (e.g., "TSMC") and turns it into a digital "magnet."
The Filter: It runs this magnet through the pile of news articles.
- Cross-Attention (The Magnet): The stock name "calls out" to the news. Only the articles that "answer back" (are relevant) get picked up.
- Self-Attention (The Group Hug): The system looks at the stock name and all the news articles together, letting them "hug" and decide which ones fit best.
- Position-Aware (The Timeline): It also remembers the order in which the news appeared, just in case the timing matters.

3. The Brain: A Giant Language Model (LLM)

Once the system has filtered the news, it needs a brain to make sense of it.

They use a Large Language Model (LLM), which is like a super-intelligent robot that has read almost everything ever written.
Usually, these robots are great at writing stories but bad at math. The authors had to teach this robot how to read numbers (stock prices) by wrapping the numbers in a special "suit" (called Patch Reprogramming) so the robot understands them as if they were words.

4. The "General" Student vs. The "Specialist"

Old Approach: Train a different student for every single stock. One student learns only about Apple, another only about Ford. This is expensive and slow.
This Paper's Approach: Train one single "General" student to learn about all stocks at once.
Why it's cool: This student learns the general rules of the market. If they understand how "Tech" stocks usually react to bad news, they can apply that logic to a new stock they've never seen before. It's like teaching a chef to cook any dish, rather than hiring a different chef for every recipe.

5. The Results: Smarter and Faster

The authors tested this on the Taiwan Stock Exchange and US stocks.

The Score: On the Taiwan data, their new method cut prediction errors by about 7% compared with the same kind of LLM forecaster running without news. In finance, where small edges matter, that is a meaningful improvement.
The Trade-off: It takes much longer to train (about 2 days vs. about 30 minutes for a simple model), but its errors are roughly half those of the simple model. It's like using a slow, high-end GPS instead of a quick, cheap one that might get you lost.

Summary Analogy

Think of the stock market as a giant library.

Old Models: They just look at the checkout history of books (past prices).
This Model: It sends a librarian (the AI) into the library. The librarian is holding a specific book title (the Stock Name). The librarian scans the whole library, grabs only the books that mention that title, summarizes them, and then uses a super-brain to guess what the next chapter (tomorrow's price) will be.

By using the stock name to filter the noise, the system stops getting distracted by irrelevant news and makes much smarter guesses.

1. Problem Statement

Predicting stock prices is a complex financial forecasting task. While traditional methods (e.g., ARIMA, RNNs) and recent Large Language Model (LLM) approaches exist, they face specific challenges when integrating textual data:

Information Noise: Aggregating all daily financial news introduces significant noise, as most articles are irrelevant to a specific stock.
Retrieval Limitations: Traditional Information Retrieval (IR) techniques often rely on keyword matching (e.g., stock names), which can miss implicitly relevant information or yield insufficient data volume.
Generalization Gap: Most existing models are trained on individual stocks, failing to capture collective market dynamics or generalize to unseen stocks without retraining.
Modality Gap: Bridging the gap between unstructured textual data (news) and structured numerical data (time-series prices) within a unified framework remains difficult.

2. Methodology

The authors propose a Generalized Stock Price Prediction Model that integrates daily news with historical price data using a unified framework based on Time-LLM. The architecture consists of four main stages:

A. News Encoding & Attentive Pooling

Instead of processing raw text or filtering news via external IR, the model encodes all daily news articles using a Pre-trained Language Model (PLM) such as BERT or DeBERTa. To handle the volume of text (on average about 220 articles per day) and filter for relevance, three attention-based pooling mechanisms are introduced, each generating a single daily news vector ($c_t$) conditioned on the target stock:

Cross-Attentive Pooling (CAP): Uses the Stock Name Embedding as the Query and the news embeddings as Keys/Values. This allows the model to dynamically weight news articles based on their semantic similarity to the specific stock.
Self-Attentive Pooling (SAP): Concatenates the stock name embedding with the sequence of news embeddings and applies self-attention. This allows the stock name to interact with all news items simultaneously to determine relevance.
Position-Aware Self-Attentive Pooling (PA-SAP): Similar to SAP but adds sinusoidal positional embeddings to account for the temporal order of news within a single day.
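
To make the pooling idea concrete, here is a minimal sketch of Cross-Attentive Pooling written as plain scaled dot-product attention. It is an illustration under stated assumptions rather than the authors' implementation: the function names, the 3-dimensional toy embeddings, and the omission of learned projection matrices are all choices made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attentive_pool(stock_query, news_embeddings):
    """Pool one day's news into a single vector, weighted by relevance.

    stock_query     : stock-name embedding, used as the attention Query
    news_embeddings : one embedding per article, used as Keys and Values
    Learned projections (W_q, W_k, W_v) are omitted to keep the sketch short.
    """
    d = len(stock_query)
    # Scaled dot-product score between the stock name and each article.
    scores = [
        sum(q * k for q, k in zip(stock_query, article)) / math.sqrt(d)
        for article in news_embeddings
    ]
    weights = softmax(scores)
    # Weighted sum of article embeddings gives the daily news vector c_t.
    pooled = [
        sum(w * article[i] for w, article in zip(weights, news_embeddings))
        for i in range(d)
    ]
    return pooled, weights

# Toy 3-dimensional embeddings (hypothetical values, not from the paper).
stock_query = [1.0, 0.0, 0.0]
news = [
    [0.9, 0.1, 0.0],  # article pointing in roughly the stock-name direction
    [0.0, 1.0, 0.0],  # unrelated article
    [0.0, 0.0, 1.0],  # unrelated article
]
c_t, weights = cross_attentive_pool(stock_query, news)
```

In this toy case the first article receives the largest weight because it is most similar to the query. SAP and PA-SAP would instead prepend the stock-name embedding to the article sequence and run self-attention over the whole sequence, with PA-SAP adding positional embeddings first.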

B. News-Price Fusion

The pooled news vectors and historical price sequences (20 days) are fused using three strategies to create "Stock-aware Features":

Bidirectional Cross-Attention: Computes interactions in both directions (Price-to-News and News-to-Price) to exchange information between modalities.
Graph Convolutional Network (GCN): A two-layer GCN models structural dependencies between stocks and aggregates features over a 5-day window (matching short-term moving averages).
Weighted Integration: The outputs of the fusion modules are combined with original embeddings via a learnable weighted average.
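
The GCN step can be sketched with the standard graph-convolution rule, ReLU(Â H W), where Â is the adjacency matrix with self-loops and symmetric degree normalization. This summary does not specify how the paper builds its graph or normalizes it, so the 3-node graph, the identity weight matrices, and the helper names below are assumptions chosen only to keep the example traceable.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, adding self-loops before normalizing."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(a_norm, features, weight, activate=True):
    """One graph convolution: A_norm @ H @ W, optionally followed by ReLU."""
    out = matmul(matmul(a_norm, features), weight)
    if activate:
        out = [[max(0.0, v) for v in row] for row in out]
    return out

# Hypothetical graph of 3 nodes: node 0 is linked to nodes 1 and 2.
adj = [[0, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
w1 = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for illustration only
w2 = [[1.0, 0.0], [0.0, 1.0]]

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, w1)                # layer 1 with ReLU
output = gcn_layer(a_norm, hidden, w2, activate=False)  # layer 2, linear output
```

After two layers, node 0 carries a nonzero second feature even though it started with none, which is the neighbor-to-neighbor information flow that the GCN contributes to the fusion stage.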

C. Prompting & LLM Backbone

The fused features are fed into a frozen LLM (LLaMA or GPT-2) via Patch Reprogramming (from Time-LLM).

Prompt Design: Instead of verbose descriptions, the prompt contains only the Stock Name. This conditions the LLM on the specific asset identity while keeping the prompt concise to preserve the prominence of numerical inputs.
Training Strategy: The LLM backbone remains frozen; only the reprogramming layers, fusion modules, and prediction head are fine-tuned.
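
As a rough illustration of the input side of this stage, the sketch below splits a 20-day window into overlapping patches and builds the name-only prompt. The patch length of 8 and stride of 4 are illustrative values, not settings reported in this summary, and `make_patches` and `build_prompt` are hypothetical helper names. The reprogramming step proper, which maps each patch into the LLM's embedding space, is learned and is not shown.

```python
def make_patches(series, patch_len, stride):
    """Split a 1-D series into overlapping fixed-length patches."""
    if patch_len > len(series):
        raise ValueError("patch_len must not exceed the series length")
    return [
        series[start:start + patch_len]
        for start in range(0, len(series) - patch_len + 1, stride)
    ]

def build_prompt(stock_name):
    """The prompt carries only the stock name, per the design described above."""
    return stock_name

# A 20-day lookback window of hypothetical closing prices.
window = [100.0 + day for day in range(20)]
patches = make_patches(window, patch_len=8, stride=4)
prompt = build_prompt("TSMC")
```

With these settings the 20-day window yields four patches of eight days each, so the frozen LLM sees a handful of patch tokens alongside a very short text prompt.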

D. Generalized Training

Unlike prior works, this model is trained on a mixed dataset of multiple stocks (6 stocks from TWSE, 42 from BigData23) simultaneously. It is then evaluated on individual stocks to test generalization capabilities.
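
One way to picture the mixed training set is as sliding windows drawn from every stock and pooled into a single list, with each sample tagged by its stock name. The sketch below assumes that setup; the function names and the synthetic price series are hypothetical, and a real pipeline would also attach the pooled news vectors and shuffle the samples.

```python
def sliding_windows(prices, lookback, horizon):
    """Yield (history, target) pairs from one stock's price series."""
    for end in range(lookback, len(prices) - horizon + 1):
        yield prices[end - lookback:end], prices[end + horizon - 1]

def build_mixed_dataset(price_by_stock, lookback=20, horizon=1):
    """Pool samples from every stock into one training set, tagged by stock name."""
    samples = []
    for name, prices in price_by_stock.items():
        for history, target in sliding_windows(prices, lookback, horizon):
            samples.append({"stock": name, "history": history, "target": target})
    return samples

# Hypothetical price series; real data would come from TWSE or BigData23.
price_by_stock = {
    "TSMC": [500.0 + i for i in range(30)],
    "Foxconn": [100.0 + 0.5 * i for i in range(25)],
}
train = build_mixed_dataset(price_by_stock, lookback=20, horizon=1)
```

Setting `horizon=5` would produce targets five trading days ahead, matching the longer of the two prediction horizons mentioned in this summary.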

3. Key Contributions

Generalized Multi-Stock Framework: A single unified model trained across multiple stocks that captures collective market dynamics, applicable to both Taiwan (TWSE) and US (BigData23) markets.
Semantic-Aware News Selection: Introduction of Stock Name Embeddings within attention mechanisms (CAP, SAP, PA-SAP) to automatically filter irrelevant news noise without relying on external retrieval systems.
Advanced Fusion Architecture: Integration of GCNs and Bidirectional Cross-Attention to effectively fuse unstructured text with structured price data, modeling both intra-day and inter-day dependencies.
Empirical Validation: Comprehensive experiments demonstrating the efficacy of the approach across different markets, backbone models (LLaMA, GPT-2), and prediction horizons (1-day and 5-day).

4. Experimental Results

Performance on TWSE (Taiwan):
- The proposed method achieved a 7.11% reduction in Mean Absolute Error (MAE) compared to the baseline (Time-LLM without news) and outperformed classical models (LSTM, FPT).
- Self-Attentive Pooling (SAP) generally provided the most consistent results with lower error variance across different stocks.
- Cross-Attentive Pooling (CAP) performed exceptionally well for large-cap stocks (e.g., TSMC, Foxconn) but showed higher variance.
Performance on BigData23 (US Market):
- The model successfully generalized to the US market. However, performance varied based on the input format:
  - Full Names (TWSE): Worked well with all pooling methods.
  - Ticker Symbols (US): Models relying on CAP struggled because ticker symbols (e.g., "F" for Ford) are semantically ambiguous in PLM spaces. SAP remained robust as it relies less on the explicit semantic query of the stock name.
Ablation Studies:
- Removing the GCN layer significantly increased errors, highlighting the importance of modeling structural dependencies.
- The "Stock Name Prompt" (SNP) in the LLM input provided marginal benefits when the pooling mechanism already utilized stock name embeddings, suggesting the pooling layer is the primary driver for context alignment.
Computational Cost:
- LLM-based models require significantly more training time (~2 days) compared to LSTM (~30 mins) but offer substantial accuracy gains (56% MAE reduction vs. LSTM).
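
For readers unfamiliar with how figures such as a 7.11% or 56% MAE reduction are derived, the sketch below computes MAE and the relative reduction between a baseline and a model. The prices and predictions are made-up numbers used only to exercise the functions; they are not results from the paper.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Made-up numbers purely for illustration; not results from the paper.
actual = [100.0, 102.0, 101.0, 105.0]
baseline_pred = [98.0, 104.0, 103.0, 103.0]
model_pred = [99.0, 103.0, 102.0, 104.0]

baseline_mae = mean_absolute_error(actual, baseline_pred)
model_mae = mean_absolute_error(actual, model_pred)
reduction = relative_reduction(baseline_mae, model_mae)
```

Because the reduction is relative to the baseline's error, the same percentage can correspond to very different absolute price errors on different stocks.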

5. Significance

This paper addresses a critical bottleneck in financial AI: how to effectively utilize massive, noisy news data for specific asset prediction without manual curation.

Noise Filtering: By embedding the stock name directly into the attention mechanism, the model learns to "ask" the news data, "Is this relevant to this stock?" rather than relying on brittle keyword searches.
Scalability: The generalized framework proves that a single model can serve multiple assets, reducing the computational and engineering overhead of maintaining hundreds of individual stock models.
Market Agnosticism: The approach demonstrates adaptability across different market structures (Taiwan vs. US) and data types (Full names vs. Tickers), provided the semantic representation is handled correctly (e.g., via entity linking for tickers in future work).

The study concludes that integrating LLMs with attention-based news filtering and structural fusion (GCN) significantly enhances stock price forecasting accuracy, offering a robust pathway for next-generation financial AI systems.

Generalized Stock Price Prediction for Multiple Stocks Combined with News Fusion