The Big Problem: The "Trend-Chasing" AI
Imagine you hire a very smart, well-read financial advisor. This advisor has read every book, newspaper, and stock report ever written. They are incredibly knowledgeable. However, they have a bad habit: they are a terrible trend-chaser.
If the stock market goes up for three days in a row, this advisor panics and predicts it will go up forever. If it crashes for a week, they predict the world is ending. They can't see the big picture; they only see what happened yesterday and assume it will happen forever.
In the world of Artificial Intelligence, this is called Extrapolation Bias. Large Language Models (LLMs) like the one used in this study (Qwen3-32B) have learned this bad habit because they were trained on human writing. Humans love to write about trends, so the AI learned that "recent history = future prediction."
The Failed Fix: "Just Ask Nicely"
Researchers tried to fix this by simply talking to the AI differently. They tried "prompting," which is like giving the AI a stern lecture:
- "Please be rational."
- "Don't just follow the trend."
- "Think like a mathematician."
The result? It didn't work. The AI still chased trends.
The Analogy: Imagine a dog that has been trained to chase squirrels for ten years. You can't just tell the dog, "Please stop chasing squirrels and sit still." The dog's brain is wired to chase. You have to retrain the dog's actual behavior, not just give it a verbal command. The paper argues that the AI's bias is "hard-wired" into its brain (its mathematical parameters), so you can't fix it just by changing the conversation.
The Solution: "Surgical Retraining" (Fine-Tuning)
The authors fix the bias with a technique called Supervised Fine-Tuning (SFT), made cheap and practical by LoRA (Low-Rank Adaptation).
Here is how it works, step-by-step:
1. The "Rational Tutor" Dataset
Instead of letting the AI guess, the researchers created a special textbook.
- The Question: "Here is the stock history for the last year. What will happen next?"
- The Wrong Answer (Old AI): "It went up, so it will go up more!"
- The Right Answer (Rational Tutor): "Actually, markets tend to bounce back and forth around an average (economists call this mean reversion). Based on the math, it will likely drift back down slightly."
They built a massive library of these "Question + Correct Answer" pairs.
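To make the "textbook" idea concrete, here is a minimal sketch of how such Question + Correct Answer pairs could be assembled. The field names, helper function, and example values are illustrative assumptions, not the paper's actual dataset format.

```python
import json

# Hypothetical sketch: one supervised fine-tuning pair is just a prompt
# (the question) plus the rational completion (the correct answer).
def make_training_pair(price_history, rational_forecast):
    prompt = (
        "Here is the stock's recent price history: "
        f"{price_history}. What will happen next?"
    )
    return {"prompt": prompt, "completion": rational_forecast}

pairs = [
    make_training_pair(
        [100, 103, 107, 112],  # four days of gains in a row
        "Prices rarely rise forever; a pullback toward the average is likely.",
    ),
]

# SFT datasets are commonly stored one JSON object per line (JSONL).
lines = [json.dumps(p) for p in pairs]
print(lines[0])
```

A real dataset would contain thousands of such pairs covering many market patterns, so the model learns the rational answer across situations rather than memorizing one example.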
2. The "Surgical" Update (LoRA)
The AI is huge (32 billion "brain cells" or parameters). Retraining the whole thing is like trying to rebuild a skyscraper while people are still living inside it. It's too expensive and risky.
Instead, they used LoRA.
- The Analogy: Imagine the AI is a giant library. Instead of rewriting every single book in the library, they attach a small, sticky note pad to the shelves.
- When the AI reads a question, it first looks at the original books (its original knowledge) and then checks the sticky notes (the new training).
- The sticky notes teach the AI: "When you see this specific pattern, ignore your old instinct and use this new, rational answer."
- This is cheap, fast, and doesn't break the AI's ability to write poetry or answer general questions.
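The "sticky note" analogy has a precise mathematical form: LoRA freezes each big weight matrix W and learns two thin matrices B and A, using W + B·A at inference time. The toy numbers below are illustrative (not from the paper) and just show why this is so much cheaper than retraining everything.

```python
# Minimal sketch of the LoRA idea: instead of updating a full d x d
# weight matrix W, learn two small matrices B (d x r) and A (r x d)
# and apply W + B @ A. The dimensions here are illustrative.
d = 4096   # hidden size of one layer (a typical order of magnitude)
r = 8      # LoRA rank: the size of the "sticky note"

full_update_params = d * d           # retraining the whole matrix
lora_params = d * r + r * d          # just the two thin matrices

print(full_update_params, lora_params)
print(f"LoRA trains {lora_params / full_update_params:.2%} of this layer's parameters")
```

Because the original W is never changed, the model's general abilities (the "original books in the library") stay intact; only the small add-on is trained.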
3. The Test
After the AI studied its "sticky notes," the researchers tested it again.
- The Result: The AI stopped chasing trends. When the market went up, it didn't blindly predict it would keep going up. It started predicting that things might calm down or reverse, just like a rational human economist would.
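The two habits being contrasted here can be captured in a few lines of arithmetic. This is a toy illustration with made-up numbers and a simple reversion rule of my own choosing, not the paper's evaluation method.

```python
# Toy contrast between the two forecasting habits on a rising price series.
prices = [100.0, 102.0, 104.0, 106.0]  # four days of steady gains

# Trend-chasing: assume yesterday's move simply repeats.
trend_forecast = prices[-1] + (prices[-1] - prices[-2])

# Mean reversion: expect the price to drift partway back toward its average.
mean = sum(prices) / len(prices)
kappa = 0.5  # reversion speed (illustrative)
reversion_forecast = prices[-1] + kappa * (mean - prices[-1])

print(trend_forecast)      # keeps extrapolating the climb
print(reversion_forecast)  # pulls back toward the average
```

The fine-tuned model behaves like the second rule: after a run of gains, it predicts a calmer or slightly lower value instead of an ever-steeper climb.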
Why This Matters
This paper proves that we can fix the "bad habits" of AI without throwing the AI away.
- For Investors: If you use an AI to give financial advice, you don't want it to panic and tell you to sell everything because the market dipped yesterday. This method makes the AI calmer and more logical.
- For the Future: As we let AI agents make more decisions on their own (like managing your retirement fund or approving loans), we need to make sure they aren't just copying human mistakes. This paper gives us a "surgical tool" to remove those mistakes.
Summary in One Sentence
The paper shows that you can't fix a biased AI by just asking it nicely; you have to give it a specific, mathematically correct "homework assignment" that surgically updates its brain to stop chasing trends and start thinking rationally.