Portfolio Optimization Proxies under Label Scarcity and… — Plain-Language Explanation

Imagine you are trying to teach a young apprentice chef (the Student) how to cook the perfect meal for a large, unpredictable crowd.

Usually, you would hire a master chef (the Teacher) to cook the meal, taste it, and then tell the apprentice exactly what to do next time. But here's the problem: you only have 104 weeks of recipes (labeled data) to teach from, and the crowd's tastes change wildly over time (market regimes). Sometimes they want spicy food; sometimes they want bland. If the apprentice just memorizes the 104 recipes, they will fail when the crowd asks for something new.

This paper proposes a clever way to train this apprentice chef using a mix of real experience and simulated practice, while also giving the apprentice a "gut feeling" about when they are unsure.

Here is the breakdown of their method, using everyday analogies:

1. The Teacher: The "Risk-Averse Master Chef"

Instead of a normal chef who just tries to make the tastiest dish (highest return), the authors hired a Master Chef who is terrified of burning the kitchen down.

The Goal: This teacher doesn't just look for the best meal; they look for a meal that won't cause a disaster if a sudden storm hits (a market crash).
The Tool: They use a specific risk measure called CVaR (Conditional Value at Risk), which looks at the average loss across the worst-case scenarios. Think of this as the chef saying, "I don't care if the meal is 10% better; I care that it won't be 50% worse if things go wrong."
The Output: The teacher generates the "perfect" portfolio (the recipe) for specific days. These are the labels the student tries to learn.

2. The Problem: Too Few Recipes, Too Many Ingredients

The student has to learn from only 104 real recipes (labeled data), but there are 576 ingredients (features) to consider.

The Analogy: It's like trying to learn how to bake a cake with 576 different spices when you've only ever baked 104 cakes. A normal student would just memorize the 104 cakes and fail miserably when asked to bake a new one. This is called overfitting.

3. The Solution: The "Sandwich" Training

To fix the lack of data, the authors use a Sandwich Training method. Imagine the student's learning process is a sandwich:

Top Bun (Supervised Learning): The student looks at the 104 real recipes from the Master Chef and tries to copy them exactly.
The Meat (Unsupervised Learning): The student is then given 1,400 fake, simulated recipes generated by a computer. These aren't real meals, but they are mathematically similar to real ones. The student practices on these fake meals to learn the structure of good cooking (e.g., "don't use too much salt in stormy weather") without needing a teacher to grade every single one.
Bottom Bun (Supervised Anchoring): Finally, the student goes back to the real 104 recipes to make sure they haven't forgotten the Master Chef's actual style.

This "Sandwich" helps the student learn the principles of cooking rather than just memorizing the specific dishes.

4. The Secret Sauce: The "Uncertain" Apprentice (Bayesian Student)

The authors tested two types of students:

The Deterministic Student: A robot that always gives the exact same answer. If the data is noisy, the robot gets overconfident and makes wild, risky bets.
The Bayesian Student (BNN): This student has a "gut feeling" about uncertainty.
- When the market is calm, the student is confident.
- When the market is chaotic or data is scarce, the student thinks, "Hmm, I'm not 100% sure about this. Maybe I should play it safe and not change my portfolio too much."

The Magic Result: Because the Bayesian student is naturally cautious when unsure, they trade less often.

The Analogy: The Deterministic student is like a nervous driver who swerves left and right every time a bug hits the windshield. The Bayesian student is a calm driver who only swerves when they are sure it's necessary.
The Benefit: This "calmness" saves money on transaction fees (turnover) and prevents the portfolio from crashing during sudden market shifts.

5. The "High-Volatility Paradox"

Here is the most surprising finding. The student was trained on a broad list of assets (like a general grocery store). Then, they were tested on a different list of assets (a specialized spice shop).

The Expectation: You'd expect the student to do worse because the ingredients are different.
The Reality: During high-stress times (market crashes), the student's Sharpe ratio (return earned per unit of risk) was actually 140% to 276% higher on the new list than on the old one!
Why? The student learned a deep, fundamental rule: "When things get scary, move to safe, defensive ingredients." The new list of assets just happened to have better defensive ingredients (like specific safety ETFs) that the old list didn't have. The student's "gut feeling" allowed them to spot these safety tools immediately.

Summary of Results

Better than the Teacher: The student didn't just copy the teacher; they learned the logic behind the teacher's decisions and actually did better in many situations.
Cheaper: The Bayesian student traded half as much as the robot student, saving a fortune in fees.
Safer: When the market crashed, the Bayesian student lost much less money than traditional methods.
Adaptable: They could handle new, unseen markets, especially when things got scary.

The Big Takeaway

In a world where financial data is scarce and markets are unpredictable, teaching a machine to be "uncertain-aware" is better than teaching it to be a "know-it-all." By combining a risk-averse teacher, simulated practice, and a student that knows when to hold back, you can build a portfolio that is robust, cost-effective, and ready for the unexpected.

1. Problem Statement

The paper addresses the persistent gap between theoretical portfolio optimization and realized out-of-sample performance in live markets. Classical approaches (e.g., Markowitz Mean-Variance, CVaR) suffer from four critical limitations:

Distributional Mismatch: They assume normal distributions and linear correlations, failing to capture fat tails and non-linear co-movements during market stress.
Estimation Fragility: Small errors in estimating expected returns ($\mu$) and covariance ($\Sigma$) lead to large, unstable weight changes.
Regime Blindness: Optimizers solve independently at each rebalance date without mechanisms to generalize across structural market shifts.
Data Scarcity: In high-dimensional settings (many assets, few historical observations), standard machine learning models overfit, while classical optimizers lack adaptability.

The specific challenge tackled is portfolio construction under severe label scarcity (only 104 labeled weekly observations) and regime uncertainty, aiming to build a policy that is robust, interpretable, and cost-efficient.

2. Methodology

The authors propose a Teacher-Student Learning Framework utilizing Semi-Supervised Sandwich Training.

A. The Teacher (CVaR Optimizer)

Role: Generates supervisory labels (optimal portfolio weights) for specific dates.
Mechanism: A Conditional Value-at-Risk (CVaR) minimizer is used instead of Mean-Variance. CVaR is chosen for its convexity and ability to directly control tail risk, making it a robust "ground truth" for risk-aware behavior, even if it is not the absolute Sharpe ratio maximizer.
Output: Supervisory labels $w_{teacher}$ for a subset of real data.
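
As a rough illustration of the quantity the Teacher minimizes, the sketch below computes the empirical CVaR of a fixed portfolio over a set of return scenarios. It only evaluates the objective and does not solve the optimization. The function names, the sign convention (positive numbers are losses), the tail level `alpha`, and the toy scenarios are all assumptions made for illustration, not details taken from the paper.

```python
import math


def portfolio_losses(weights, scenario_returns):
    """Loss of the portfolio in each scenario (loss = negative return)."""
    return [-sum(w * r for w, r in zip(weights, scenario))
            for scenario in scenario_returns]


def empirical_cvar(losses, alpha=0.95):
    """Average of the worst (1 - alpha) fraction of losses."""
    n = len(losses)
    # Number of tail scenarios; round() guards against float noise in (1 - alpha) * n.
    k = max(1, math.ceil(round((1.0 - alpha) * n, 9)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


# Toy example: two assets, five return scenarios (made-up numbers).
scenarios = [[0.02, 0.01], [-0.10, 0.00], [0.03, 0.01], [-0.04, 0.01], [0.01, 0.00]]
all_risky = empirical_cvar(portfolio_losses([1.0, 0.0], scenarios), alpha=0.8)
half_half = empirical_cvar(portfolio_losses([0.5, 0.5], scenarios), alpha=0.8)
```

With these toy numbers the worst-case tail loss is 0.10 for the concentrated portfolio and 0.05 for the evenly split one, which is the kind of trade-off a CVaR minimizer searches over.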

B. Data Augmentation (Synthetic Generation)

To overcome the small-$n$, high-$p$ problem ($N=104$ labeled samples vs. $P=576$ features):

Factor-Based Model: Synthetic market trajectories are generated using a Vector Autoregression (VAR) for factor dynamics (Fama-French + Momentum).
Tail Dependence: Idiosyncratic residuals are modeled using a t-copula to preserve cross-asset tail dependence (fat tails) observed in real crises.
Result: 323 additional synthetic labeled pairs, creating a training pool of 427 samples.
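
The generation step above can be sketched in a heavily simplified form. The code below simulates factors with a first-order VAR and adds residuals that share a single chi-square mixing variable, which is the mechanism that gives a t-copula its joint tail behavior. The VAR order, the coefficient matrix `A`, the loadings `betas`, the scales, and the degrees of freedom `nu` are all placeholder assumptions. A full t-copula would also use an estimated correlation matrix and fitted marginal distributions, which this sketch omits.

```python
import math
import random


def simulate_factors(A, f0, n_steps, shock_scale, rng):
    """VAR(1) factor path: f_t = A f_{t-1} + Gaussian shock."""
    k = len(f0)
    path, f = [], list(f0)
    for _ in range(n_steps):
        f = [sum(A[i][j] * f[j] for j in range(k)) + rng.gauss(0.0, shock_scale)
             for i in range(k)]
        path.append(f)
    return path


def heavy_tailed_residuals(n_assets, nu, scale, rng):
    """One joint draw of Student-t residuals.

    All assets share one chi-square mixing variable, so extreme shocks tend
    to arrive together (tail dependence).
    """
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    mix = math.sqrt(nu / chi2)
    return [scale * mix * rng.gauss(0.0, 1.0) for _ in range(n_assets)]


def simulate_returns(A, f0, betas, n_steps, nu=4.0, seed=0):
    """Asset returns = factor exposure (betas) times factors, plus residuals."""
    rng = random.Random(seed)
    factors = simulate_factors(A, f0, n_steps, 0.01, rng)
    returns = []
    for f in factors:
        eps = heavy_tailed_residuals(len(betas), nu, 0.005, rng)
        returns.append([sum(b * x for b, x in zip(beta, f)) + e
                        for beta, e in zip(betas, eps)])
    return returns
```

Each simulated path could then be passed to the Teacher to produce a synthetic labeled pair, or used without labels in the unsupervised stage.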

C. The Student Models

Four student architectures are trained to approximate the Teacher's policy:

DNN-sup: Deterministic Neural Network (Supervised only).
BNN-sup: Bayesian Neural Network (Supervised only).
DNN-S: Deterministic Neural Network (Sandwich Training).
BNN-S: Bayesian Neural Network (Sandwich Training).

Key Architectural Features:

Softmax Output: Enforces portfolio weights to lie on the probability simplex ($\sum w_i = 1, w_i \ge 0$) as a hard constraint.
Bayesian Inference: BNNs use Variational Inference with a diagonal Gaussian posterior. Predictions are obtained by averaging $M=20$ Monte Carlo forward passes, providing uncertainty quantification.
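
To make the two features above concrete, here is a minimal sketch using a single linear layer in place of a real network. Each "forward pass" draws the layer's weights from an independent Gaussian per weight (a diagonal posterior), applies softmax, and the passes are averaged. The single-layer model, the function names, and the use of the per-asset standard deviation as the uncertainty signal are illustrative assumptions; only the softmax output and the default of 20 passes come from the summary.

```python
import math
import random


def softmax(logits):
    """Map arbitrary scores to non-negative weights that sum to one."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_predict(features, weight_mean, weight_std, n_passes=20, seed=0):
    """Average softmax outputs over sampled weight matrices.

    weight_mean / weight_std hold one row per asset: the mean and standard
    deviation of an independent Gaussian over each linear weight.
    """
    rng = random.Random(seed)
    n_assets = len(weight_mean)
    samples = []
    for _ in range(n_passes):
        logits = []
        for mu_row, sd_row in zip(weight_mean, weight_std):
            sampled = [rng.gauss(mu, sd) for mu, sd in zip(mu_row, sd_row)]
            logits.append(sum(w * x for w, x in zip(sampled, features)))
        samples.append(softmax(logits))
    # Mean portfolio and per-asset spread across the stochastic passes.
    mean = [sum(s[i] for s in samples) / n_passes for i in range(n_assets)]
    spread = [math.sqrt(sum((s[i] - mean[i]) ** 2 for s in samples) / n_passes)
              for i in range(n_assets)]
    return mean, spread
```

Because every pass lies on the simplex and the simplex is convex, the averaged weights also sum to one and stay non-negative, so the hard constraint survives the averaging.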

D. Sandwich Training Paradigm

Adapted from power-grid optimization, this semi-supervised approach proceeds through three stages, with the middle stage alternating between supervised and unsupervised bursts:

Supervised Warm-up: Train on real labeled pairs to anchor the model to the Teacher's risk preferences.
Alternating Cycles:
- Supervised bursts: Maintain fidelity to the Teacher.
- Unsupervised bursts: Train on synthetic unlabeled scenarios using a structural loss function comprising Empirical CVaR (tail risk) and Entropy-based Diversification. This forces the model to learn the structure of risk management rather than memorizing specific weights.
Supervised Anchoring: Final re-alignment with the Teacher to ensure consistency.
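
The three stages and the label-free loss can be sketched as follows. The step counts, the order of bursts within a cycle, the tail level `alpha`, and the `diversification_weight` that trades off CVaR against entropy are assumptions for illustration; the summary states only that the structural loss combines empirical CVaR with an entropy-based diversification term.

```python
import math


def sandwich_schedule(warmup_steps, n_cycles, sup_steps, unsup_steps, anchor_steps):
    """Return the ordered list of training phases, one label per step."""
    phases = ["supervised"] * warmup_steps          # stage 1: warm-up
    for _ in range(n_cycles):                       # stage 2: alternating bursts
        phases += ["supervised"] * sup_steps
        phases += ["unsupervised"] * unsup_steps
    phases += ["supervised"] * anchor_steps         # stage 3: anchoring
    return phases


def entropy(weights):
    """Shannon entropy of the weights; largest for an equal-weight portfolio."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def structural_loss(weights, scenario_returns, alpha=0.95, diversification_weight=0.01):
    """Label-free loss: empirical tail risk minus an entropy bonus."""
    losses = sorted((-sum(w * r for w, r in zip(weights, s))
                     for s in scenario_returns), reverse=True)
    k = max(1, math.ceil(round((1.0 - alpha) * len(losses), 9)))
    cvar = sum(losses[:k]) / k                      # mean of the worst k losses
    return cvar - diversification_weight * entropy(weights)
```

No Teacher labels appear in `structural_loss`, which is why the unsupervised bursts can run on synthetic scenarios that were never passed through the optimizer.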

E. Deployment Protocol

Rolling Fine-Tuning: During real-market evaluation, the frozen pre-trained model is periodically fine-tuned on recent data and then reset to its base state. This allows limited adaptation while preventing long-term drift or overfitting.
Evaluation Tiers:
- GRID_3x5: Synthetic stress tests (stability).
- C2A (Constrained Application Assessment): In-distribution evaluation on the same assets (2022–2026) under stress scenarios.
- D2A (Cross-Universe Adaptive Generalization): Out-of-distribution evaluation on a disjoint set of 36 assets (e.g., different ETFs) to test generalization.
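
The rolling fine-tuning protocol described above amounts to adapting a copy of the base model for each evaluation window and never letting the adaptation carry over. A minimal sketch follows, assuming hypothetical `fine_tune` and `predict` callables and a toy dictionary "model"; none of these names come from the paper.

```python
import copy


def rolling_evaluate(base_model, windows, fine_tune, predict):
    """Fine-tune a fresh copy of the base model per window, then discard it."""
    outputs = []
    for recent_data, next_inputs in windows:
        model = copy.deepcopy(base_model)   # always start from the frozen base state
        fine_tune(model, recent_data)       # limited adaptation to recent data
        outputs.append(predict(model, next_inputs))
    return outputs                          # base_model itself is never modified


# Toy stand-ins (hypothetical): the "model" is a single bias term.
def toy_fine_tune(model, recent):
    target = sum(recent) / len(recent)
    model["bias"] += 0.5 * (target - model["bias"])


def toy_predict(model, x):
    return x + model["bias"]


base = {"bias": 0.0}
windows = [([1.0, 3.0], 10.0), ([4.0, 4.0], 20.0)]
outputs = rolling_evaluate(base, windows, toy_fine_tune, toy_predict)
```

In the toy run the second window starts again from a bias of 0.0 rather than from the first window's adapted value, which is the reset behavior that prevents long-term drift.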

3. Key Contributions

Optimization-Proxy Framework for Finance: First application of the semi-supervised sandwich training paradigm to portfolio construction, successfully transferring the "structure" of CVaR optimization to neural networks.
Low-Data BNN Pipeline: Demonstrates that combining factor-based synthetic augmentation with Bayesian Neural Networks allows for effective learning in small-$n$, high-$p$ regimes while providing calibrated uncertainty estimates.
Implicit Turnover Regularization: A novel emergent property where Bayesian models self-regulate trading turnover to 11–14% weekly (approx. 50% reduction vs. deterministic models) without explicit turnover penalties in the loss function. This is attributed to posterior uncertainty acting as decision inertia.
Hierarchical Generalization (The "HIGHVOL Paradox"): Discovery that models trained on aggregated indices perform better on a disjoint, factor-decomposed universe during high-volatility regimes (Sharpe improvement of +140% to +276%). The models learn broad risk-reduction heuristics that transfer effectively when finer-grained defensive instruments are available.
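
For readers unfamiliar with the turnover metric cited above, a minimal sketch of how it might be measured is given below. The summary does not say whether turnover is counted one-way or two-way; the one-way convention used here (half the sum of absolute weight changes) is an assumption, as are the function names.

```python
def turnover(previous_weights, new_weights):
    """One-way turnover: half the sum of absolute weight changes."""
    return 0.5 * sum(abs(n - p) for p, n in zip(previous_weights, new_weights))


def average_turnover(weight_path):
    """Mean turnover across consecutive rebalances in a list of weight vectors."""
    changes = [turnover(a, b) for a, b in zip(weight_path, weight_path[1:])]
    return sum(changes) / len(changes)
```

Under this convention, moving a two-asset portfolio from 50/50 to 60/40 is a turnover of 10%, since 10% of the portfolio's value changes hands.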

4. Experimental Results

Synthetic Stability (GRID_3x5):
- BNN-S achieves a Pareto-optimal balance, delivering a Sharpe ratio of ~1.22 with a CVaR of -1.82% (significantly better tail risk than Mean-Variance's -4.40%).
- Turnover: BNN-S maintains ~12% weekly turnover, whereas deterministic DNN-S overtrades at ~24%, incurring higher transaction costs.
Real Market Performance (C2A & D2A):
- BNN-S leads all models with a Sharpe of 2.44 (C2A) and 1.94 (D2A).
- Robustness: BNN-S shows lower variance in D2A ($\pm 0.127$) compared to DNN-S ($\pm 0.211$), confirming Bayesian uncertainty aids generalization.
- HIGHVOL Paradox: In high-volatility regimes, D2A performance surged for all models. BNN-S Sharpe jumped from 1.49 (C2A) to 3.56 (D2A), an improvement of roughly +140%. This suggests the models translated broad risk heuristics into factor-level defensive positioning (e.g., rotating into USMV, XLU) when such instruments were available.
Constraint Sensitivity:
- Bayesian models were largely insensitive to added execution constraints (L3) in C2A, indicating the policy had already internalized feasible behavior.
- Deterministic models improved under constraints in D2A, suggesting hard constraints act as an external regularizer for overconfident point estimates.

5. Significance and Implications

Theoretical: Validates that knowledge distillation in finance works by learning the structure of optimal risk management (e.g., tilting to bonds during stress) rather than memorizing specific weight vectors. It also indicates that "teacher similarity" (correlation with teacher weights) does not predict out-of-sample performance.
Practical:
- Cost Efficiency: The implicit turnover reduction of Bayesian models offers a ~0.6% annual cost saving, crucial for retail deployment.
- Regime Adaptation: The framework demonstrates that uncertainty-aware models are more resilient to regime shifts than deterministic ones.
- Scalability: The approach is viable for retail portfolios (\$10k–\$100k) using ETFs, offering a path to uncertainty-aware, constraint-respecting portfolio construction that outperforms classical methods in fat-tailed, non-stationary environments.
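
As a back-of-the-envelope check on the cost-saving figure, one can combine it with the weekly turnover numbers reported for the synthetic grid (about 24% for DNN-S versus about 12% for BNN-S). The cost rate $c$ per unit of turnover is not stated in this summary. The value derived below is an inference, assuming the saving comes entirely from the turnover gap and that costs are proportional to turnover.

```latex
\begin{align*}
\text{turnover gap per year} &\approx (0.24 - 0.12) \times 52 \approx 6.24 \\
\text{annual saving} &\approx 6.24\,c \\
6.24\,c \approx 0.6\% \;&\Longrightarrow\; c \approx 0.096\% \approx 10 \text{ basis points}
\end{align*}
```

A cost of roughly 10 basis points per unit of turnover is a plausible order of magnitude for ETF trading, so the two reported figures appear mutually consistent under these assumptions.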

Conclusion

The paper establishes that a Bayesian Student trained via Semi-Supervised Sandwich Training on synthetically augmented data can outperform a classical CVaR Teacher. The system achieves superior risk-adjusted returns, significantly lower tail risk, and reduced trading costs without explicit penalty terms. Crucially, it exhibits a counter-intuitive "HIGHVOL paradox" where generalization improves in high-volatility regimes when evaluated on a factor-decomposed universe, suggesting that learned risk-heuristics transfer effectively to more granular asset classes.

Portfolio Optimization Proxies under Label Scarcity and Regime Shifts via Bayesian and Deterministic Students under Semi-Supervised Sandwich Training