Imagine you are trying to predict the winner of the biggest, most chaotic basketball tournament in the world: March Madness. It's a time when 68 college teams fight it out, and upsets happen constantly. It's like trying to guess the outcome of a massive, high-stakes game of chance mixed with skill.
For years, people have tried to use computers to solve this puzzle. Usually, they treat it like a simple math problem: "Team A vs. Team B, who wins?" (Yes or No).
This paper introduces a smarter, more creative way to solve the puzzle. Instead of just asking "Who wins?", the authors ask, "How confident are we in every possible matchup, and how do those confidence levels stack up against each other?"
Here is the breakdown of their approach, explained with everyday analogies:
1. The Problem: One Opinion Isn't Enough
Imagine you are trying to predict the weather. If you ask one person, they might be wrong. If you ask five different people, you get five different guesses.
- The Old Way: Most computer models simply average the five experts' guesses. If four say "Sunny" and one says "Rain," they go with "Sunny."
- The Problem: Sometimes the one expert who said "Rain" was actually the most observant one, but their voice got drowned out by the majority.
2. The Solution: The "Combinatorial Fusion" Judging Panel
The authors use a method called Combinatorial Fusion Analysis (CFA). Think of this not as a kitchen where you just mix ingredients, but as a judging panel for a talent show.
They set up five different "judges" (Machine Learning Models):
- Logistic Regression: The old-school statistician who loves numbers.
- SVM: The strict rule-follower who looks for clear boundaries.
- Random Forest: The group of trees (a forest) that votes on every little detail.
- XGBoost: The relentless optimizer that learns from its past mistakes.
- CNN: The deep-learning artist that spots complex patterns humans miss.
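A rough sketch of assembling such a five-judge panel with scikit-learn. Everything here is illustrative, not the paper's actual setup: the features and labels are synthetic, a GradientBoostingClassifier stands in for XGBoost, and a small multilayer perceptron stands in for the CNN (a real convolutional net would need a deep-learning framework):

```python
# Sketch: five diverse "judges" scoring a binary "Team A wins?" question.
# Hypothetical data and hyperparameters -- the paper's setup differs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))              # 8 made-up team-stat features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # synthetic "Team A wins" labels

judges = {
    "logreg": LogisticRegression(max_iter=1000),
    "svm": SVC(probability=True),  # probability=True so it can emit scores
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
    "gboost": GradientBoostingClassifier(random_state=0),  # XGBoost stand-in
    "mlp": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                         random_state=0),                  # CNN stand-in
}

for name, model in judges.items():
    model.fit(X, y)

# Each judge emits a score in [0, 1]: its confidence that Team A wins.
scores = {name: m.predict_proba(X)[:, 1] for name, m in judges.items()}
print({name: round(float(s[0]), 3) for name, s in scores.items()})
```

The point of the panel is that each model family "sees" the data differently, which is exactly the diversity the next section exploits.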
3. The Secret Sauce: "Cognitive Diversity"
This is the most important part. In a normal panel, you want everyone to agree. In this paper, the authors want the judges to disagree in a specific, useful way.
They use a concept called Cognitive Diversity.
- The Analogy: Imagine you are trying to find a lost dog.
- Judge A looks at the ground for paw prints.
- Judge B looks up in the trees.
- Judge C listens for barking.
- If all three judges looked at the ground, they would all be wrong if the dog was in a tree.
- Cognitive Diversity measures how different the judges' perspectives are. The paper argues that a team of judges who look at the problem from totally different angles (high diversity) will make a better final decision than a team of judges who all think exactly alike, even if the "different" judges are slightly less accurate individually.
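In Hsu's CFA framework, each model can be summarized by a rank-score characteristic function (the score it gives its 1st-ranked pick, its 2nd-ranked pick, and so on), and cognitive diversity between two models is a distance between those functions. A minimal sketch of that idea, using toy scores rather than the paper's data:

```python
import numpy as np

def rank_score_function(scores):
    """Rank-score characteristic: scores sorted from best (rank 1) downward."""
    return np.sort(scores)[::-1]

def cognitive_diversity(scores_a, scores_b):
    """Normalized Euclidean distance between two rank-score functions."""
    fa = rank_score_function(scores_a)
    fb = rank_score_function(scores_b)
    return float(np.sqrt(np.mean((fa - fb) ** 2)))

# Toy example: three judges score the same four matchups.
a = np.array([0.9, 0.8, 0.3, 0.1])    # confident, spread-out judge
b = np.array([0.6, 0.55, 0.5, 0.45])  # cautious, compressed judge
c = np.array([0.9, 0.8, 0.3, 0.1])    # thinks exactly like judge A

print(cognitive_diversity(a, b))  # large: genuinely different perspectives
print(cognitive_diversity(a, c))  # 0.0: no diversity to exploit
```

A pair like (a, c) adds nothing to a panel, no matter how accurate each is individually; the fusion gains come from pairs like (a, b).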
4. The Two Ways to Combine the Votes
The paper tests two ways to combine these judges' opinions:
Score Combination (The "Average Score"):
Imagine each judge gives a score from 0 to 100 on how likely Team A is to win. The computer averages these scores.
- Result: This was good, but not the best.
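Score combination is just a per-matchup average of the judges' scores. A minimal sketch with made-up numbers:

```python
import numpy as np

# Hypothetical scores from 3 judges for 4 matchups (confidence Team A wins).
scores = np.array([
    [0.80, 0.40, 0.55, 0.20],  # judge 1
    [0.70, 0.60, 0.50, 0.30],  # judge 2
    [0.90, 0.30, 0.65, 0.10],  # judge 3
])

score_combined = scores.mean(axis=0)  # average score per matchup
print(score_combined)
```

The weakness is visible in the raw numbers: a single over-confident judge can drag the average around even when the other judges' *orderings* agree.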
Rank Combination (The "Leaderboard"):
This is the winner. Instead of looking at the raw scores (0-100), the computer asks: "Who is the #1 pick? Who is #2? Who is #3?"
- The Analogy: Imagine a race. It doesn't matter if the winner finished in 10.01 seconds or 10.02 seconds. What matters is that they crossed the line first.
- The computer looks at the order of the predictions. It asks, "Which team did the most judges put in the top spot?"
- By focusing on the ranking rather than the exact score, the model became much more accurate.
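The leaderboard idea can be sketched directly: convert each judge's scores into ranks (1 = that judge's top pick), then average the ranks. Toy numbers again, not the paper's data; a lower combined rank means a stronger consensus pick:

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical scores from 3 judges for 4 matchups (confidence Team A wins).
scores = np.array([
    [0.80, 0.40, 0.55, 0.20],  # judge 1
    [0.70, 0.60, 0.50, 0.30],  # judge 2
    [0.90, 0.30, 0.65, 0.10],  # judge 3
])

# Rank each judge's scores: rank 1 = that judge's most likely winner.
ranks = np.vstack([rankdata(-row, method="ordinal") for row in scores])
rank_combined = ranks.mean(axis=0)
print(rank_combined)
```

Because ranking discards the raw magnitudes, a judge who scores everything near 0.5 and a judge who scores everything near the extremes get exactly equal say, which is the robustness the race analogy is pointing at.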
5. The Results: Beating the Experts
The authors tested their "Super-Panel" (specifically a mix of the Statistician, the Rule-Follower, and the Deep-Learning Artist) on the last 10 years of tournament data.
- They found that their "Rank Combination" method was the most consistent winner over the last decade.
- They applied this to the 2024 tournament.
- The Score: Their model predicted the winners with 74.60% accuracy.
- The Competition: They compared this to the top 10 public ranking systems (like the famous "KenPom" or "NET Rankings"). The best of those public systems got 73.02%.
The Bottom Line
The paper shows that to predict a chaotic event like March Madness, you shouldn't just ask "Who is the best team?" You should ask, "How do different types of experts rank the teams relative to each other?"
By mixing different types of computer brains and focusing on who they rank #1, #2, and #3 (rather than their exact confidence numbers), the authors built a "super-brain" that is slightly better at predicting the chaos of college basketball than any single expert or popular website currently available.
In short: They didn't just build a better calculator; they built a better committee of judges who know how to listen to each other's unique perspectives.