Fourier Analysis on the Boolean Hypercube via Hoeffding Functional Decomposition

This paper proposes an ANOVA-based generalization of Fourier analysis on the Boolean hypercube to arbitrary probability measures. It provides an explicit decomposition basis and a sparse least-squares estimation procedure that mitigates the curse of dimensionality, enabling effective feature attribution in explainable-AI tasks involving non-uniform data distributions such as one-hot encoded features.

Baptiste Ferrere, Nicolas Bousquet, Fabrice Gamboa, Jean-Michel Loubes, Joseph Muré

Published 2026-03-03

Imagine you are trying to understand a complex recipe for a delicious cake. You have a list of ingredients (flour, sugar, eggs, vanilla, etc.), but the cake doesn't just taste like the sum of its parts. The interaction between the eggs and the sugar matters just as much as the flour itself.

In the world of Machine Learning, models are like these recipes. They take many inputs (features) and produce an output (a prediction). Fourier Analysis is a mathematical tool that tries to break this recipe down into its simplest ingredients to see which ones matter most.

However, there's a catch. Traditional Fourier Analysis assumes that every ingredient in your pantry is equally likely to be used. It assumes a "uniform" world. But in real life, ingredients are correlated. If you use eggs, you probably use sugar. If you have a "one-hot" feature (like a color that can only be Red, Blue, or Green, but never two at once), the math gets messy because the standard tools assume these things are independent.
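The one-hot problem can be seen in two lines of arithmetic. A minimal sketch, with made-up category probabilities: because two one-hot indicators can never both be 1 at the same time, their covariance is always negative, so they can never be treated as independent.

```python
# Hypothetical probabilities for a one-hot "color" feature (illustration only).
p = {"red": 0.5, "blue": 0.3, "green": 0.2}

# Two indicators can never both equal 1, so E[x_red * x_blue] = 0, and
# Cov(x_red, x_blue) = E[x_red * x_blue] - E[x_red] * E[x_blue] = -p_red * p_blue.
cov = 0 - p["red"] * p["blue"]
print(cov)  # negative: one-hot dummies are always negatively correlated
```

This hard linear constraint (exactly one indicator is 1) is precisely the kind of dependence the uniform-measure assumption cannot handle.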

This paper introduces a new, smarter way to break down these recipes, even when the ingredients are tightly linked. Here is the breakdown using simple analogies:

1. The Old Way: The "Perfectly Balanced" Scale

Traditional Fourier analysis is like weighing ingredients on a scale that assumes every single combination of ingredients is equally probable.

  • The Problem: In the real world, data isn't balanced. If you are analyzing customer data, people who buy diapers are much more likely to buy beer than people who don't. The "uniform" scale gives you a distorted view because it doesn't know about these real-world connections.
  • The Result: Your explanation of why the model made a decision is slightly off because it's ignoring the hidden relationships between variables.
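For readers who want the math behind the "perfectly balanced scale": classical Boolean Fourier analysis writes any function on {-1, 1}^n as a sum of parity terms, one per subset of features, with each coefficient computed as an average under the uniform measure. A minimal brute-force sketch (function names are mine, not the paper's):

```python
from itertools import chain, combinations, product
from math import prod

def fourier_coefficient(f, n, S):
    """hat_f(S) = E[f(x) * prod_{i in S} x_i], averaged over the UNIFORM
    measure on the hypercube {-1, 1}^n (every corner equally likely)."""
    pts = list(product([-1, 1], repeat=n))
    return sum(f(x) * prod(x[i] for i in S) for x in pts) / len(pts)

# A pure two-way interaction: f depends on features 0 and 1 only jointly.
f = lambda x: x[0] * x[1]
subsets = chain.from_iterable(combinations(range(2), k) for k in range(3))
coeffs = {S: fourier_coefficient(f, 2, S) for S in subsets}
print(coeffs)  # all weight sits on the pair (0, 1); singletons get 0
```

The uniform average in the expectation is exactly the "balanced scale" assumption the paper replaces.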

2. The New Way: The "Custom-Tailored" Scale

The authors propose a method called Hoeffding Functional Decomposition (HFD). Think of this as a scale that you can "calibrate" to the specific shape of your data.

  • The Innovation: They created a new set of "mathematical ingredients" (a basis) that automatically adjusts to how your data is distributed.
  • The Metaphor: Imagine you are trying to describe a crowd of people.
    • Old Method: You assume everyone is standing in a perfect grid, equally spaced.
    • New Method: You realize the crowd is clumped together in groups (friends talking, families standing close). Your new method creates a map that accounts for these clumps. It tells you, "Ah, this group of people is influencing the outcome together," rather than treating them as isolated individuals.
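The "calibration" idea can be illustrated in the simplest possible setting. Below is a minimal sketch assuming a single Bernoulli(p) feature: the basis function is recentred and rescaled so it averages to zero and has unit variance under the actual, non-uniform distribution rather than under a 50/50 coin flip. The paper's HFD construction generalizes this to many, possibly dependent, features; the helper name is mine.

```python
import math

def centered_basis(p):
    """For a Bernoulli(p) variable x in {0, 1}, return phi with
    E[phi(x)] = 0 and E[phi(x)^2] = 1 under that non-uniform measure."""
    sd = math.sqrt(p * (1 - p))
    return lambda x: (x - p) / sd

# Check the calibration for a skewed feature, p = 0.8.
p = 0.8
phi = centered_basis(p)
mean = p * phi(1) + (1 - p) * phi(0)           # E[phi]   -> 0
var = p * phi(1) ** 2 + (1 - p) * phi(0) ** 2  # E[phi^2] -> 1
print(round(mean, 10), round(var, 10))
```

Products of such calibrated one-dimensional functions yield a basis that is orthonormal with respect to the data's own distribution, which is the property the uniform parity basis loses on non-uniform data.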

3. Solving the "Too Many Ingredients" Problem

The biggest headache in analyzing these models is the Curse of Dimensionality. If you have 20 ingredients, there are over a million (2^20 = 1,048,576) possible combinations of ingredients you could mix together. Checking them all is impossible.

  • The Solution: The authors realized that in most real-world scenarios, you don't need to check every combination. Usually, the main ingredients (main effects) and the most obvious pairs (like flour + sugar) do 95% of the work.
  • The Trick: They use a technique called Regularization (specifically Elastic Net). Think of this as a "smart filter" that automatically turns off the noise. It says, "We only care about the main ingredients and the top 2-3 pairings." This turns an impossible math problem into a simple, fast calculation that a laptop can solve in seconds.
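Two quick numbers make the point, plus the core mechanism of the "smart filter". With d = 20 binary features, the full decomposition has 2^20 coefficients, while truncating to main effects and pairs leaves only a few hundred; the L1 part of Elastic Net then zeroes out weak coefficients via soft-thresholding. A sketch with made-up coefficient values (not from the paper):

```python
from math import comb

d = 20
full = 2 ** d                    # one coefficient per subset of features
truncated = 1 + d + comb(d, 2)   # constant + main effects + pairwise terms
print(full, truncated)           # 1048576 vs 211: now tractable

def soft_threshold(z, t):
    """The L1 shrinkage step of Elastic Net: coefficients smaller than t
    in magnitude become exactly 0; the rest shrink toward 0 by t."""
    return max(abs(z) - t, 0.0) * (1 if z > 0 else -1)

coeffs = {"flour": 2.3, "sugar": 1.9, "flour*sugar": 0.8, "vanilla*salt": 0.02}
kept = {k: soft_threshold(v, 0.1) for k, v in coeffs.items()}
print(kept)  # the tiny "vanilla*salt" interaction is filtered to exactly 0.0
```

Sparsity is what turns the exhaustive search into a linear regression a laptop can solve quickly.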

4. Why This Matters for AI (Explainable AI)

We often use tools like SHAP to explain why an AI made a decision (e.g., "The loan was denied because of your income").

  • The Issue: SHAP works great when data is independent, but it can get confused when data is correlated (like the "one-hot" encoding problem mentioned in the paper).
  • The Breakthrough: The authors show that their new method produces results almost identical to SHAP's in simple cases, but remains accurate even when the data is messy and correlated.
  • The Benefit: It gives us a more honest explanation. If a model relies on a specific combination of correlated features, this method spots it correctly, whereas older methods might spread the credit (or blame) unevenly.
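To make the comparison concrete, here is a toy exact Shapley computation (my own brute-force sketch, not the SHAP library or the paper's code). For a pure interaction f(x) = x0 * x1 with independent zero-mean features, the interaction's credit is split evenly between the two features; this is the well-behaved case, and the paper's contribution is keeping attributions honest when the features are correlated.

```python
from itertools import combinations
from math import factorial

def shapley(v, n):
    """Exact Shapley values for a value function v over subsets of {0..n-1}."""
    phi = [0.0] * n
    for i in range(n):
        for k in range(n):
            for S in combinations([j for j in range(n) if j != i], k):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi

# Value function for f(x) = x0 * x1 evaluated at x = (1, 1), with each
# unfixed feature replaced by its mean 0 (independent-features setting).
def v(S):
    x0 = 1 if 0 in S else 0.0
    x1 = 1 if 1 in S else 0.0
    return x0 * x1

print(shapley(v, 2))  # credit for the interaction is split evenly: [0.5, 0.5]
```

When the features are correlated, the "replace unfixed features by their mean" step above is no longer valid, which is exactly where the HFD-based attribution diverges from SHAP.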

Summary in a Nutshell

  • The Problem: Standard math tools for explaining AI assume data is random and independent, which is rarely true in real life.
  • The Fix: The authors built a new mathematical "lens" (based on Hoeffding Decomposition) that adapts to the actual shape of your data.
  • The Magic: They turned a super-complex, slow calculation into a fast, sparse linear regression by assuming that only the most important interactions matter.
  • The Result: We can now explain complex AI models more accurately, even when the data has tricky, real-world dependencies (like one-hot encoded categories or biological correlations).

In short, they took a rigid, one-size-fits-all ruler and replaced it with a flexible, stretchy tape measure that fits the data perfectly, allowing us to finally understand exactly how our AI models are thinking.
