Data-Driven Global Sensitivity Analysis for Engineering Design Based on Individual Conditional Expectations

This paper proposes a novel global sensitivity analysis method based on Individual Conditional Expectation (ICE) curves. It overcomes a key limitation of traditional Partial Dependence Plots (PDPs), which average away input interactions, and offers a mathematically grounded, more informative metric for explainable machine learning in engineering design.

Pramudita Satria Palar, Paul Saves, Rommel G. Regis, Koji Shimoyama, Shigeru Obayashi, Nicolas Verstaevel, Joseph Morlier

Published Mon, 09 Ma

Imagine you are a chef trying to perfect a new, complex recipe. You have a "black box" machine (a machine learning model) that tells you exactly how delicious the dish will be based on your ingredients (input variables like salt, heat, cooking time).

Your goal is to understand which ingredients matter most and how they work together.

The Old Way: The "Average" Taste Test (PDP)

Traditionally, engineers used a method called Partial Dependence Plots (PDP). Think of this as asking the machine: "If I change the amount of salt, what happens to the taste, assuming everything else is just an average mix?"

The machine gives you a single line on a graph representing the average effect of salt.

  • The Problem: This is like averaging the taste of a dish where salt makes it salty in one region but bitter in another. The average might look like "no change at all" (a flat line). You might conclude, "Salt doesn't matter!" when in reality, salt is the most critical ingredient, but its effect depends entirely on how much pepper you added. The averaging process hides the drama.
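The cancellation problem above is easy to reproduce. The sketch below (a minimal illustration, not code from the paper) uses a hypothetical black-box `model` where "salt" (`x1`) only acts through its interaction with "pepper" (`x2`): averaging over `x2` makes the PDP of `x1` come out essentially flat.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black box: salt (x1) interacts with pepper (x2).
def model(x1, x2):
    return x1 * x2

# Background samples for the other ingredient, symmetric around zero.
x2_samples = rng.uniform(-1.0, 1.0, size=1000)

# PDP of x1: at each grid point, average the model over all x2 samples.
grid = np.linspace(-1.0, 1.0, 21)
pdp = np.array([model(x1, x2_samples).mean() for x1 in grid])

# The PDP is nearly flat, even though x1 completely drives the output.
print(float(np.abs(pdp).max()))
```

Because the positive and negative effects of `x2` cancel in the average, the PDP wrongly suggests that `x1` "doesn't matter."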

The New Way: The "Individual" Taste Test (ICE)

The authors of this paper propose a better way using Individual Conditional Expectation (ICE). Instead of averaging everything immediately, they look at every single scenario individually.

Imagine you run the taste test 1,000 times, but each time you keep the other ingredients (pepper, heat, time) fixed at a specific setting.

  • Scenario A: Low pepper + High heat. (Salt makes it amazing).
  • Scenario B: High pepper + Low heat. (Salt makes it terrible).

When you plot all 1,000 lines, you see a messy, tangled web of curves. This is the ICE. It reveals that salt does matter, but its effect changes wildly depending on the other ingredients.
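Continuing the same toy setup (again a sketch, with a hypothetical `model`), building ICE curves just means sweeping `x1` once per background sample while holding that sample's `x2` fixed. The individual curves swing widely even though their average, the PDP, is flat.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same hypothetical interacting black box as before.
def model(x1, x2):
    return x1 * x2

x2_samples = rng.uniform(-1.0, 1.0, size=1000)
grid = np.linspace(-1.0, 1.0, 21)

# One ICE curve per sample: sweep x1 while x2 stays fixed at that sample's value.
ice = np.array([model(grid, x2) for x2 in x2_samples])  # shape (1000, 21)

# The PDP is simply the pointwise average of all ICE curves.
pdp = ice.mean(axis=0)

# Individual curves are steep while the average is nearly flat.
print(float(np.abs(ice).max()), float(np.abs(pdp).max()))
```

Plotting the rows of `ice` produces exactly the "messy, tangled web" described above.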

The Innovation: Measuring the "Messiness"

The paper's main contribution is turning this messy web of lines into a simple, useful score. They propose two new metrics:

  1. The "Average Impact" Score (μ_ICE):
    Instead of averaging the results (which hides the truth), they average the magnitude of the change.

    • Analogy: Imagine a rollercoaster. The old method (PDP) might say, "The average height of the ride is 50 feet," which sounds boring. The new method looks at the ups and downs of every single car. It says, "Even if the average height is 50, the thrill (the change) is huge!" This tells you the ingredient is important, even if the average effect cancels out.
  2. The "Chaos" Score (σ_ICE):
    This measures how much the lines in your ICE web wiggle and disagree with each other.

    • Analogy: If all 1,000 lines look almost identical, the ingredient is predictable (low chaos). If the lines are all over the place—some going up, some down, some flat—that's high chaos. This "Chaos Score" tells you: "Hey, this ingredient is a troublemaker! Its effect depends entirely on what other ingredients are in the pot." This is a direct measure of interaction.
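Both scores can be sketched on the toy interacting model. This is not the paper's exact estimator: as an assumption for illustration, each curve's "impact" is measured by its standard deviation along the sweep, μ_ICE averages those per-curve impacts, and σ_ICE measures how much they disagree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical interacting black box, as in the earlier sketches.
def model(x1, x2):
    return x1 * x2

x2_samples = rng.uniform(-1.0, 1.0, size=1000)
grid = np.linspace(-1.0, 1.0, 21)
ice = np.array([model(grid, x2) for x2 in x2_samples])

# Per-curve "thrill": how much each individual curve varies along x1.
per_curve = ice.std(axis=1)

mu_ice = per_curve.mean()    # average impact across all scenarios
sigma_ice = per_curve.std()  # how much the impact disagrees between scenarios

# The PDP-based score averages first, so the cancellation wipes it out.
pdp_score = ice.mean(axis=0).std()
print(float(mu_ice), float(sigma_ice), float(pdp_score))
```

Here μ_ICE is large (salt matters in every scenario) and σ_ICE is large (its effect depends heavily on pepper), while the PDP-based score is near zero, which is precisely the failure mode the paper targets.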

The "Correlation" Check

They also added a "Correlation" check. This asks: "Does the messy web of lines generally follow the same direction as the boring average line?"

  • If the answer is Yes, the average line was a good summary.
  • If the answer is No (the lines are zigzagging while the average is flat), it means the relationship is complex and the old method was lying to you.
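This check can also be sketched numerically (an illustrative construction, not the paper's exact statistic): correlate each ICE curve with the average line. On the toy interacting model, roughly half the curves run opposite to the average, flagging the PDP as a misleading summary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical interacting black box, as in the earlier sketches.
def model(x1, x2):
    return x1 * x2

x2_samples = rng.uniform(-1.0, 1.0, size=200)
grid = np.linspace(-1.0, 1.0, 21)
ice = np.array([model(grid, x2) for x2 in x2_samples])
pdp = ice.mean(axis=0)

# Pearson correlation between each individual curve and the average line.
corrs = np.array([np.corrcoef(curve, pdp)[0, 1] for curve in ice])

# Fraction of curves moving against the average: near 0.5 means the
# average line is a poor summary of the individual behavior.
print(float((corrs < 0).mean()))
```

If instead the model had no interactions, every ICE curve would be a shifted copy of the PDP and all correlations would sit near +1.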

Why This Matters for Engineers

The authors tested this on three real-world problems:

  1. A Math Puzzle: Proving the method works on paper.
  2. Wind Turbines: Figuring out why a turbine blade might break. They found that wind speed and wave height interact in complex ways that the old "average" method missed.
  3. Airplane Wings: Designing the shape of a wing to reduce drag. They discovered that the top and bottom curves of the wing interact in surprising ways.

The Takeaway

In the world of engineering design, we often rely on "black box" computers to make decisions.

  • Old Method: "On average, this variable doesn't do much." (Risk: Missing critical interactions).
  • New Method: "On average, it might look small, but look at the variability! It's actually a powerhouse that changes everything depending on the context."

This paper gives engineers a new set of glasses. Instead of seeing a blurry, averaged-out picture, they can now see the individual stories of how variables interact, ensuring that safety-critical designs (like planes and turbines) aren't built on misleading averages.