Imagine you are the head chef of a massive, high-end restaurant (the AI Model). Your goal is to serve the perfect dish to your customers (the Validation Data). To do this, you rely on a huge cookbook of recipes and ingredients you've collected over time (the Training Data).
Sometimes, the cookbook has problems:
- Some recipes are written by a confused intern (Noisy Labels).
- Some ingredients have been deliberately tampered with by a saboteur (Adversarial Attacks).
- Some recipes are biased against certain types of customers (Unfairness).
Traditionally, to fix the menu, chefs used a very slow, expensive method: Retraining. They would take out one suspect recipe, rewrite the whole book from scratch, and taste the dish again. If the dish got better, they kept the change. If not, they put the recipe back. Doing this for thousands of recipes is impossible: it would take forever and blow through your budget.
The Old "Smart" Shortcut (Hessian Inverse)
Scientists invented a mathematical shortcut called Influence Functions. Instead of rewriting the whole book, they tried to calculate exactly how much one specific recipe would change the final taste using complex calculus (the Hessian Matrix).
Think of this like trying to calculate the exact gravitational pull of every single grain of sand in a desert to predict how one grain will move a dune. It's theoretically elegant, but in the real world (especially with deep learning models that have billions of parameters), the Hessian is so huge and ill-behaved that the calculation often breaks down, takes far too long, or the inverse simply doesn't exist.
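On a tiny model, the "old shortcut" can be written out directly. Below is a minimal numeric sketch of the classic Hessian-inverse influence formula on a toy ridge regression; the data, sign convention, and variable names are illustrative stand-ins, not the paper's code.

```python
import numpy as np

# Toy setup: 20 training points, 3 parameters, a ridge term so the
# Hessian is guaranteed to be invertible (real deep nets have no such
# guarantee, which is exactly the problem described above).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=20)

lam = 1e-2
H = X.T @ X + lam * np.eye(3)                 # Hessian of the training loss
w = np.linalg.solve(H, X.T @ y)               # fitted weights

x_val = rng.normal(size=3)                    # one validation point
y_val = float(x_val @ w_true)
grad_val = (x_val @ w - y_val) * x_val        # gradient of validation loss

i = 0                                         # score training point 0
grad_i = (X[i] @ w - y[i]) * X[i]             # gradient of its loss

# Classic influence score: -grad_val · H⁻¹ · grad_i (one common sign
# convention; positive roughly means "removing this point hurts").
influence = float(-grad_val @ np.linalg.solve(H, grad_i))
print(influence)
```

Even here the expensive step is visible: `np.linalg.solve(H, ...)` is trivial for a 3×3 Hessian but intractable when the parameter count is in the billions.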
The Paper's Big Idea: "Just Look at the Gradients" (Inner Product)
The authors of this paper say: "Stop trying to calculate the exact gravitational pull of every grain of sand. Let's just look at the direction the wind is blowing."
They revisit a simple method called Inner Product (IP).
- The Metaphor: Imagine you have a "Target Direction" (the goal of making the dish taste better). You look at a specific recipe in your cookbook. Does this recipe push the dish in the same direction as your target, or does it push it in the opposite direction?
- The Math: Instead of doing the heavy work of inverting the Hessian (computing the exact gravitational pull), they just take the dot product of the recipe's gradient (its "push") with the target's gradient.
- The Result: If the numbers match up (positive score), the recipe is helpful. If they clash (negative score), the recipe is harmful.
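The whole method fits in one line of code. Here is a minimal sketch, where the gradient vectors are made-up numbers chosen only to show the sign behavior; in practice they would come from backpropagation.

```python
import numpy as np

def ip_score(train_grad, target_grad):
    """Inner-product score: positive means the training example pushes
    the model in the target direction; negative means it pushes against."""
    return float(np.dot(train_grad, target_grad))

# "Target direction": gradient of the validation loss w.r.t. the parameters.
target = np.array([1.0, 0.0, -1.0])

helpful = np.array([0.5, 0.25, -0.5])    # roughly aligned with the target
harmful = np.array([-0.75, 0.125, 1.0])  # roughly opposed to it

print(ip_score(helpful, target))   # → 1.0  (positive: keep this example)
print(ip_score(harmful, target))   # → -1.75 (negative: candidate to remove)
```

No matrix inverse, no linear solve: just one multiply-and-sum per training example, which is what makes the method scale.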
Why is this surprising?
Usually, in science, simple approximations are considered "dumb" compared to complex, precise calculations. The authors discovered that for deep learning, this "dumb" simple method actually works better than the complex ones because the complex math gets too messy and unstable.
The Three Upgrades
The paper doesn't just say "use the simple method." They upgraded it in three cool ways:
1. Extending the Menu (Fairness & Robustness)
Usually, chefs only care if the food tastes good (Utility). But what if the food is delicious but makes some customers sick (Unfairness) or poisons the kitchen if a saboteur sneaks in a bad ingredient (Robustness)?
- The authors showed you can use this simple "direction check" to see if a recipe makes the model fairer (treating all customers equally) or safer (resisting poison). You just change the "Target Direction" to be about fairness or safety instead of just taste.
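Swapping the objective changes only the target vector, not the scoring mechanism. The sketch below illustrates this with a made-up "fairness" target built as the gradient of a gap between two groups' losses; the specific penalty and all numbers are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def ip_score(train_grad, target_grad):
    return float(np.dot(train_grad, target_grad))

# Utility target: gradient of the plain validation loss.
utility_target = np.array([1.0, -0.5, 0.25])

# Fairness target (illustrative): gradient of a group-gap penalty,
# e.g. (loss on group A) - (loss on group B), w.r.t. the same parameters.
grad_group_a = np.array([0.5, 0.0, 1.0])
grad_group_b = np.array([0.25, 0.5, 0.0])
fairness_target = grad_group_a - grad_group_b

# The same training example can score differently under each target.
train_grad = np.array([1.0, 1.0, -1.0])
print(ip_score(train_grad, utility_target))   # → 0.25  (helps accuracy)
print(ip_score(train_grad, fairness_target))  # → -1.25 (hurts the gap)
```

The design point is that "utility", "fairness", and "robustness" all become interchangeable plug-ins: one target vector each, same one-line score.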
2. The "Taste-Test Panel" (IP Ensemble)
One chef might have a bad day or a biased palate. To be sure, you don't just ask one person; you ask a panel of chefs.
- The authors created IP Ensemble. Instead of using one model to check the recipes, they use a "panel" of slightly different models (created by a trick called dropout, which randomly switches off parts of the model, like asking each chef to taste with a few taste buds disabled).
- They average the opinions of this panel. This makes the result much more reliable and less likely to be a fluke.
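The panel idea can be sketched as follows. Here dropout is simulated by randomly masking gradient coordinates, which is a stand-in for evaluating the gradients under different dropout masks of a real network; the procedure and numbers are illustrative, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def ip_score(train_grad, target_grad):
    return float(np.dot(train_grad, target_grad))

train_grad = np.array([0.5, -1.0, 0.25, 0.75])
target_grad = np.array([1.0, -0.5, -0.25, 0.5])

# One "chef": a single inner-product score.
single_score = ip_score(train_grad, target_grad)

# The "panel": 32 dropout-style perturbations, opinions averaged.
scores = []
for _ in range(32):
    mask = rng.random(4) > 0.5          # randomly silence coordinates
    scores.append(ip_score(train_grad * mask, target_grad * mask))
ensemble_score = float(np.mean(scores))

print(single_score, ensemble_score)
```

Averaging over the panel smooths out the quirks of any single model state, which is what makes the verdict "much more reliable" in the authors' framing.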
3. Speeding Up the Kitchen
The complex methods (like LiSSA or EKFAC) are like trying to solve a Rubik's cube while running a marathon. The simple IP method is like checking a compass.
- The paper shows that their method is hundreds of times faster than the complex competitors, yet it still finds the bad recipes and removes them, leading to a better final dish.
Real-World Proof
The authors tested this in three scenarios:
- Cleaning Noisy Data: They found and removed "confused" labels from image datasets (like CIFAR), making the AI recognize cats and dogs much better.
- Fixing Bias: They fine-tuned a language model (RoBERTa) to be fairer to different groups of people, improving both its accuracy and its fairness at the same time.
- Defending Against Attackers: They protected a model from hackers who tried to trick it with bad data, proving that removing the "bad apples" beforehand makes the system much tougher.
The Takeaway
In a world where AI models are getting bigger and more complex, we often think we need more complex math to fix them. This paper argues the opposite: Sometimes, the simplest tool is the most powerful.
By ignoring the impossible-to-calculate "perfect math" and just looking at the basic direction of the data, we can clean our datasets, make our AI fairer, and defend it against attacks—all in a fraction of the time. It's a reminder that in the kitchen of AI, sometimes you just need a good compass, not a supercomputer.