Self-Scaled Broyden Family of Quasi-Newton Methods in JAX

Imagine you are trying to find the lowest point in a vast, foggy mountain range. You can't see the bottom, but you can feel the slope under your feet. This is what optimization is: finding the best solution (the lowest point) for a complex problem.

This paper is essentially a toolkit upgrade for Optimistix, an optimization library built on top of the popular JAX framework. The authors, Ivan Bioli and Mikel Mendibe, have built a new set of "smart hikers" (algorithms) that are better at navigating these mathematical mountains than the standard hikers currently available.

Here is the breakdown in everyday language:

1. The Problem: The Standard Hiker is a Bit Clumsy

The current "standard hiker" in Optimistix is called BFGS. It's a good hiker, but it has two main limitations:

It walks blindly: It uses a simple "backtracking" method to decide how big a step to take. It's like taking a step, realizing you might have gone too far, stepping back, and trying again. It's safe, but slow.
It's rigid: It only uses one specific way to remember the shape of the mountain (the "Hessian" or curvature). It doesn't adapt its memory style based on the terrain.

2. The Solution: The "Self-Scaled" Super-Hikers

The authors introduced a new family of hikers called the Self-Scaled Broyden Family. Think of this as a team of hikers who can change their shoes, their stride, and their memory on the fly.

The Zoom Line Search (The "Scout"):
Instead of just guessing and stepping back, these new hikers use a Zoom technique. Imagine you are walking down a hill and you think you are close to the bottom. Rather than committing to one giant step or creeping forward in tiny ones, the "Zoom" method first marks out a stretch of the slope that must contain a good stopping point, then narrows that stretch until it finds one. It enforces strict rules (the Wolfe conditions) that make sure each step lowers your altitude enough and is neither too long nor too short. It's like using a drone to scout the best spot before you commit to walking there.
The Self-Scaled Variants (The "Adaptable Memory"):
The classic hikers (BFGS, DFP) have a fixed way of remembering the mountain's shape. The new Self-Scaled hikers can adjust how they remember.
- BFGS is a hiker with one classic recipe for updating the memory of the terrain after each step.
- DFP is a hiker with a different classic recipe for the same job.
- Broyden blends the two recipes, with an adjustable mixing weight.
- Self-Scaled (SS) versions are like hikers who say, "Hey, this part of the mountain is steep, let's adjust our memory scale to be more sensitive," or "This part is flat, let's relax our memory." They automatically tune their internal math to fit the specific problem they are solving.

3. Why Does This Matter? (The PINN Example)

The paper tests these new hikers on a very difficult task: solving the 3D Poisson Equation using Physics-Informed Neural Networks (PINNs).

The Analogy: Imagine trying to teach a robot to predict a steady-state physical field, such as the temperature settling inside a 3D box. The robot has to learn a function that obeys a complex equation everywhere in the box and on its walls.
The Result: The paper shows that the new Self-Scaled hikers (SSBFGS and SSBroyden) reached the solution much faster and with higher accuracy than the standard hikers.
- In the graphs, you can see the "Loss" (the robot's confusion) dropping much faster for the new methods. It's like the new hikers reaching the valley floor well ahead of the old ones.

4. The "Drop-in" Feature

One of the coolest parts of this paper is that these new tools are plug-and-play.

Analogy: Think of your existing JAX code as a device that runs on a battery (the optimizer). You don't need to rebuild the device to benefit. You swap the old battery (the old optimizer) for the new one (a Self-Scaled optimizer), and everything else keeps working as before. The code is designed to fit into the existing JAX ecosystem.

Summary

This paper is a technical note saying: "We took the best mathematical formulas for finding solutions, made them adaptable (Self-Scaled), gave them a better way to choose steps (Zoom), and packaged them so anyone using JAX can use them immediately to solve complex physics and engineering problems faster."

It's not a new theory; it's a better implementation of existing theories, making them accessible and efficient for the modern AI and scientific computing community.

Here is a detailed technical summary of the paper "Self-Scaled Broyden Family of Quasi-Newton Methods in JAX."

1. Problem Statement

The Optimistix library, a modular optimization toolkit for JAX, currently lacks support for two critical components required for advanced quasi-Newton optimization:

Zoom Line Search: The existing implementation relies on a backtracking Armijo line search, which does not guarantee satisfaction of the strong Wolfe conditions. The Zoom line search is preferred for its robustness and ability to find step sizes that satisfy these stronger conditions.
Self-Scaled Broyden Family: While Optimistix includes a standard BFGS implementation, it lacks the broader family of Self-Scaled Broyden methods. These methods (including Self-Scaled BFGS, DFP, and Broyden) have shown superior performance in specific applications like Physics-Informed Neural Networks (PINNs) but were not available as "drop-in" replacements within the JAX ecosystem.
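
To make the difference between the two conditions concrete, here is a minimal JAX sketch that checks both for a trial step size. The function name `wolfe_report`, the constants `c1` and `c2` (common textbook defaults), and the quadratic test objective are illustrative assumptions, not part of Optimistix or the paper's code.

```python
import jax
import jax.numpy as jnp


def wolfe_report(f, x, p, alpha, c1=1e-4, c2=0.9):
    """Return (armijo_ok, curvature_ok) for step size alpha along direction p."""
    phi = lambda a: f(x + a * p)   # objective restricted to the search line
    dphi = jax.grad(phi)           # directional derivative along the line
    phi0, dphi0 = phi(0.0), dphi(0.0)
    # Sufficient decrease (Armijo) condition.
    armijo_ok = phi(alpha) <= phi0 + c1 * alpha * dphi0
    # Strong Wolfe curvature condition: the slope must have flattened enough.
    curvature_ok = jnp.abs(dphi(alpha)) <= c2 * jnp.abs(dphi0)
    return bool(armijo_ok), bool(curvature_ok)


# Illustrative objective: a simple quadratic bowl.
f = lambda x: 0.5 * jnp.sum(x ** 2)
x = jnp.array([1.0, -2.0])
p = -jax.grad(f)(x)                # steepest-descent direction

tiny_step = wolfe_report(f, x, p, 1e-3)
full_step = wolfe_report(f, x, p, 1.0)
```

On this quadratic, the very short step passes the Armijo test but fails the curvature test, so it is a step that the Armijo condition alone would admit and the strong Wolfe conditions reject. The full step, which lands on the minimizer, passes both.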

2. Methodology

The authors developed a pure-JAX implementation of the Self-Scaled Broyden family, designed to be fully compatible with the Optimistix solver interface.

Mathematical Foundation

The implementation generalizes classic quasi-Newton updates (BFGS, DFP, Broyden) using a unified update formula for the inverse Hessian approximation $H_k$. The update is parameterized by two scalars, $\theta_k$ and $\tau_k$:

$\theta_k$ (Interpolation Parameter): Controls the interpolation between BFGS ($\theta_k = 0$) and DFP ($\theta_k = 1$). In the general Broyden variants, $\theta_k$ is recomputed at each iteration rather than fixed in advance.
$\tau_k$ (Scaling Parameter): Controls the "Self-Scaled" variant. When $\tau_k = 1$, the method reduces to the standard unscaled version. When computed dynamically, it rescales the previous inverse Hessian approximation before the new curvature information is added, which is intended to improve convergence.

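The update formula reads:
$H_{k+1} = \frac{1}{\tau_k} \left( H_k - \frac{H_k y_k y_k^\top H_k}{y_k^\top H_k y_k} + \phi_k (y_k^\top H_k y_k) v_k v_k^\top \right) + \rho_k s_k s_k^\top$
Here $s_k = x_{k+1} - x_k$ is the step and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$ is the gradient difference. In standard Broyden-family notation, $\rho_k = 1 / (y_k^\top s_k)$ and $v_k = s_k / (y_k^\top s_k) - H_k y_k / (y_k^\top H_k y_k)$. The coefficient $\phi_k$ is determined by $\theta_k$: in this inverse-Hessian form, $\phi_k = 1$ gives the BFGS update ($\theta_k = 0$) and $\phi_k = 0$ gives the DFP update ($\theta_k = 1$).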
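
The update formula above can be sketched in a few lines of JAX. This is a minimal illustration, not the authors' implementation: the function name `ss_broyden_update` is hypothetical, $\phi_k$ and $\tau_k$ are passed in as plain numbers (the rules the paper uses to compute them are not reproduced), and $\rho_k$ and $v_k$ are given their standard Broyden-family definitions, as noted in the comments.

```python
import jax.numpy as jnp


def ss_broyden_update(H, s, y, phi=1.0, tau=1.0):
    """One inverse-Hessian update in the form shown above (H assumed symmetric)."""
    Hy = H @ y
    yHy = y @ Hy
    ys = y @ s                      # must be positive (curvature condition)
    rho = 1.0 / ys                  # assumed standard definition of rho_k
    v = s / ys - Hy / yHy           # assumed standard definition of v_k
    inner = H - jnp.outer(Hy, Hy) / yHy + phi * yHy * jnp.outer(v, v)
    return inner / tau + rho * jnp.outer(s, s)


# Small illustrative example with y^T s > 0.
H = jnp.eye(3)
s = jnp.array([1.0, 0.5, -0.2])
y = jnp.array([0.8, 0.7, 0.1])

H_bfgs = ss_broyden_update(H, s, y, phi=1.0, tau=1.0)   # BFGS (theta_k = 0)
H_dfp = ss_broyden_update(H, s, y, phi=0.0, tau=1.0)    # DFP  (theta_k = 1)
H_scaled = ss_broyden_update(H, s, y, phi=1.0, tau=0.7) # a scaled variant
```

Because $v_k^\top y_k = 0$, every choice of `phi` and `tau` in this sketch satisfies the secant equation $H_{k+1} y_k = s_k$; the two parameters only change how the rest of the matrix is shaped.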

Software Design

The code follows a hierarchical class structure mirroring the mathematical relationships:

Base Class (AbstractSSBroydenFamily): Handles Hessian initialization, auxiliary quantity computation, and dispatches to specific update terms. It exposes hooks compute_thetak and compute_tautk.
Specialized Classes:
- AbstractSSBroyden: Computes both $\theta_k$ and $\tau_k$ dynamically.
- AbstractBroyden: Inherits from SS-Broyden but fixes $\tau_k = 1$.
- AbstractSSBFGS: Fixes $\theta_k = 0$ (BFGS form) but allows dynamic $\tau_k$.
- AbstractBFGS: Fixes $\theta_k = 0$ and $\tau_k = 1$ (Classic BFGS).
- AbstractSSDFP and AbstractDFP: Similar specialization for DFP ($\theta_k = 1$).
Integration: The new solvers are designed as drop-in replacements for Optimistix, supporting composition with existing descents and searches. They also include a wrapper to distinguish between actual quasi-Newton iterations and internal line search steps, a feature missing in the base Optimistix design.
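
The idea of specializing a base class by overriding two hooks can be sketched in plain Python. Everything below is a hypothetical stand-in: the class names, method names, and inheritance chain are simplified for illustration and do not reproduce the repository's classes, and the `tau_rule` used in the example is a placeholder, not the paper's formula for $\tau_k$.

```python
from abc import ABC, abstractmethod

import jax.numpy as jnp


class FamilySketch(ABC):
    """Hypothetical base: shared update logic would live here, scalars are hooks."""

    @abstractmethod
    def compute_theta(self, H, s, y):
        """Interpolation parameter theta_k."""

    @abstractmethod
    def compute_tau(self, H, s, y):
        """Scaling parameter tau_k."""

    def parameters(self, H, s, y):
        # The shared update code would consume these two scalars.
        return self.compute_theta(H, s, y), self.compute_tau(H, s, y)


class SSBFGSSketch(FamilySketch):
    """theta_k fixed at 0 (BFGS form); tau_k supplied by a user-provided rule."""

    def __init__(self, tau_rule):
        self.tau_rule = tau_rule

    def compute_theta(self, H, s, y):
        return 0.0

    def compute_tau(self, H, s, y):
        return self.tau_rule(H, s, y)


class BFGSSketch(SSBFGSSketch):
    """Classic BFGS: additionally pins tau_k to 1."""

    def __init__(self):
        super().__init__(tau_rule=None)

    def compute_tau(self, H, s, y):
        return 1.0


H = jnp.eye(2)
s = jnp.array([1.0, 0.5])
y = jnp.array([0.8, 0.7])

# Placeholder scaling rule for illustration only (not the paper's formula).
placeholder_rule = lambda H, s, y: (y @ s) / (y @ H @ y)

bfgs_params = BFGSSketch().parameters(H, s, y)
ssbfgs_params = SSBFGSSketch(placeholder_rule).parameters(H, s, y)
```

The point of the pattern is that each variant differs from its parent by a single overridden hook, so a new variant only has to supply its own rule for one of the two scalars.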

3. Key Contributions

Zoom Line Search Integration: The authors integrated the Zoom line search (Algorithm 3.6 in Nocedal and Wright's Numerical Optimization) into Optimistix, so that accepted step sizes are chosen to satisfy the strong Wolfe conditions.
Full Self-Scaled Broyden Family Implementation: The paper provides the first pure-JAX implementation of the complete family, including:
- Standard: BFGS, DFP, Broyden.
- Self-Scaled Variants: SSBFGS, SSDFP, SSBroyden.
Modular Architecture: The implementation allows users to subclass abstract variants to plug in custom descent directions or search strategies, maintaining the composable nature of Optimistix.
Iteration Accounting: A utility wrapper was added to separate quasi-Newton iterations from line search sub-steps, enabling more accurate performance comparisons between solvers.

4. Results

The authors validated the implementation using a numerical example involving Physics-Informed Neural Networks (PINNs) solving the 3D Poisson equation ($-\Delta u = f$) on a unit cube with Dirichlet boundary conditions.

Setup: A fully connected neural network (3 hidden layers, 32 units, tanh activation) was trained to minimize a loss function combining the PDE residual and boundary conditions.
Comparison: The performance of BFGS, SSBFGS, Broyden, and SSBroyden was compared over 10,000 iterations.
Findings:
- The Self-Scaled variants (SSBFGS and SSBroyden) converged significantly faster than their standard counterparts.
- They achieved lower loss values and reduced relative $L^2$ and $H^1$ errors in fewer iterations.
- This is consistent with previous findings that self-scaling improves optimization stability and speed for PINN training.
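
For readers unfamiliar with the setup, here is a minimal JAX sketch of a PINN loss for $-\Delta u = f$ on the unit cube, using the network shape described above (3 hidden layers, 32 units, tanh). The manufactured solution, the weight initialization, the random sampling of points, the equal weighting of the two loss terms, and all function names are assumptions made for illustration; the paper's exact choices are not given in this summary.

```python
import jax
import jax.numpy as jnp


def init_mlp(key, sizes=(3, 32, 32, 32, 1)):
    """Random weights and zero biases for a 3 -> 32 -> 32 -> 32 -> 1 network."""
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) * jnp.sqrt(2.0 / (m + n)), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Scalar network output for a single point x in R^3."""
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return (x @ W + b)[0]


def u_exact(x):
    # Assumed manufactured solution; it vanishes on the cube's boundary.
    return jnp.prod(jnp.sin(jnp.pi * x))


def source(x):
    # f = -Laplacian(u_exact) = 3 * pi^2 * u_exact
    return 3.0 * jnp.pi ** 2 * u_exact(x)


def laplacian(u, x):
    return jnp.trace(jax.hessian(u)(x))


def sample_boundary(key, n):
    """Uniform points with one random coordinate snapped to 0 or 1."""
    k1, k2, k3 = jax.random.split(key, 3)
    x = jax.random.uniform(k1, (n, 3))
    axis = jax.random.randint(k2, (n,), 0, 3)
    side = jax.random.randint(k3, (n,), 0, 2).astype(x.dtype)
    return x.at[jnp.arange(n), axis].set(side)


def pinn_loss(params, x_int, x_bnd):
    """Mean squared PDE residual plus mean squared boundary mismatch."""
    u = lambda x: mlp(params, x)
    residual = jax.vmap(lambda x: -laplacian(u, x) - source(x))(x_int)
    boundary = jax.vmap(u)(x_bnd) - jax.vmap(u_exact)(x_bnd)
    return jnp.mean(residual ** 2) + jnp.mean(boundary ** 2)


def relative_l2_error(params, x_test):
    """Sampled estimate of the relative L^2 error against u_exact."""
    pred = jax.vmap(lambda x: mlp(params, x))(x_test)
    true = jax.vmap(u_exact)(x_test)
    return jnp.linalg.norm(pred - true) / jnp.linalg.norm(true)


k_par, k_int, k_bnd = jax.random.split(jax.random.PRNGKey(0), 3)
params = init_mlp(k_par)
x_int = jax.random.uniform(k_int, (256, 3))
x_bnd = sample_boundary(k_bnd, 64)
loss, grads = jax.value_and_grad(pinn_loss)(params, x_int, x_bnd)
```

A quasi-Newton solver would then minimize `pinn_loss` over `params`, with `relative_l2_error` tracked on held-out points to compare optimizers.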

5. Significance

Community Adoption: By providing a JAX-native, Optimistix-compatible implementation, the authors lower the barrier to entry for researchers wishing to use advanced quasi-Newton methods in differentiable programming.
Performance Gains: The results suggest that for complex, non-convex problems like PINNs, the Self-Scaled Broyden family offers a tangible performance advantage over the standard BFGS algorithm, potentially reducing training time and improving solution accuracy.
Reproducibility: The code is open-source, allowing the community to verify results and extend the library with new variants or custom line search strategies.

Availability: The implementation is available at https://github.com/IvanBioli/ssbroyden_optimistix.git.
