Imagine you are trying to teach a robot to drive a car. But there's a catch: the car is a bit weird. It has a gas pedal that doesn't work linearly (pushing it halfway doesn't give you half speed; maybe it's sticky or has a dead zone), and it has a speedometer that is also weird (it might round numbers up or down in a strange way).
In the middle are the actual engine and transmission, which behave like a normal, predictable machine.
This setup is called a Hammerstein-Wiener system.
- Hammerstein: The weird gas pedal (Input Nonlinearity).
- Wiener: The weird speedometer (Output Nonlinearity).
- The Middle: The linear, predictable engine.
The problem is: We don't know exactly how weird the pedal or the speedometer are. We only have a logbook of past drives (data) showing what we pressed and what the speedometer read.
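To make the setup concrete, here is a toy Python sketch of a Hammerstein-Wiener "weird car". Every function and number here is invented for illustration, not taken from the paper: a dead-zone pedal, a first-order linear engine, and a coarsely rounding speedometer.

```python
import numpy as np

def pedal(u):
    """Hypothetical input nonlinearity: dead zone below 0.2, then a ramp."""
    return np.clip((u - 0.2) * 1.5, 0.0, 1.0)

def speedometer(v):
    """Hypothetical output nonlinearity: coarse, slightly biased rounding."""
    return np.round(v * 10) / 10 + 0.05 * np.sign(v)

def engine(x, w):
    """The linear, predictable middle: x[k+1] = a*x[k] + b*w[k]."""
    a, b = 0.9, 0.5
    return a * x + b * w

# Simulate one drive: press the pedal, log what the speedometer reads.
rng = np.random.default_rng(0)
u = rng.uniform(0, 1, size=50)        # pedal positions we chose
x = 0.0
log = []
for uk in u:
    x = engine(x, pedal(uk))          # true internal state (never observed)
    log.append((uk, speedometer(x)))  # only this pair goes in the logbook
```

The identification problem is exactly this: given only `log`, recover something useful about `pedal`, `engine`, and `speedometer`, none of which we get to see directly.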
The Old Way: "Guessing the Whole Car"
Most data-driven methods (like standard AI) try to learn the entire car as one giant, mysterious black box. They look at the pedal and the speedometer and say, "Okay, I'll just memorize every possible combination."
- The Flaw: This is like trying to learn a language by memorizing every sentence in a phrasebook without understanding grammar. It works okay for simple things, but if you ask the AI to drive in a situation it hasn't seen before, it gets confused. It doesn't know that the engine is actually linear and predictable; it thinks the whole car is chaotic.
The New Way: "The Physics-Informed Detective"
This paper proposes a smarter detective approach using Gaussian Processes (GP). Think of a GP as a super-smart guesser that knows how to handle uncertainty.
Instead of guessing the whole car, the authors build a model that respects the car's structure:
The "Implicit" Trick: Instead of trying to write down a formula for the weird pedal and speedometer, they write a rule that says: "If you take the weird pedal input, run it through the engine, and then run it through the weird speedometer, the result must match the logbook."
- They don't solve for the weird parts directly; they solve for the relationship between them. This is like solving a puzzle by looking at how the pieces fit together, rather than trying to draw the picture of every piece from scratch.
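A tiny sketch of why you can only solve for the relationship, not the pieces themselves (toy functions, not the paper's model): the logbook only pins down the end-to-end composition, so two very different sets of pieces can explain the same data.

```python
import numpy as np

def f(u):          # hypothetical input nonlinearity (the pedal)
    return u ** 2

def G(w):          # hypothetical linear block (the engine): a simple gain
    return 2.0 * w

def g(v):          # hypothetical output nonlinearity (the speedometer)
    return np.tanh(v)

u = np.linspace(0, 1, 5)
y1 = g(G(f(u)))

# Rescale the pieces: pedal' = 3 * pedal, engine' = engine / 3.
# Each piece is now individually different...
y2 = g(G(3 * f(u)) / 3)

# ...but the end-to-end map, which is all the logbook ever constrains,
# is exactly the same:
assert np.allclose(y1, y2)
```

This ambiguity is why the authors write the model as an implicit constraint on the whole chain instead of solving for each weird part on its own.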
The "Virtual" Clues (Monotonicity):
- The authors know that a speedometer usually doesn't go backward when you speed up. It's "monotonic" (it only goes up).
- Standard AI might guess a speedometer that goes up, then down, then up again because it's just looking at noisy data.
- The Solution: The authors add "Virtual Derivative Points." Imagine placing invisible "police officers" along the road who shout, "Hey! The speed must keep going up!" The AI listens to these invisible officers and corrects its guess to ensure the speedometer behaves logically.
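Here is a minimal sketch of the "virtual police officers" idea, using a plain penalized polynomial fit instead of the paper's GP machinery (all data and weights are invented): we place virtual points along the curve and penalize any negative slope there.

```python
import numpy as np
from scipy.optimize import minimize

# Noisy samples of a monotone "speedometer" curve (toy data)
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 15)
y = np.sqrt(x) + rng.normal(0, 0.05, size=x.shape)

# The invisible "police officers": virtual points where we check the slope
x_virtual = np.linspace(0, 1, 50)

def poly(c, t):
    return np.polyval(c, t)

def dpoly(c, t):
    return np.polyval(np.polyder(c), t)

def loss(c):
    fit = np.sum((poly(c, x) - y) ** 2)            # match the noisy data
    slope = dpoly(c, x_virtual)
    penalty = np.sum(np.minimum(slope, 0.0) ** 2)  # punish any downhill slope
    return fit + 100.0 * penalty

c0 = np.zeros(5)                                   # degree-4 polynomial
c = minimize(loss, c0).x
# After fitting, the slope at the virtual points is (near) non-negative:
# the curve only goes up, as a speedometer should.
```

The paper does this inside the GP prior via virtual derivative observations rather than a penalty term, but the effect is the same: noisy data alone can suggest a wiggly curve, and the virtual points push the estimate back to a monotone one.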
The "Stable Spline" Safety Net:
- To make sure the engine part of the model doesn't go crazy (like predicting infinite speed), they use a mathematical safety net called a "Stable Spline Hyperprior."
- Think of this as a training leash. It tells the AI, "You can learn the engine, but it must behave like a real, stable engine. No flying cars."
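The leash can be sketched concretely. A first-order stable spline (also called TC) kernel assigns prior covariance that decays along the impulse response, so any engine the model imagines slows down over time instead of blowing up (the `alpha` and sample sizes below are illustrative choices, not the paper's):

```python
import numpy as np

def stable_spline_kernel(n, alpha=0.8, scale=1.0):
    """First-order stable spline (TC) kernel: K[i, j] = scale * alpha**max(i, j).
    With 0 < alpha < 1, it encodes the belief that impulse-response
    coefficients decay to zero: a stable engine, no flying cars."""
    idx = np.arange(1, n + 1)
    return scale * alpha ** np.maximum.outer(idx, idx)

K = stable_spline_kernel(20)

# Impulse responses drawn from this prior decay toward zero over time.
rng = np.random.default_rng(2)
samples = rng.multivariate_normal(np.zeros(20), K, size=200)
early = np.mean(np.abs(samples[:, :5]))   # typical size of early coefficients
late = np.mean(np.abs(samples[:, -5:]))   # typical size of late coefficients
```

Because the variance at lag `i` is `alpha**i`, late coefficients are drawn from a much tighter leash than early ones, which is exactly the "stable engine" belief baked into the prior.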
The Result: Better Driving
The paper tests this on two scenarios:
- Prediction: Trying to guess the car's future speed from past pedal presses.
- Result: The new method was much more accurate than the "black box" AI. It understood the structure, so it didn't get confused by the weird pedal or speedometer.
- Control (MPC): Actually driving the car to hit a target speed.
- Result: The new controller kept the car on target even when the speedometer was lying. The old "black box" controllers missed the target because they didn't understand the sensor's quirks.
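A stripped-down sketch of why the structured controller wins (toy system and numbers, with a simple one-step controller standing in for the paper's MPC): one controller knows the speedometer's quirk and undoes it; the other trusts the reading at face value.

```python
import numpy as np

a, b = 0.9, 0.5
g = np.tanh            # the "lying" speedometer (hypothetical)
g_inv = np.arctanh     # the structured controller knows how to undo it
target = 0.8           # target true speed

def step(v, u):
    """True (linear) speed dynamics: v[k+1] = a*v[k] + b*u[k]."""
    return a * v + b * u

def structured_controller(y):
    """Undo the sensor quirk, then place the next state on the target."""
    v = g_inv(y)
    return (target - a * v) / b

def blackbox_controller(y):
    """Treats the speedometer reading as the true speed."""
    return (target - a * y) / b

results = {}
for ctrl in (structured_controller, blackbox_controller):
    v = 0.0
    for _ in range(30):
        v = step(v, ctrl(g(v)))   # controller only ever sees g(v)
    results[ctrl.__name__] = v    # true speed after 30 steps
```

Under these toy numbers the structured controller settles exactly on the target, while the black-box one settles at a steady offset above it, because it keeps compensating for a reading that understates the true speed.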
The Trade-off
There is one downside: It's slower.
Because the AI is doing a lot of complex math to respect the physics and the "virtual police officers," it takes longer to compute a decision (like 4 seconds vs. 0.01 seconds for the simple AI).
- Analogy: The old AI is a fast-food chef who throws ingredients in a pot and hopes it tastes right. The new AI is a Michelin-star chef who measures every gram, checks the temperature, and tastes the sauce. It takes longer, but the meal is much better.
Summary
This paper teaches a computer how to learn a complex, weird system by:
- Respecting the structure: Knowing there's a linear part in the middle.
- Using "Invisible Police": Forcing the AI to respect logical rules (like speed only going up).
- Using Safety Leashes: Ensuring the model stays realistic.
The result is a system that learns faster, predicts better, and controls more accurately than standard "black box" AI, even though it takes a bit more computing power to do so.