A Novel Multi-view Mixture Model Framework for Longitudinal Clustering with Application to ANCA-Associated Vasculitis

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are a doctor trying to understand why some patients with a rare disease that often attacks the kidneys get worse quickly, while others stay stable for years. You have two types of information about each patient:

The "Snapshot" (Static Data): A photo taken at the very beginning. This includes their age, gender, genetic markers, and what symptoms they had on day one.
The "Movie" (Longitudinal Data): A video recording of their kidney health over time. But here's the catch: the camera is broken. Some patients are filmed every week, others every year, and some only a few times at random moments. It's messy, uneven, and hard to watch.

The Problem:
Traditional computer programs are bad at watching these "broken movies." They usually try to turn the movie into a single summary number (like "average kidney health"), which throws away all the interesting details about how the health changed. They also struggle to mix the "Snapshot" with the "Movie" to find hidden groups of patients who behave similarly.

The Solution: The "Time-Traveling Detective" (The New Model)
The authors of this paper built a new AI framework called a Multi-view Mixture Model. Think of it as a super-smart detective that looks at both the Snapshot and the Movie simultaneously to find hidden "clans" or subgroups of patients.

Here is how it works, using simple analogies:

1. The "Smooth Movie" Trick (Neural ODEs)

Usually, if you have a movie with missing frames, you can't see the action clearly.

Old Way: You guess what happened in the missing spots by drawing straight lines between the dots. It looks jagged and artificial.
This Paper's Way: They use something called Neural Ordinary Differential Equations (Neural ODEs). Imagine this as a "smoothie blender" for time. Instead of connecting dots with straight lines, the model learns a rule for how the measurement changes from moment to moment (its rate of change). Following that rule from a starting point traces out a smooth, continuous curve that can be read off at any time, however sparse or irregular the visits were. The model learns one such typical curve for each group of patients, treating kidney health as a flowing river rather than a series of jumps.

2. The "Grouping Game" (Mixture Models)

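The AI doesn't smooth the movies first and sort the patients afterward. It does both at once, taking turns between refining each group's typical curve and reassigning patients to groups until neither changes much.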

The Challenge: A patient might have a "bad" snapshot (severe symptoms at the start) but a "good" movie (kidneys stay stable). Another might have a "good" snapshot but a "bad" movie (kidneys crash later).
The Solution: The model doesn't just look at one thing. It asks: "Who belongs to the same group based on both their starting photo and their movie?"
The "Sparsity" Filter: Every patient gets a snapshot group and a movie group, so many combinations of the two are possible. To stop the AI from spreading patients thinly across lots of tiny, meaningless combinations, the authors added a "Sparsity Penalty." Think of this as a strict editor who says, "If a combination is rare, cut it." This pushes the model toward a small number of well-populated, distinct subgroups.

3. The Real-World Test: The ANCA Vasculitis Case

The team tested this on 282 Irish patients with a rare autoimmune disease called ANCA-associated vasculitis. They wanted to see whether the resulting groups could help predict who would develop End-Stage Kidney Disease (kidney failure requiring dialysis or a transplant).

What they found:
The model settled on two "Snapshot" groups and two "Movie" groups, which combine into four joint subgroups. Two contrasting patterns stand out:

Group A (The "Stable Renal" Group): The largest subgroup, roughly 46% of patients. These patients had a lot of inflammation throughout the body (lungs, joints, skin) at the start, but their creatinine levels (a blood marker of kidney function) stayed low and stable over time.
Group B (The "Renal Predominant" Group): These patients had fewer body-wide symptoms, and their kidneys were the main target. Their creatinine levels were higher and varied more over time.

The Surprise:
Even though these groups differed clearly in how their creatinine changed over time, they did not differ significantly in how many patients developed End-Stage Kidney Disease, or in how severe the kidney damage looked in biopsies. This suggests that the trajectory (the movie) may capture information that a single biopsy (a snapshot of the tissue) does not, although this study could not link that information to kidney failure.

Why This Matters

This framework is like upgrading from a black-and-white photo album to a high-definition, 3D movie with a smart narrator.

For Doctors: It could help flag patients whose disease is following a worrying course earlier, although in this study the groups did not yet predict kidney failure.
For Patients: It moves us toward personalized medicine. Instead of treating everyone with the same disease the same way, doctors could one day say, "You look like Group A, so your treatment plan should focus on X," or "You look like Group B, so we need to watch your kidneys closely."

In a nutshell: The authors built a smart system that can handle messy, irregular medical data, smooth it out like a pro, and sort patients into meaningful groups based on how their disease actually behaves over time, not just how it looked on day one.

1. Problem Statement

The paper addresses the challenge of longitudinal clustering in clinical settings where the data come in two distinct types:

Static Baseline Covariates: Fixed-dimensional features (e.g., demographics, genetic markers, initial disease status) collected at a single time point.
Longitudinal Biomarker Trajectories: Irregularly sampled, time-dependent measurements (e.g., serum creatinine levels) that vary in frequency and timing across patients.

Key Challenges:

Irregular Sampling: Clinical data is often sparse and unevenly spaced, making traditional time-series clustering methods (which assume fixed time grids) ineffective.
Data Heterogeneity: Integrating static vectors with continuous-time trajectories within a unified probabilistic framework is difficult due to their fundamentally different statistical natures.
Interpretability: Standard deep learning approaches often lack interpretability, while traditional mixture models often reduce longitudinal data to summary statistics, losing complex trajectory patterns.
Subgroup Discovery: The goal is to uncover latent patient subgroups that exhibit distinct disease progression patterns and baseline characteristics to improve risk stratification (specifically for End-Stage Kidney Disease in ANCA-associated vasculitis).

2. Methodology

The authors propose a Two-View Mixture Model that integrates static and longitudinal data using a unified probabilistic framework.

A. Model Architecture

The model assumes $N$ patients (observations), each described by two views, $x_i = (x_i^{(1)}, x_i^{(2)})$:

View 1 (Static): $x_i^{(1)} \in \mathbb{R}^{d^{(1)}}$. Modeled as a Multivariate Normal Distribution within each cluster.
- Preprocessing: Since baseline data includes mixed categorical and numerical variables, the authors use PCAmix (combining PCA and Multiple Correspondence Analysis) to transform these into a low-dimensional continuous space before modeling.
View 2 (Longitudinal): $x_i^{(2)} = \{x_{i,j}^{(2)}\}$, observed at irregular times $\{t_{i,j}\}$. Modeled using Neural Ordinary Differential Equations (Neural ODEs).
- For each cluster $k$, a latent trajectory $z_k(t)$ starts from a learned initial state $z_k(0)$ and evolves according to:
  $\frac{dz_k(t)}{dt} = f_{\theta_k}(z_k(t), t)$
- $f_{\theta_k}$ is a feedforward neural network (2 hidden layers).
- Observations are assumed to be Gaussian: $x_{i,j}^{(2)} \sim \mathcal{N}(z_k(t_{i,j}), \sigma_k^2)$.
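To make the View 2 model concrete, here is a minimal pure-Python sketch of one cluster's trajectory model. A small two-hidden-layer network plays the role of $f_{\theta_k}$, a fixed-step fourth-order Runge-Kutta (RK4) integrator traces $z_k(t)$ from $z_k(0)$, and the Gaussian observation model scores one patient's irregularly timed values. The weights are random placeholders rather than trained parameters. The solver, the step size, and taking baseline as $t = 0$ are all assumptions, since the source does not say which ODE solver the authors use. Function names are hypothetical.

```python
import math
import random

def make_mlp(hidden=16, seed=0):
    """Hypothetical stand-in for f_theta_k: random weights for a 2-hidden-layer MLP
    mapping (z, t) -> dz/dt. In the real model these weights are learned."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        weights = [[rng.gauss(0.0, 0.3) for _ in range(n_in)] for _ in range(n_out)]
        return weights, [0.0] * n_out

    return [layer(2, hidden), layer(hidden, hidden), layer(hidden, 1)]

def mlp_forward(params, z, t):
    """Evaluate the MLP at (z, t): tanh on the hidden layers, linear output."""
    h = [z, t]
    for i, (weights, bias) in enumerate(params):
        h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, bias)]
        if i < len(params) - 1:
            h = [math.tanh(v) for v in h]
    return h[0]

def solve_ode(params, z0, times, step=0.01):
    """Integrate dz/dt = f(z, t) from t = 0 (assumed baseline) with fixed-step RK4
    and return z at each requested time. Times must be sorted and non-negative."""
    if list(times) != sorted(times) or (times and times[0] < 0):
        raise ValueError("times must be sorted and non-negative")
    out, z, t = [], z0, 0.0
    for target in times:
        while t < target - 1e-12:
            h = min(step, target - t)  # shorten the last step to land on target
            k1 = mlp_forward(params, z, t)
            k2 = mlp_forward(params, z + 0.5 * h * k1, t + 0.5 * h)
            k3 = mlp_forward(params, z + 0.5 * h * k2, t + 0.5 * h)
            k4 = mlp_forward(params, z + h * k3, t + h)
            z += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
            t += h
        out.append(z)
    return out

def trajectory_loglik(params, z0, sigma, times, values):
    """log p(x_i^(2) | cluster k) = sum_j log N(x_ij; z_k(t_ij), sigma_k^2)."""
    zs = solve_ode(params, z0, times)
    var = sigma ** 2
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - z) ** 2 / (2 * var)
               for x, z in zip(values, zs))

if __name__ == "__main__":
    # Hypothetical patient: standardized creatinine at irregular visit times (years).
    visit_times = [0.0, 0.15, 0.5, 1.4, 2.9]
    creatinine = [0.8, 0.6, 0.5, 0.55, 0.4]
    f_theta = make_mlp(seed=0)
    print(trajectory_loglik(f_theta, z0=0.7, sigma=0.3, times=visit_times, values=creatinine))
```

Because the solver can stop at any requested time, patients with different visit schedules are all scored against the same curve without resampling. Fitting $\theta_k$ with Adam, as the text describes, would additionally require gradients through the solver. That step is omitted here.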

B. Joint Clustering Framework

The model uses a $K^{(1)} \times K^{(2)}$ matrix (a two-way tensor) of joint cluster membership probabilities, $\pi \in \mathbb{R}^{K^{(1)} \times K^{(2)}}$, where $K^{(1)}$ and $K^{(2)}$ are the number of clusters for the static and longitudinal views, respectively:

$\pi_{k_1, k_2} = P(\xi^{(1)}=k_1, \xi^{(2)}=k_2)$, where $\xi^{(1)}$ and $\xi^{(2)}$ are a patient's latent cluster labels in the two views, so the entries of $\pi$ are non-negative and sum to 1.
This allows for flexible alignment: a patient's static cluster (View 1) does not determine their trajectory cluster (View 2). Instead, the model learns how often each pairing occurs.
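Here is a small sketch of what the joint matrix provides. Starting from a hypothetical $\pi$ (made-up numbers, not the paper's estimates), the per-view marginals and the conditional probability of each trajectory cluster given a static cluster follow directly. A conditional that is not all-or-nothing is exactly the "flexible alignment" idea: knowing the static cluster shifts, but does not fix, the trajectory cluster. Function names are illustrative.

```python
def view_marginals(pi):
    """Marginal cluster probabilities for each view from the joint matrix pi[k1][k2]."""
    static = [sum(row) for row in pi]
    trajectory = [sum(row[k2] for row in pi) for k2 in range(len(pi[0]))]
    return static, trajectory

def trajectory_given_static(pi):
    """P(xi2 = k2 | xi1 = k1) = pi[k1][k2] / sum over k2' of pi[k1][k2'] (one row per static cluster)."""
    return [[p / sum(row) for p in row] for row in pi]

if __name__ == "__main__":
    # Hypothetical 2 x 2 joint matrix: rows = static clusters, columns = trajectory clusters.
    pi = [[0.25, 0.15],
          [0.35, 0.25]]
    static, trajectory = view_marginals(pi)
    print(static, trajectory)
    print(trajectory_given_static(pi))
```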

C. Parameter Estimation (EM Algorithm)

The authors derive an Expectation-Maximization (EM) algorithm:

E-Step: Computes the posterior probability of joint cluster assignment $\gamma(k_1, k_2 | x_i)$ .
M-Step:
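- Updates static parameters ($\mu, \Sigma$) via closed-form solutions.
- Updates longitudinal parameters ($\theta, z_0, \sigma^2$) via numerical optimization (Adam optimizer), since the likelihood depends on Neural ODE solutions and this update has no closed form.
- Sparsity-Inducing Log Penalty: To prevent the joint probability tensor $\pi$ from becoming dense and to encourage interpretable, sparse subgroup structures, a penalty term $-\lambda \sum_{k_1, k_2} \log(\delta + \pi_{k_1, k_2})$ is added to the log-likelihood. Here $\lambda \ge 0$ sets the penalty strength and $\delta > 0$ is a small constant that keeps the logarithm finite when an entry reaches zero. Because this term grows as an entry of $\pi$ shrinks, maximizing the penalized objective favors concentrating probability in a few cells. The resulting update rule for $\pi$ shrinks small probabilities toward zero.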
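The E-step and the effect of the penalty can be sketched in a few lines of pure Python. The sketch assumes that the two views are conditionally independent given the pair of labels, so that $\gamma(k_1, k_2 \mid x_i) \propto \pi_{k_1,k_2}\, p(x_i^{(1)} \mid k_1)\, p(x_i^{(2)} \mid k_2)$. This assumption is not spelled out in the text. The per-view log-likelihoods are taken as inputs; in the full model they would come from the Gaussian and Neural ODE components. `m_step_pi` is the ordinary unpenalized update that the log penalty modifies, not the authors' penalized rule. The default $\delta$ is a hypothetical choice, since its value is not reported here.

```python
import math

def e_step(pi, static_ll, traj_ll):
    """Responsibilities gamma[k1][k2] for one patient.

    pi[k1][k2] : joint cluster probabilities
    static_ll  : static_ll[k1] = log p(x_i^(1) | k1)
    traj_ll    : traj_ll[k2]   = log p(x_i^(2) | k2)
    Assumes the views are conditionally independent given (k1, k2).
    """
    scores = [[(math.log(p) if p > 0 else -math.inf) + static_ll[a] + traj_ll[b]
               for b, p in enumerate(row)] for a, row in enumerate(pi)]
    m = max(s for row in scores for s in row)  # log-sum-exp shift for numerical stability
    total = sum(math.exp(s - m) for row in scores for s in row)
    return [[math.exp(s - m) / total for s in row] for row in scores]

def m_step_pi(gammas):
    """Unpenalized M-step: pi[k1][k2] = average responsibility over patients."""
    n = len(gammas)
    k1, k2 = len(gammas[0]), len(gammas[0][0])
    return [[sum(g[a][b] for g in gammas) / n for b in range(k2)] for a in range(k1)]

def penalized_objective(loglik, pi, lam=0.1, delta=1e-4):
    """loglik - lam * sum log(delta + pi[k1][k2]); delta = 1e-4 is a hypothetical default."""
    return loglik - lam * sum(math.log(delta + p) for row in pi for p in row)

if __name__ == "__main__":
    # Two hypothetical patients with made-up per-view log-likelihoods.
    pi = [[0.25, 0.25], [0.25, 0.25]]
    gammas = [e_step(pi, [-1.0, -3.0], [-2.0, -0.5]),
              e_step(pi, [-4.0, -1.5], [-1.0, -2.5])]
    print(m_step_pi(gammas))
    # Same log-likelihood, dense vs sparse pi: the penalty term rewards the sparse one.
    dense = [[0.25, 0.25], [0.25, 0.25]]
    sparse = [[0.5, 0.0], [0.0, 0.5]]
    print(penalized_objective(0.0, dense) < penalized_objective(0.0, sparse))
```

The final comparison only shows the direction in which the penalty pushes $\pi$. It does not reproduce the paper's penalized update.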

D. Model Selection

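Due to the high parameter count in Neural ODEs, standard criteria like AIC/BIC, whose penalties are based on counting parameters, are deemed unsuitable. Instead, the authors use K-fold Cross-Validated Log-Likelihood to select the optimal number of clusters ($K^{(1)} \times K^{(2)}$). Each candidate configuration is fit on the training folds and scored by its log-likelihood on the held-out fold.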
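The selection procedure can be illustrated with a deliberately simplified stand-in. Here a one-dimensional Gaussian mixture fit by plain EM replaces the full two-view model, and candidate numbers of components are compared by their summed held-out log-likelihood over 5 folds. The fold count, initialization, variance floor, iteration count, and function names are all assumptions for illustration. Only the idea mirrors the text: fit on the training folds, score on the held-out fold, and pick the highest total.

```python
import math
import random

def _component_logs(x, params):
    """log(weight_j) + log N(x; mean_j, var_j) for every component j."""
    weights, means, variances = params
    return [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for w, m, v in zip(weights, means, variances)]

def _logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def gmm_fit(xs, k, iters=100, var_floor=1e-3):
    """Plain EM for a 1-D Gaussian mixture with quantile-based initial means."""
    s, n = sorted(xs), len(xs)
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, var_floor)
    params = ([1.0 / k] * k, [s[int((j + 0.5) * n / k)] for j in range(k)], [var_all] * k)
    for _ in range(iters):
        resp = []
        for x in xs:  # E-step: responsibilities
            logs = _component_logs(x, params)
            norm = _logsumexp(logs)
            resp.append([math.exp(lg - norm) for lg in logs])
        weights, means, variances = [], [], []
        for j in range(k):  # M-step: closed-form updates
            nj = sum(r[j] for r in resp) + 1e-12
            mu = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nj
            weights.append(nj / n)
            means.append(mu)
            variances.append(max(var, var_floor))
        params = (weights, means, variances)
    return params

def cv_select_k(xs, candidates=(1, 2, 3), folds=5, seed=0):
    """Return (best k, {k: summed held-out log-likelihood})."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    scores = {}
    for k in candidates:
        total = 0.0
        for f in range(folds):
            held_out = set(idx[f::folds])  # every folds-th shuffled index
            train = [xs[i] for i in idx if i not in held_out]
            test = [xs[i] for i in held_out]
            params = gmm_fit(train, k)
            total += sum(_logsumexp(_component_logs(x, params)) for x in test)
        scores[k] = total
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(-5, 1) for _ in range(60)] + [rng.gauss(5, 1) for _ in range(60)]
    best_k, cv_scores = cv_select_k(data)
    print(best_k, cv_scores)
```

Held-out likelihood needs no explicit parameter count, which is the motivation given in the text. The trade-off is that every candidate must be refit once per fold.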

3. Key Contributions

Novel Framework: Introduction of a multi-view mixture model that seamlessly combines fixed-dimensional static data with irregularly sampled longitudinal trajectories using Neural ODEs.
Handling Irregularity: The use of Neural ODEs allows for direct modeling of continuous-time dynamics. The latent trajectory can be evaluated at each patient's actual measurement times, with no need to interpolate or bin observations onto a common time grid.
Interpretability via Sparsity: The integration of a log-penalty on the joint probability tensor ensures that the resulting subgroups are sparse and clinically interpretable, avoiding the "black box" nature of some fusion methods.
Robust Preprocessing: Application of PCAmix to handle mixed-type baseline data (categorical + numerical) within a Gaussian mixture framework.

4. Results

A. Simulation Studies

Parameter Recovery: The EM algorithm successfully recovered true model parameters (means, covariances, trajectories, and cluster assignments) as sample size increased.
Model Selection: Cross-validated log-likelihood correctly identified the true number of mixture components in both 2x2 and 3x3 simulation settings.
Sensitivity: Performance was relatively insensitive to the sparsity parameter $\lambda$. It peaked at $\lambda=0.1$, where the Adjusted Rand Index (ARI) reached 1.0 (perfect agreement with the true cluster labels) across various settings.
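For reference, the Adjusted Rand Index compares two partitions by counting pairs of patients that are grouped together in both. It then corrects for the agreement expected by chance. It equals 1.0 for identical partitions (up to relabeling) and has expected value 0 for random labelings. A minimal pure-Python version of the standard Hubert and Arabie formula is sketched below. The function name is illustrative, and library implementations such as scikit-learn's `adjusted_rand_score` should give the same values.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    # Pairs placed together in both partitions (cells of the contingency table).
    index = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (index - expected) / (max_index - expected)

if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same grouping, labels swapped
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # about -0.5: worse than chance
```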

B. Application to ANCA-Associated Vasculitis (AAV)

Dataset: 282 Irish patients with AAV, featuring 17 baseline covariates and longitudinal serum creatinine measurements (180 days to 3 years).
Optimal Configuration: The 2x2 model (2 static clusters, 2 trajectory clusters) yielded the highest cross-validated log-likelihood.
Identified Subgroups:
- Longitudinal Clusters:
  - Stable (Ls): ~70% of patients, low and stable creatinine (~100 µmol/L).
  - Variable (Lv): ~30% of patients, higher and more variable creatinine trajectories.
- Static Clusters:
  - Spo (Pauci-Organ Low Inflammation): Lower rates of extra-renal involvement.
  - Sim (Inflammatory Multi-system): High rates of multi-system involvement (60-75%), higher CRP, and PR3-ANCA positivity.
Joint Structure: The dominant subgroup ($\hat{\pi}_{2,1} = 0.456$, i.e. an estimated 46% of patients) consisted of patients with the "Sim" baseline profile and "Stable" trajectory.
Clinical Outcomes:
- The study analyzed End-Stage Kidney Disease (ESKD) and Berden biopsy classes across the 4 joint subgroups.
- Finding: No statistically significant differences were found in ESKD outcomes ($p=0.501$) or biopsy class distributions ($p=0.86$) across the clusters. Although the model identified distinct phenotypic and trajectory patterns, these patterns did not translate into detectable differences in kidney failure or biopsy findings in this cohort, highlighting the complexity of AAV progression.

5. Significance

Clinical Utility: The framework provides a principled way to stratify patients based on both their initial presentation and their dynamic disease course, which could support personalized medicine in rare diseases like AAV.
Methodological Advancement: It bridges the gap between traditional statistical mixture models and modern deep learning (Neural ODEs), offering a solution that is both flexible (handling irregular data) and interpretable (probabilistic clustering with sparsity).
Generalizability: While applied to AAV, the method is broadly applicable to any biomedical domain involving mixed static and longitudinal data (e.g., oncology, cardiology, neurodegenerative diseases).
Future Directions: The authors suggest extending the model to multivariate longitudinal views (multiple biomarkers simultaneously) and relaxing the Gaussian likelihood assumption to handle skewed or heavy-tailed clinical data.