Imagine you are trying to solve a jigsaw puzzle, but someone has thrown away most of the pieces, spilled coffee on the remaining ones, or even scrambled the picture on the box. This is what scientists call an inverse problem: you have a messy, incomplete result (the measurements), and you need to figure out what the original, perfect picture looked like.
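In math terms, this setup is usually written as y = A x + noise: a clean signal x passes through a measurement process A and picks up corruption along the way. A minimal toy sketch of that idea (the sizes and noise level here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: the "perfect picture" is a signal x;
# the "messy result" y is what we actually get to measure.
x = rng.standard_normal(100)           # unknown original signal (100 numbers)
A = rng.standard_normal((30, 100))     # measurement operator: only 30 numbers survive
noise = 0.1 * rng.standard_normal(30)  # the "spilled coffee"
y = A @ x + noise                      # the incomplete, corrupted measurements

# The inverse problem: recover x given only y and A.
# With 30 measurements and 100 unknowns, infinitely many signals fit y,
# so a prior (the "art teacher" below) is needed to pick a plausible one.
```

Because the system is underdetermined, the measurements alone can never pin down the answer; that gap is exactly what the generative model fills in.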
For a long time, computers have used "Generative Models" (AI trained on millions of photos) to act as a guide. Think of these models as a super-smart art teacher who knows exactly what a human face or a landscape should look like. When you show the AI a blurry, broken photo, the teacher says, "Ah, that looks like a nose, but it's too blurry. Let me guess the rest based on what I know about noses."
The Problem: The "One-Size-Fits-All" Teacher
The paper points out a flaw in how we usually use these AI teachers. Traditionally, we train them with a fixed level of detail.
- The "Simple" Teacher: This teacher only knows the basics. If you ask them to reconstruct a face from a tiny, blurry smudge, they might do okay. But if you give them a high-quality photo and ask them to fill in the details, the result will look like a cartoon. The teacher just isn't detailed enough.
- The "Complex" Teacher: This teacher knows every single pore and eyelash. If you give them a tiny, blurry smudge, they will try to invent fake pores and fake eyelashes to fill the gaps. They are overfitting—they are so confident in their high-level knowledge that they start hallucinating details that aren't actually there, mistaking noise for signal.
The authors realized that the right amount of detail depends on how much information you have. If you have very few puzzle pieces (few measurements), you need a simpler teacher. If you have many pieces, you need a complex one. But until now, you had to train a different teacher for every single scenario.
The Solution: The "Shape-Shifting" Teacher
This paper introduces Tunable Complexity. Imagine an AI teacher who can instantly change their personality.
- Need to solve a puzzle with only 10% of the pieces? The teacher says, "Okay, I'll switch to Low Complexity Mode. I'll just guess the big shapes and ignore the tiny details."
- Need to solve a puzzle with 90% of the pieces? The teacher says, "Great, I'll switch to High Complexity Mode. I'll add all the fine details."
They achieved this using a clever trick called Nested Dropout. Imagine a stack of building blocks. Usually, you build the whole tower. With this new method, the AI is trained to build the tower, but sometimes it's forced to stop halfway and check whether the half-built tower still stands on its own as a recognizable structure. This forces the bottom blocks (the most important features) to carry the most weight, while the top blocks (the fine details) are optional.
The Results: Finding the "Goldilocks" Zone
The researchers tested this on various tasks like:
- Compressed Sensing: Reconstructing an image from very few data points (like seeing a face through a keyhole).
- Denoising: Removing static from an old TV signal.
- Inpainting: Filling in missing parts of a photo (like removing a tourist from a vacation picture).
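These three tasks differ only in their forward model, i.e., in how the clean image gets corrupted before we see it. A toy sketch of each (sizes, noise levels, and mask rates here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((16, 16))                 # stand-in for a clean image

# Compressed sensing: keep only a small set of random linear measurements
# (the "keyhole"): 40 numbers instead of all 256 pixels.
A = rng.standard_normal((40, img.size))
y_cs = A @ img.ravel()

# Denoising: the full image arrives, but corrupted by additive noise
# (the "static" on the old TV signal).
y_dn = img + 0.2 * rng.standard_normal(img.shape)

# Inpainting: a binary mask hides some pixels (the "tourist" region);
# here roughly 70% of pixels are observed.
mask = rng.random(img.shape) > 0.3
y_ip = img * mask

# In every case the forward model is known; the hard part is inverting it.
```

The shared structure is why one tunable prior can serve all three problems: only the forward model changes, not the teacher.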
They found that there is always a "Goldilocks" zone.
- If the complexity is too low, the image is blurry and missing details.
- If the complexity is too high, the image looks weird and has fake artifacts (like a nose that looks like a potato).
- The Tunable Model finds the perfect middle ground. It adapts its "brain size" to match the amount of information available.
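One way to see this Goldilocks zone numerically is a simulated sweep where the ground truth is known, so the true error can be plotted against the knob setting. The sketch below uses a least-squares fit on a fixed dictionary as a stand-in for a learned decoder; everything here is an illustrative assumption, not the paper's actual experiment:

```python
import numpy as np

def sweep_complexity(reconstruct, x_true, y, max_b):
    """Try every knob setting b and return the one with the lowest true error.

    In practice the curve of errors is U-shaped: small b underfits (blurry),
    large b overfits the noise (fake details).
    """
    errors = []
    for b in range(1, max_b + 1):
        x_hat = reconstruct(y, b)                 # solve at complexity b
        errors.append(np.linalg.norm(x_hat - x_true))
    return int(np.argmin(errors)) + 1             # the "Goldilocks" b

# Toy stand-in for a decoder: least-squares over the first b columns of a
# fixed dictionary D (more columns = more complexity).
rng = np.random.default_rng(0)
n, max_b = 50, 20
D = rng.standard_normal((n, max_b))
x_true = D[:, :5] @ rng.standard_normal(5)        # truth uses 5 components
y = x_true + 0.5 * rng.standard_normal(n)         # noisy observation

def reconstruct(y, b):
    coef, *_ = np.linalg.lstsq(D[:, :b], y, rcond=None)
    return D[:, :b] @ coef

best = sweep_complexity(reconstruct, x_true, y, max_b)
```

Running this, the best setting lands well below the maximum: using all 20 components fits the noise, while too few components miss real structure, which is exactly the trade-off the paper describes.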
A Real-World Analogy: The Detective
Think of a detective trying to identify a suspect from a grainy security camera photo.
- Low Complexity: The detective says, "It's a human." (Too vague).
- High Complexity: The detective says, "It's a man named Bob, wearing a red hat, with a scar on his left cheek." (Too specific; the scar might just be a shadow).
- Tunable Complexity: The detective looks at the graininess of the photo and says, "It's a man, likely wearing a hat. I can't be sure about the scar, so I'll leave that out."
Why This Matters
This is a big deal because it means we don't need to train a thousand different AI models for a thousand different problems. We can train one single, flexible model that can handle anything from a blurry, low-data scenario to a high-definition, data-rich scenario just by turning a "complexity knob."
It's like having a Swiss Army knife instead of a drawer full of single-purpose tools. It makes solving these difficult image problems faster, smarter, and more adaptable to the real world, where data is often messy and incomplete.