Implementation of Quantum Implicit Neural Representation in Deterministic and Probabilistic Autoencoders for Image Reconstruction/Generation Tasks

Imagine you are trying to teach a robot to draw pictures. You want it to do two things:

Reconstruct: Look at a picture of a cat, understand what makes it a cat, and draw it again from memory.
Generate: Use its understanding of "cat-ness" to draw a new cat it has never seen before.

This paper is about teaching a robot how to do this using a special mix of classical computer brains and quantum magic. The author, Saadet Muzehher Eren, built a new type of "artist" called a QINR-AE/VAE.

Here is the breakdown using simple analogies:

1. The Problem: The "Boring Artist"

In the world of AI, there are different types of artists.

The Classical Artist (Autoencoder): Good at copying, but sometimes forgets the details.
The Quantum GAN (Generative Adversarial Network): A very ambitious quantum artist. It tries to create new art by playing a game against a critic, but it often suffers from "Mode Collapse."
- The Analogy: Imagine a jazz musician who gets stuck playing the exact same note over and over because they are afraid to try something new. The quantum GANs compared in this paper were like that musician: they kept drawing the same blurry, average-looking digit or letter, with little variety.

2. The Solution: The "Quantum Implicit Neural Representation" (QINR)

The author introduced a new tool called QINR. Think of this as a special quantum paintbrush.

How it works: Instead of just memorizing pixels (dots of color), this brush learns the mathematical rhythm of the image. It treats an image like a continuous song rather than a collection of static blocks.
The Secret Sauce: The author added "learnable angle-scaling."
- The Analogy: Imagine you are tuning a guitar. Usually, you just turn the pegs until it sounds right. This new method instead lets the robot learn exactly how far to turn each peg to get the right pitch. That helps keep the robot from getting stuck in a bad tune (what the paper calls an optimization challenge).

3. The Two Artists: The AE and the VAE

The paper tests this new quantum paintbrush in two different "studios":

A. The QINR-AE (The Copyist)

Goal: Take an image, shrink it down to a tiny summary (a "latent vector"), and expand it back out.
Result: It's like a photocopier that understands the essence of the image. When it reconstructs a picture of a "7," the lines are sharp and the corners are crisp. It doesn't just guess; it preserves the geometry, such as corners and outlines.

B. The QINR-VAE (The Inventor)

Goal: This is the creative one. It takes the summary and adds a little bit of "imagination" (random noise) to create new images.
The Big Win: This is where the paper shines. The author compared this new artist to the old quantum GANs.
- The Old GANs: Drew the same blurry "7" every time.
- The New QINR-VAE: Drew a "7" that was crossed, a "7" that was flat, a "7" that was tilted.
- The Metaphor: If the old models were a stamp that printed the same coin over and over, the new QINR-VAE is a mint that can forge unique coins with different scratches and angles, all while looking like real currency.

4. The Training Ground

The author tested these models on three famous "art classes":

MNIST: Handwritten numbers (0-9).
E-MNIST: Handwritten letters.
Fashion MNIST: Pictures of clothes (shirts, shoes, bags).

The author used a very small amount of data (only 500 examples per class) to see whether the robot could learn from so little. Even with this tiny dataset, the new quantum models produced clearer, sharper, and more diverse images than the previous quantum models.

5. Why This Matters

Stability: The new model didn't get "stuck" or confused during training. It learned steadily.
Efficiency: It achieved great results with fewer "quantum parameters" (fewer moving parts) than its competitors.
Less "Mode Collapse": It was far less prone to drawing the same thing repeatedly.

The Bottom Line

This paper is like showing a new type of quantum 3D printer. Previous quantum printers could only make one specific shape, and it looked a bit fuzzy. This new printer, using the "QINR" blueprint, can print a wide variety of shapes that are sharp, detailed, and look very real, even when it only has a few instructions to work with.

It suggests that by mixing classical computer brains (for the heavy lifting) with quantum circuits (for the creative, high-frequency details), we can build better AI artists for the future.

Here is a detailed technical summary of the paper "Implementation of Quantum Implicit Neural Representation in Deterministic and Probabilistic Autoencoders for Image Reconstruction/Generation Tasks."

1. Problem Statement

The paper addresses two primary challenges in the intersection of Quantum Machine Learning (QML) and image processing:

Mode Collapse in Quantum Generative Models: Existing quantum generative models, such as Quantum Generative Adversarial Networks (QGANs), often suffer from "mode collapse," where the generator produces limited diversity (e.g., only a few similar samples) rather than capturing the full distribution of the data.
Efficiency and Expressivity: Classical autoencoders (AE) and variational autoencoders (VAE) are effective but may lack the ability to model high-frequency, periodic, and complex features efficiently with a constrained number of parameters. The authors aim to demonstrate that Quantum Implicit Neural Representations (QINR) can serve as a superior decoder to transform latent space information into rich image features, offering better stability and diversity than QGANs.

2. Methodology

The authors propose a Quantum-Classical Hybrid Architecture consisting of a classical Convolutional Neural Network (CNN) encoder and a QINR-based decoder.

A. Architecture Design

Encoder (Classical): A standard CNN with convolutional layers, batch normalization, and Leaky ReLU activations. It compresses input images (28×28 pixels) into a latent vector $z$.
- For QINR-AE: The encoder outputs a deterministic latent vector.
- For QINR-VAE: The encoder outputs the mean ($\mu$) and log-variance ($\log \sigma^2$) of a Gaussian distribution. The latent vector is sampled using the reparameterization trick, $z = \mu + \epsilon \odot \sigma$, with $\epsilon \sim \mathcal{N}(0, I)$ and $\sigma = \exp(\tfrac{1}{2} \log \sigma^2)$.
Decoder (Hybrid QINR): This is the core innovation.
- Classical Pre-processing: The latent vector is expanded via linear layers and batch normalization to a higher-dimensional space.
- Learnable Angle Scaling: A linear layer maps features to qubit rotation angles. Crucially, the authors introduce learnable scaling factors ($\lambda$) for the data reuploading angles. This allows the circuit to adapt input scaling dynamically, addressing optimization challenges and enhancing expressivity.
- Quantum Circuit: The circuit consists of $L$ parameterized layers and $L-1$ encoding layers using 6 qubits.
  - Encoding: Data is reuploaded via $R_Z$ rotations.
  - Parameter Layers: Contain Euler rotations ($\mathrm{Rot}(\alpha, \beta, \gamma)$) and entangling Controlled-Z (CZ) gates.
- Readout: The output is obtained via expectation values of Pauli-Z operators (and other observables in appendices), which are fed into final linear layers to reconstruct the image logits.
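The decoder layer pattern above can be illustrated with a single-qubit toy model in plain Python (no PennyLane needed): $L$ trainable $\mathrm{Rot}$ layers interleaved with $L-1$ encoding layers $R_Z(\lambda_\ell x)$, read out as $\langle X \rangle$, $\langle Y \rangle$, $\langle Z \rangle$. This is a sketch under stated assumptions, not the paper's circuit: it uses one qubit instead of 6, drops the CZ entanglers and the classical linear layers, treats $x$ as one pre-computed angle feature, and only evaluates the forward pass (in training, the weights and the scales $\lambda_\ell$ would be updated by gradient descent). The function names are hypothetical; $\mathrm{Rot}$ follows PennyLane's `qml.Rot` convention $R_Z(\omega) R_Y(\theta) R_Z(\phi)$.

```python
import cmath
import math

def rz(a):
    # R_Z(a) = diag(exp(-i a/2), exp(+i a/2))
    return [[cmath.exp(-0.5j * a), 0j], [0j, cmath.exp(0.5j * a)]]

def ry(a):
    c, s = math.cos(a / 2), math.sin(a / 2)
    return [[complex(c), complex(-s)], [complex(s), complex(c)]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def rot(phi, theta, omega):
    # Euler rotation in PennyLane's qml.Rot convention: RZ(omega) @ RY(theta) @ RZ(phi)
    return matmul(rz(omega), matmul(ry(theta), rz(phi)))

def apply(gate, psi):
    return [gate[0][0] * psi[0] + gate[0][1] * psi[1],
            gate[1][0] * psi[0] + gate[1][1] * psi[1]]

def expectations(psi):
    # Bloch components <X>, <Y>, <Z> of a normalized single-qubit state (a, b)
    a, b = psi
    overlap = a.conjugate() * b
    return 2 * overlap.real, 2 * overlap.imag, abs(a) ** 2 - abs(b) ** 2

def reupload_forward(x, weights, scales):
    """weights: L tuples (phi, theta, omega); scales: L-1 learnable factors lambda_l."""
    if len(scales) != len(weights) - 1:
        raise ValueError("need exactly one scale per encoding layer (L - 1 of them)")
    psi = [1 + 0j, 0j]                        # start in |0>
    psi = apply(rot(*weights[0]), psi)        # first trainable layer
    for lam, w in zip(scales, weights[1:]):
        psi = apply(rz(lam * x), psi)         # encoding layer: angle rescaled by lambda_l
        psi = apply(rot(*w), psi)             # next trainable layer
    return expectations(psi)

weights = [(0.1, 0.8, -0.3), (0.5, 1.2, 0.0), (-0.4, 0.7, 0.9)]   # L = 3
scales = [1.0, 2.0]                                              # L - 1 = 2
print(reupload_forward(0.6, weights, scales))
```

Setting every scale to zero turns each encoding gate into the identity, so the output no longer depends on $x$; this is one way to see that the scales directly control how strongly the input shapes the readout.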

B. Training and Optimization

Loss Functions:
- Reconstruction Loss: Binary Cross-Entropy with Logits (BCEWithLogits) is used for both AE and VAE to ensure pixel-level accuracy.
- Regularization (VAE only): Kullback–Leibler (KL) divergence is used to regularize the latent space. To prevent posterior collapse, the authors employ $\beta$-warmup (gradually increasing the KL weight) and Capacity Control (constraining the KL divergence to a target capacity $C(t)$).
Optimization: The models are trained using Adam with separate learning rates for classical ($\eta_{cls}$) and quantum ($\eta_{q}$) parameters. Gradient clipping is applied to ensure stability.
Datasets: Experiments were conducted on MNIST, E-MNIST, and Fashion MNIST (using 500 samples per class). Additional experiments were performed on CelebA (Appendix A).
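The VAE objective described in this part can be sketched in plain Python for a single sample: the reparameterization step from the encoder, BCE-with-logits reconstruction, the closed-form KL to a standard normal prior, a linear $\beta$-warmup, and a linearly growing capacity $C(t)$. The schedule shapes, hyperparameter values, sum reduction, and the way $\beta(t)$ and $C(t)$ are combined (here $\beta(t)\,\lvert \mathrm{KL} - C(t) \rvert$, following the capacity-control form of Burgess et al.) are assumptions for illustration, since the summary does not specify them; all function names are hypothetical.

```python
import math
import random

def reparameterize(mu, logvar, rng):
    # z = mu + eps * sigma, with eps ~ N(0, 1) and sigma = exp(0.5 * logvar)
    return [m + rng.gauss(0.0, 1.0) * math.exp(0.5 * lv) for m, lv in zip(mu, logvar)]

def bce_with_logits(logits, targets):
    # Numerically stable BCE on raw logits, summed over pixels:
    # max(x, 0) - x * y + log(1 + exp(-|x|))
    return sum(max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
               for x, y in zip(logits, targets))

def kl_standard_normal(mu, logvar):
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def linear_ramp(step, ramp_steps, max_value):
    # Grows linearly from 0 to max_value over ramp_steps, then stays constant
    return max_value * min(1.0, step / ramp_steps)

def vae_loss(logits, targets, mu, logvar, step,
             warmup_steps=1000, capacity_steps=5000, beta_max=1.0, c_max=10.0):
    # Hypothetical schedule values; not taken from the paper
    recon = bce_with_logits(logits, targets)
    kl = kl_standard_normal(mu, logvar)
    beta = linear_ramp(step, warmup_steps, beta_max)   # beta-warmup
    c_t = linear_ramp(step, capacity_steps, c_max)     # capacity target C(t)
    return recon + beta * abs(kl - c_t)

rng = random.Random(0)
mu, logvar = [0.2, -0.1], [-1.0, -0.5]
z = reparameterize(mu, logvar, rng)                   # z would be fed to the QINR decoder
logits, targets = [2.0, -1.5, 0.3], [1.0, 0.0, 1.0]   # stand-in decoder output and pixels
print(vae_loss(logits, targets, mu, logvar, step=250))
```

At step 0 the KL term has zero weight, so early training is driven purely by reconstruction; the capacity target then lets the KL grow gradually instead of being pushed toward zero, which is the failure mode posterior collapse describes.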

3. Key Contributions

QINR-VAE/AE Framework: The first integration of QINR into both deterministic (AE) and probabilistic (VAE) autoencoder architectures for image tasks.
Mitigation of Mode Collapse: The study demonstrates that the QINR-VAE is more robust against mode collapse compared to QGAN variants (PQWGAN, Quantum AnoGAN, QINR-QGAN), producing a wider variety of distinct images.
Learnable Angle Scaling: The introduction of trainable scaling factors in the data reuploading process to stabilize training and improve the circuit's ability to approximate complex functions (Fourier series-like).
Comprehensive Evaluation: A rigorous comparison using both qualitative (visual inspection) and quantitative metrics (FID, SSIM, PSNR, Cosine Similarity) across multiple datasets.
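Why scaling the re-uploaded angles matters for Fourier-like expressivity can be seen from the standard frequency-spectrum argument for data re-uploading circuits (Schuld, Sweke and Meyer, 2021), which the summary alludes to but does not spell out. Each encoding gate $R_Z(\lambda_\ell x)$ has generator eigenvalues $\pm\tfrac{1}{2}$, so a circuit expectation value is a trigonometric series in $x$ whose frequencies are the sums $\sum_\ell \lambda_\ell (e_\ell - e'_\ell)$ with $e_\ell, e'_\ell \in \{\pm\tfrac{1}{2}\}$. With every $\lambda_\ell = 1$ on one qubit, the spectrum is fixed to the integers $0, \dots, L-1$; learnable scales stretch and reshape it. The sketch below assumes a single qubit and uses a hypothetical helper name; it only enumerates which frequencies are accessible, while the trainable weights set their coefficients.

```python
def accessible_frequencies(scales):
    """Non-negative frequencies reachable by a single-qubit model with R_Z(lam * x) encodings."""
    spectrum = {0.0}
    for lam in scales:
        # e - e' is -1, 0 or +1 for eigenvalues e, e' in {-1/2, +1/2}
        spectrum = {round(f + k * lam, 12) for f in spectrum for k in (-1, 0, 1)}
    return sorted(f for f in spectrum if f >= 0)

print(accessible_frequencies([1.0, 1.0]))   # [0.0, 1.0, 2.0]  fixed integer spectrum
print(accessible_frequencies([1.0, 2.5]))   # [0.0, 1.0, 1.5, 2.5, 3.5]  rescaled spectrum
```

The same two encoding layers reach five distinct non-zero-or-zero frequencies instead of three once the scales differ, which is one concrete sense in which trainable scaling broadens the functions the circuit can fit.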

4. Results

The models were evaluated in noiseless 6-qubit simulations using PennyLane and PyTorch.

Qualitative Analysis:
- QINR-VAE: Generated images were significantly sharper, with clearer boundaries and higher intra-class diversity compared to QGANs. While QGANs tended to produce blurry, averaged images (mode collapse), the QINR-VAE captured distinct writing styles (e.g., crossed vs. uncrossed '7').
- QINR-AE: Successfully reconstructed images with high clarity and preserved structural details (corners, outlines) even with limited data.
Quantitative Analysis:
- FID (Fréchet Inception Distance): The QINR-VAE achieved the lowest FID scores (indicating better distribution alignment) across all datasets compared to PQWGAN, Quantum AnoGAN, and QINR-QGAN. For example, on MNIST, QINR-VAE FID ranged from 110–144, whereas PQWGAN ranged from 250–360.
- SSIM/PSNR/Cosine Similarity: The QINR-AE and QINR-VAE (reconstruction) showed superior structural similarity and pixel accuracy compared to generative baselines.
- Stability: Loss curves for both reconstruction and total loss (including KL) showed stable convergence without the oscillations often seen in GAN training.
Appendix Findings:
- CelebA: With limited data, faces were somewhat uniform/faded, but reconstruction (AE) was sharper than generation (VAE).
- Readouts: Using multi-basis readouts ( $\langle X \rangle, \langle Y \rangle, \langle Z \rangle, \langle ZZ \rangle$ ) significantly improved image quality and metric scores compared to single-basis readouts.
- QINR vs. Classical Decoder: While the classical decoder had a slight edge in FID (suggesting somewhat more diversity), the QINR decoder produced visually more coherent and continuous images.
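The pixel-level metrics used above have simple closed forms. A minimal sketch, assuming images are flattened to lists of intensities in $[0, 1]$ (so the PSNR peak value is 1): PSNR is $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$ and cosine similarity is the normalized dot product. FID and SSIM are omitted because they are normally computed with library implementations (an Inception network for FID, and for example `skimage.metrics.structural_similarity` for SSIM); the function names here are hypothetical.

```python
import math

def psnr(reference, reconstruction, max_value=1.0):
    # Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0.0:
        return math.inf        # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def cosine_similarity(a, b):
    # Normalized dot product of two flattened images
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero image")
    return dot / (norm_a * norm_b)

ref = [0.0, 0.5, 1.0, 0.25]   # toy 2x2 image, flattened
rec = [0.1, 0.5, 0.9, 0.25]   # toy reconstruction
print(psnr(ref, rec), cosine_similarity(ref, rec))
```

Higher is better for both metrics; note that PSNR depends on the assumed intensity range, so values are only comparable when images are normalized the same way.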

5. Significance and Conclusion

The paper establishes that Quantum Implicit Neural Representations are a viable and powerful alternative to classical decoders in autoencoder frameworks.

Robustness: The QINR-VAE offers more stable training dynamics than QGANs and substantially mitigates mode collapse in small-scale quantum simulations.
Efficiency: The model achieves high-quality reconstruction and generation with a relatively small number of quantum parameters (120 parameters for 6 qubits) compared to other quantum models (e.g., PQWGAN with 2000+ parameters).
Future Outlook: The authors conclude that while current results are based on noiseless simulations, the QINR approach holds promise for future quantum hardware. They suggest that increasing data volume and exploring noise-resilient architectures are key next steps for real-world deployment.

In summary, this work provides a strong proof-of-concept that hybrid quantum-classical autoencoders utilizing implicit neural representations can outperform existing quantum generative models in terms of image fidelity, diversity, and training stability.