On the Possible Detectability of Image-in-Image Steganography

This paper demonstrates that image-in-image steganography schemes are highly detectable: their embedding process creates a mixing pattern that independent component analysis can identify, so a simple detector based on the first four moments of wavelet-decomposed components reaches up to 84.6% accuracy. Keyless extraction networks and classical steganalysis methods such as SRM achieve even higher detection rates.

Antoine Mallet (CRIStAL), Patrick Bas (CRIStAL)

Published Fri, 13 Ma

Imagine you have a beautiful, innocent-looking photograph of a sunny beach (the Cover). Now, imagine you want to hide a secret, high-resolution photo of a cat (the Payload) inside that beach photo without anyone noticing.

This is the world of Image-in-Image Steganography. It's like trying to hide a whole second movie inside a single frame of the first movie. In recent years, scientists have built "magic boxes" (using AI called Invertible Neural Networks) that can do this. They claim these boxes are so good that the beach photo looks exactly the same, even though it's secretly carrying the cat photo.

This paper asks a simple but crucial question: "Is this magic actually magic, or can we see the trick?"

Here is the breakdown of what the authors discovered, using some everyday analogies.

1. The "Magic" Trick (The Setup)

The AI models used to hide these images work like a sophisticated blender. They take the "Beach" and the "Cat" and mix them together to create a new "Stego" image.

  • The Claim: The creators of these tools say, "Don't worry, the mix is perfect. You can't tell the difference."
  • The Reality: The authors found that the mixing process isn't perfect. It leaves a specific "smell" or "fingerprint" behind.

2. The Detective's Toolkit (The Method)

The authors didn't just guess; they built a detective kit to find the hidden cat. Their method is like a three-step process:

  • Step 1: The Wavelet Breakdown (The Prism)
    Imagine passing the beach photo through a prism. Instead of seeing just the picture, the prism splits it into different layers of detail: the big shapes (low frequency) and the tiny textures (high frequency). The authors realized the "magic box" mostly hides the cat in the tiny textures of the beach photo.

  • Step 2: The "Noise" Filter (PCA)
    When you look at all those layers, most of them are just the beach. But a few layers contain the "weirdness" added by the secret cat. The authors used a filter (called PCA) to ignore the boring beach layers and zoom in only on the weird, suspicious layers. It's like ignoring the background chatter at a party to focus on the one person whispering a secret.

  • Step 3: The "Unmixing" Machine (ICA)
    This is the star of the show. They used a technique called Independent Component Analysis (ICA).

    • The Analogy: Imagine you are at a cocktail party with two people talking at once. It's hard to hear either one clearly. But if you have a special microphone (ICA), you can isolate Person A's voice from Person B's voice, even though they were speaking over each other.
    • The authors used this to "unmix" the beach and the cat. They found that the AI's mixing process was so predictable that their "microphone" could easily separate the two images again.
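Step 1's prism can be sketched in a few lines. This is not the paper's exact transform, just a minimal NumPy sketch of a one-level Haar decomposition, the simplest member of the wavelet family; the `haar_dwt2` helper name is illustrative:

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar wavelet transform.

    Splits a 2-D array (even dimensions) into a low-frequency
    approximation (the big shapes) and three high-frequency detail
    subbands (the tiny textures where payloads tend to hide).
    """
    a = img[0::2, :] + img[1::2, :]        # pairwise row sums
    d = img[0::2, :] - img[1::2, :]        # pairwise row differences
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0   # low-low: coarse shapes
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0   # horizontal detail
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0   # vertical detail
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0   # diagonal detail (finest texture)
    return LL, (LH, HL, HH)

# A perfectly flat image has no texture, so every detail subband is zero.
flat = np.full((8, 8), 5.0)
LL, (LH, HL, HH) = haar_dwt2(flat)
print(np.allclose(LL, 10.0), np.allclose(HH, 0.0))  # → True True
```

Any texture a hiding scheme adds shows up in the `LH`/`HL`/`HH` subbands, which is why the analysis focuses there.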
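Step 2's filter can be sketched with a plain SVD. This is a generic PCA sketch in NumPy on synthetic data, not the authors' exact configuration; `pca_project` is a hypothetical helper:

```python
import numpy as np

def pca_project(X, k):
    """Project rows of X onto the k directions of largest variance."""
    Xc = X - X.mean(axis=0)                     # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                      # coordinates along top-k axes
    variances = S ** 2 / (len(X) - 1)           # per-component variance
    return scores, variances

rng = np.random.default_rng(0)
# 200 "layers": one strong shared direction of variation (the suspicious
# part) plus small isotropic noise (the boring background).
strong = rng.normal(size=(200, 1)) @ rng.normal(size=(1, 10)) * 5.0
X = strong + rng.normal(scale=0.1, size=(200, 10))
scores, var = pca_project(X, 1)
print(var[0] / var.sum() > 0.99)  # the first component dominates → True
```

Keeping only the top components is the "ignore the background chatter" step: low-variance directions are dropped before the unmixing stage.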
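Step 3's "microphone" can be demonstrated on the classic cocktail-party toy problem. The sketch below is a minimal FastICA loop (tanh nonlinearity, symmetric decorrelation) written from scratch, assuming two mixed 1-D signals rather than images; it is a sketch of the general ICA idea, not the paper's pipeline:

```python
import numpy as np

def fastica(X, iters=200, seed=0):
    """Tiny symmetric FastICA for a square mixture.

    X: (n_signals, n_samples) mixed observations.
    Returns estimated independent sources (up to order, sign, scale).
    """
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten: rotate/scale so the mixtures are uncorrelated, unit variance.
    d, E = np.linalg.eigh(np.cov(X))
    Z = E @ np.diag(d ** -0.5) @ E.T @ X
    n = X.shape[0]
    W = np.random.default_rng(seed).normal(size=(n, n))
    for _ in range(iters):
        G = np.tanh(W @ Z)                       # nonlinearity g
        W_new = G @ Z.T / Z.shape[1] - np.diag((1 - G**2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W_new)          # symmetric decorrelation:
        W = U @ Vt                               # keep rows orthogonal
    return W @ Z

t = np.linspace(0, 8 * np.pi, 2000)
S = np.vstack([np.sin(t), np.sign(np.sin(3 * t))])  # two "voices"
A = np.array([[1.0, 0.6], [0.5, 1.0]])              # the "blender"
rec = fastica(A @ S)
# Each recovered signal correlates almost perfectly with one source.
corr = np.abs(np.corrcoef(np.vstack([S, rec]))[:2, 2:])
print(bool(np.all(corr.max(axis=1) > 0.95)))
```

The same principle applies to the stego image: if the mixing is predictable, the "cover voice" and "payload voice" come apart again.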

3. The "Fingerprint" (The Features)

Once they isolated the "Cat Voice" and the "Beach Voice," they didn't need to look at the whole picture. They just looked at the statistics of the voices.

  • They measured four simple statistics of each separated signal, its first four moments: the mean (is it balanced around zero?), the variance (how loud is it?), the skewness (is it lopsided?), and the kurtosis (is it spiky?).
  • They found that the "Cat Voice" (the hidden image) had a very different statistical shape than a normal "Beach Voice." It was like finding that a fake diamond has a slightly different sparkle than a real one.
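Those four measurements are the first four statistical moments of a signal. Here is a minimal NumPy sketch (the `four_moments` helper is illustrative, not the paper's code) showing how a spikier distribution stands out in the fourth moment:

```python
import numpy as np

def four_moments(x):
    """Mean (balance), variance (loudness), skewness (lopsidedness),
    and excess kurtosis (spikiness) of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    v = x.var()
    s = ((x - m) ** 3).mean() / v ** 1.5       # skewness
    k = ((x - m) ** 4).mean() / v ** 2 - 3.0   # excess kurtosis
    return m, v, s, k

rng = np.random.default_rng(1)
gauss = rng.normal(size=100_000)    # a "normal" signal: excess kurtosis ≈ 0
spiky = rng.laplace(size=100_000)   # heavier tails: excess kurtosis ≈ 3
print(round(four_moments(gauss)[3]), round(four_moments(spiky)[3]))  # → 0 3
```

Feeding such per-component moment vectors to a simple classifier is enough to separate "normal" components from payload-bearing ones.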

4. The Results (The Verdict)

The authors tested their method on several popular "magic boxes."

  • The Good News for Detectives: Their simple method could spot the hidden images with roughly 80% to 85% accuracy (up to 84.6%).
  • The Bad News for Hiders: When they used the "heavy artillery" (standard, complex AI detectors used by security experts), the accuracy jumped to over 99%.

The Big Reveal:
The paper concludes that these "Image-in-Image" hiding schemes are highly vulnerable.

  1. No Secret Key: Most of these systems don't use a password or a secret key. If you know how the machine works, you can just run it backward to get the secret image out.
  2. The Mixing is Obvious: The way the AI mixes the images leaves a clear mathematical trail that is easy to follow.
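The "run it backward" point can be illustrated with a toy additive coupling step, the basic building block of invertible networks. This is a deliberately simplified sketch (hypothetical `embed`/`extract` functions, not any published scheme): because the map uses no secret key, anyone with the code can invert it exactly.

```python
import numpy as np

def f(x):
    """An arbitrary fixed 'network'. Public: anyone can run it."""
    return np.tanh(3.0 * x)

def embed(cover, secret):
    """Toy invertible coupling step: mix the secret into the cover."""
    stego = cover + f(secret)
    aux = secret + f(stego)
    return stego, aux

def extract(stego, aux):
    """No key needed: just run the same public map backward."""
    secret = aux - f(stego)
    cover = stego - f(secret)
    return cover, secret

rng = np.random.default_rng(2)
cover, secret = rng.normal(size=16), rng.normal(size=16)
stego, aux = embed(cover, secret)
c2, s2 = extract(stego, aux)
print(np.allclose(c2, cover) and np.allclose(s2, secret))  # → True
```

A real lock-and-key design would make `extract` depend on a secret the attacker does not have; without one, knowing the architecture is knowing the secret.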

The Takeaway

Think of these steganography tools as a child trying to hide a toy inside a pillow by sewing it in. The child thinks, "No one will see it!" But to an adult (the detective), the lump in the pillow is obvious, and the stitching pattern is a dead giveaway.

The authors are saying: "Stop thinking these AI tools are unbreakable. They leave clear fingerprints, and we can easily find the hidden secrets." They hope this study will push developers to build better, truly secure hiding spots in the future—perhaps by adding a real "lock and key" system.