IDperturb: Enhancing Variation in Synthetic Face Generation via Angular Perturbation

Imagine you are trying to teach a robot to recognize your face. To do this, you need to show it thousands of pictures of you: smiling, frowning, looking left, looking right, in bright sunlight, and in the dark.

The problem? Privacy laws and ethical concerns mean we can't just grab millions of real photos of real people to train these robots anymore. So, scientists use AI to generate fake (synthetic) faces instead.

Here is the catch: current AI face generators behave like a record stuck on repeat. Ask them to generate "Person A" and they produce a picture of Person A. Ask again, and you get almost exactly the same picture. The images are too clean and too alike.

If you train a robot on these near-identical fake faces, the robot gets confused when it sees a real person who looks slightly different (maybe they have a different haircut or are squinting). The robot fails because it never learned to handle variation.

Enter IDPERTURB: The "Slight Twist" Strategy

The paper introduces a clever, simple trick called IDPERTURB. Think of it not as changing the person, but as shaking the camera slightly while keeping the subject the same.

Here is how it works, using a few analogies:

1. The "Identity Fingerprint" (The Embedding)

Every face, when analyzed by a computer, gets a unique mathematical "fingerprint" (called an embedding). Imagine this fingerprint is a point on a giant, invisible globe.

If you have a point for "You," and another point for "Your Twin," they are close together but not on top of each other.
Current AI generators usually pick a single point on this globe and say, "Generate a face for this exact point." The result? The same face, over and over.

2. The "Cone of Possibility" (Angular Perturbation)

IDPERTURB says, "Let's not pick just one point. Let's pick a cone-shaped area around that point."

Imagine an ice cream cone with its tip at the center of the globe and its opening pointed straight at your "You" point.
Instead of using only that exact point, IDPERTURB picks random spots on the patch of the globe's surface that falls inside the cone.
These spots are still close to "You" (so it's still clearly you), but each one sits at a slightly different angle.

3. The Result: A Family of Variations

When the AI generates a face using a point from inside that cone, it creates a picture of You, but maybe:

You are tilting your head slightly.
You are squinting a tiny bit.
The lighting feels a little different.
Your expression is slightly more relaxed.

It's like taking a photo of yourself, then taking 50 more photos where you just shift your weight, blink differently, or turn your head an inch. You are still unmistakably you, but the photos are diverse enough to teach the robot how to recognize you in the real world.

Why is this a big deal?

The "Goldilocks" Zone:

Too little change: The robot never sees enough variety and fails to recognize real people (the "Stuck Record" problem).
Too much change: The robot thinks the new photo is a different person entirely (the "Identity Crisis").
IDPERTURB: With a well-chosen cone width, it hits the middle ground, creating enough variety to make the robot smarter while keeping the changes small enough that the robot knows it's still the same person.

The Analogy of the Art Class

Imagine an art teacher asking students to draw "A Cat."

Old Method: The teacher gives the students a single, perfect photo of a cat. Everyone draws the exact same cat. When the teacher shows a real cat with a missing ear, the students are confused.
IDPERTURB Method: The teacher gives the students a photo of a cat, but says, "Draw this cat, but imagine it's stretching, or sleeping, or looking at a bird." The students draw many different versions of the same cat. Now, when the teacher shows a real cat, the students recognize it immediately because they've seen many variations.

The Bottom Line

IDPERTURB is a simple geometric trick that makes sets of synthetic faces more varied without retraining or rebuilding the generator. By slightly "wiggling" the mathematical coordinates of a face before generating it, the researchers created training data diverse enough to make face-recognition systems more accurate and more robust to the messy reality of human faces.

It's a win for privacy (no real photos needed) and a win for technology (smarter AI).

1. Problem Statement

Face Recognition (FR) systems rely heavily on large-scale, diverse, and annotated datasets. However, privacy concerns and legal regulations (e.g., GDPR) have restricted access to authentic biometric data, leading to the withdrawal of major datasets like MS-Celeb-1M and VGGFace2. Consequently, researchers have turned to synthetic data generated by Deep Generative Models (DGMs), particularly Identity-Conditional Diffusion Models (DMs).

While recent identity-conditional DMs can generate photorealistic and identity-consistent images, they suffer from a critical flaw: limited intra-class variation.

The Issue: When a fixed identity embedding is used as a conditioning vector, the generated images often lack the natural diversity (e.g., changes in pose, expression, age, lighting) required to train robust and generalizable FR models.
Existing Limitations: Current methods to increase diversity often rely on auxiliary labels, learned style modules, complex iterative sampling, or architectural modifications to the generative model, which can be computationally expensive or compromise identity fidelity.

2. Methodology: IDPERTURB

The authors propose IDPERTURB, a simple yet effective, geometry-driven sampling strategy that enhances diversity without modifying the underlying pre-trained Diffusion Model.

Core Concept

Instead of using a single, fixed identity embedding $v$ for all samples of a specific identity, IDPERTURB generates a set of perturbed identity embeddings $\tilde{v}$ within a constrained angular region of the unit hypersphere. These perturbed vectors serve as conditioning inputs for the pre-trained DM.

Technical Formulation

Geometric Constraint: The method assumes that identity embeddings lie on the unit hypersphere. To maintain identity coherence while introducing variation, each perturbation is constrained to a spherical cap around the original embedding, defined by a cosine similarity lower bound ($lb$): every perturbed vector $\tilde{v}$ must satisfy $\cos\angle(v, \tilde{v}) \ge lb$.
Angular Sampling Process:
- Target Angle: A target cosine similarity $s$ is sampled uniformly from $[lb, 1]$, corresponding to an angle $\theta = \cos^{-1}(s)$.
- Orthogonal Projection: Random noise $n \sim \mathcal{N}(0, I)$ is sampled, projected onto the hyperplane orthogonal to the original embedding $v$, and normalized to give a unit vector $u = \frac{n - (v^\top n)\, v}{\|n - (v^\top n)\, v\|}$.
- Perturbation Construction: The new embedding $\tilde{v}$ is constructed as a linear combination of the original vector and the orthogonal vector:
  $\tilde{v} = \cos(\theta) \cdot v + \sin(\theta) \cdot u$
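- Because $u$ is a unit vector orthogonal to $v$, this ensures $\|\tilde{v}\| = 1$ (since $\cos^2\theta + \sin^2\theta = 1$) and an angle of exactly $\theta$ between $v$ and $\tilde{v}$, i.e. $v^\top \tilde{v} = s \ge lb$.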
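The sampling steps above can be sketched in plain Python as follows. This is a minimal, non-authoritative sketch: the function names `l2_normalize`, `dot`, and `perturb_identity` are hypothetical, the 512-dimensional toy embedding is an arbitrary choice for illustration, and standard-library lists are used only to keep the example self-contained (a real pipeline would likely operate on batched tensors).

```python
import math
import random


def l2_normalize(x):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def perturb_identity(v, lb, rng=random):
    """Draw one perturbed unit embedding with cosine similarity >= lb to v.

    Steps: s ~ U[lb, 1], theta = arccos(s), u = unit vector orthogonal to v,
    v_tilde = cos(theta) * v + sin(theta) * u.
    """
    v = l2_normalize(v)                      # place v on the unit hypersphere
    s = rng.uniform(lb, 1.0)                 # target cosine similarity
    theta = math.acos(s)                     # corresponding angle
    n = [rng.gauss(0.0, 1.0) for _ in v]     # isotropic Gaussian noise
    proj = dot(n, v)
    u = l2_normalize([ni - proj * vi for ni, vi in zip(n, v)])  # drop component along v
    return [math.cos(theta) * vi + math.sin(theta) * ui for vi, ui in zip(v, u)]


# Usage: 5 perturbed copies of a random 512-D embedding with lb = 0.6.
rng = random.Random(0)
v = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(512)])
for _ in range(5):
    v_tilde = perturb_identity(v, lb=0.6, rng=rng)
    print(round(dot(v, v_tilde), 3), round(math.sqrt(dot(v_tilde, v_tilde)), 6))
```

Each printed pair shows a cosine similarity in $[0.6, 1]$ and a norm of 1. Note that drawing $s$ uniformly in $[lb, 1]$, as described, is not the same as drawing points uniformly over the cap's surface: for $lb > 0$ in high dimensions, most of a cap's area lies near its boundary, so uniform-in-$s$ sampling places relatively more samples close to the reference embedding.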
Identity Overlap Avoidance: To prevent a perturbed vector for identity $i$ from becoming semantically closer to a different identity $j$, the lower bound $lb$ is raised where necessary so that the cap around $v_i$ never extends beyond half the angle to any other reference identity:
$lb \leftarrow \max\left(lb, \max_{j \neq i} \cos\left(\frac{\angle(v_i, v_j)}{2}\right)\right)$
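The half-angle rule follows from the triangle inequality on the sphere: if $\angle(v_i, \tilde{v}) = \theta \le \angle(v_i, v_j)/2$, then $\angle(\tilde{v}, v_j) \ge \angle(v_i, v_j) - \theta \ge \theta$, so $\tilde{v}$ stays at least as close to $v_i$ as to $v_j$. The small worked example below applies the update to three toy 2-D references. The helper names are hypothetical, and the sketch assumes at least two unit-norm reference embeddings.

```python
import math


def angle(a, b):
    """Angle in radians between two unit vectors."""
    c = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp guards against rounding error


def adjusted_lower_bound(i, refs, lb):
    """Return max(lb, max_{j != i} cos(angle(v_i, v_j) / 2)).

    refs is a list of unit-norm reference embeddings (needs len(refs) >= 2).
    The inner max is driven by the nearest other identity.
    """
    tightest = max(
        math.cos(angle(refs[i], refs[j]) / 2.0)
        for j in range(len(refs))
        if j != i
    )
    return max(lb, tightest)


# Toy 2-D example: v1 is 60 degrees from v0, v2 is 90 degrees from v0.
refs = [
    [1.0, 0.0],
    [0.5, math.sqrt(3) / 2],
    [0.0, 1.0],
]
print(round(adjusted_lower_bound(0, refs, 0.5), 4))  # → 0.866, i.e. cos(30 deg)
print(adjusted_lower_bound(0, refs, 0.9))            # → 0.9, user bound already tighter
```

Here identity 0's cap is clamped to at most 30 degrees, even though the user-supplied bound of 0.5 alone would allow up to 60 degrees.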
Data Generation: For each synthetic identity, $K$ perturbed embeddings are generated. Each is fed into a pre-trained Latent Diffusion Model (LDM) with a unique noise seed to synthesize a diverse set of face images.
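The per-identity generation loop can be outlined as below. This is a hypothetical sketch: `build_dataset`, `perturb_fn`, and `generate_fn` are illustrative names, and `generate_fn` stands in for whatever call a pre-trained identity-conditional LDM actually exposes (its real interface is not specified here). The seed counter simply guarantees that every image gets a distinct noise seed.

```python
import random
from typing import Callable, Dict, List, Sequence

Embedding = Sequence[float]


def build_dataset(
    refs: List[Embedding],
    k: int,
    perturb_fn: Callable[[Embedding, random.Random], Embedding],
    generate_fn: Callable[[Embedding, int], object],
    base_seed: int = 0,
) -> Dict[int, List[object]]:
    """Render K images per identity, each from its own perturbed embedding.

    perturb_fn: hypothetical hook for the angular perturbation step.
    generate_fn: hypothetical hook for a pre-trained identity-conditional LDM,
    called with a conditioning vector and an integer noise seed.
    """
    rng = random.Random(base_seed)
    dataset: Dict[int, List[object]] = {}
    seed = base_seed
    for identity, v in enumerate(refs):
        images = []
        for _ in range(k):
            v_tilde = perturb_fn(v, rng)              # fresh perturbation per image
            images.append(generate_fn(v_tilde, seed))
            seed += 1                                 # never reuse a noise seed
        dataset[identity] = images
    return dataset


# Usage with stand-in hooks: a no-op "perturbation" and a generator that
# just records what it was called with.
refs = [[1.0, 0.0], [0.0, 1.0]]
data = build_dataset(refs, k=3,
                     perturb_fn=lambda v, rng: list(v),
                     generate_fn=lambda emb, seed: (tuple(emb), seed))
print(data[1])  # → [((0.0, 1.0), 3), ((0.0, 1.0), 4), ((0.0, 1.0), 5)]
```

In practice `perturb_fn` would wrap the angular sampler, with `lb` already adjusted per identity for overlap avoidance.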

3. Key Contributions

Geometry-Driven Sampling: A novel approach to introduce intra-class diversity by perturbing identity embeddings in the embedding space using angular constraints, eliminating the need for auxiliary labels or model retraining.
Identity Consistency Preservation: The method maintains high identity fidelity by constraining perturbations within a spherical cap, ensuring the generated images remain semantically coherent with the target identity.
Plug-and-Play Compatibility: IDPERTURB operates purely in the embedding space and is compatible with any pre-trained identity-conditional Diffusion Model (e.g., IDiff-Face).
State-of-the-Art Performance: The authors show that FR models trained on IDPERTURB-generated data outperform those trained with existing synthetic data generation methods and, in certain constrained settings, approaches using authentic data.

4. Experimental Results

The authors evaluated IDPERTURB using pre-trained IDiff-Face models trained on the FFHQ and CASIA-WebFace (C-WF) datasets.

Key Metrics & Findings:

Intra-Class Diversity: Lowering the $lb$ parameter (widening the cap and so allowing larger angular deviations) significantly increased diversity metrics:
- Age/Expression Entropy: Increased with lower $lb$.
- Head Pose Variation: Standard deviation of yaw/pitch/roll increased (e.g., up to 23.6° yaw deviation at $lb=0.5$).
- Perceptual Diversity (LPIPS): Monotonically increased as $lb$ decreased.
Identity Separability: Even with strong perturbations ($lb=0.4$), the Equal Error Rate (EER) remained competitive with authentic datasets, indicating that identity consistency was not severely compromised.
FR Verification Performance:
- FFHQ Base: IDPERTURB achieved a peak average accuracy of 88.79% (at $lb=0.5$), outperforming the baseline (86.58%).
- C-WF Base: IDPERTURB achieved a peak average accuracy of 93.62% (at $lb=0.6$), significantly outperforming the baseline (91.25%).
- Comparison with SOTA: IDPERTURB outperformed other synthetic-data methods, both GAN-based and diffusion-based (e.g., ID3, DCFace, Arc2Face), across five standard benchmarks (LFW, AgeDB-30, CFP-FP, CALFW, CPLFW) and the large-scale IJB-C benchmark.
- Gap to Real Data: The method narrowed the performance gap between synthetic and authentic (C-WF) training data, achieving 93.62% vs. 94.63% on average.
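The identity-separability result above is reported as an Equal Error Rate: the operating point where the false accept rate (FAR) on impostor pairs equals the false reject rate (FRR) on genuine pairs. A minimal sketch of approximating an EER from similarity scores follows. The function name is hypothetical; the scores would come from some face-recognition model applied to pairs of generated images (which model and which pairs are not specified here), and practical evaluations typically interpolate more finely between thresholds.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping thresholds over all observed scores.

    genuine:  similarity scores for same-identity pairs (higher = more similar).
    impostor: similarity scores for different-identity pairs.
    A pair is accepted when its score >= threshold. Returns the mean of FAR and
    FRR at the threshold where the two rates are closest.
    """
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors wrongly accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine pairs wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2.0
    return best_eer


print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # → 0.0 (fully separable)
print(equal_error_rate([0.9, 0.6, 0.4], [0.5, 0.3, 0.2]))  # overlapping scores
```

A lower EER means genuine and impostor scores overlap less, which is why a small EER at aggressive settings such as $lb=0.4$ suggests the perturbed samples still cluster by identity.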

Ablation Study (CFG Strength):

The study found a trade-off between Classifier-Free Guidance (CFG) strength ($\omega$) and diversity. Moderate guidance ($\omega=1$ or $\omega=2$) yielded the best balance between identity adherence and sample diversity.
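For reference, one common way to write classifier-free guidance on the noise prediction is shown below, with $\epsilon(x_t, \tilde{v})$ the identity-conditional prediction for the noisy latent $x_t$ and $\epsilon(x_t, \varnothing)$ the unconditional one. This summary does not state which parameterization the paper uses, so treat this form as an assumption:

```latex
\hat{\epsilon}(x_t, \tilde{v}) = \epsilon(x_t, \varnothing)
  + \omega \left( \epsilon(x_t, \tilde{v}) - \epsilon(x_t, \varnothing) \right)
```

Under this form, $\omega = 1$ reduces to the purely conditional prediction and larger $\omega$ pushes samples harder toward the conditioning identity, at the cost of diversity. Some works instead write $(1 + \omega)\,\epsilon(x_t, \tilde{v}) - \omega\,\epsilon(x_t, \varnothing)$, where $\omega = 0$ is purely conditional, so guidance values such as $\omega = 1$ or $\omega = 2$ are only comparable within one convention.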

5. Significance

Privacy-Preserving FR Training: IDPERTURB provides a scalable, privacy-compliant solution for generating high-quality training data, addressing the shortage of authentic biometric datasets.
Efficiency: Unlike methods requiring retraining generative models or adding complex auxiliary networks, IDPERTURB is a lightweight post-processing step on embeddings, adding negligible computational overhead (approx. 0.01s per identity).
Generalizability: The geometric approach is model-agnostic, suggesting it can be applied to various pre-trained diffusion models to enhance their utility for downstream tasks like face recognition, biometric security, and surveillance.

In conclusion, IDPERTURB successfully demonstrates that geometric manipulation of identity embeddings is a powerful mechanism to induce necessary intra-class variation in synthetic data, leading to more robust and generalizable Face Recognition systems.