SphOR: A Representation Learning Perspective on Open-set Recognition for Identifying Unknown Classes in Deep Learning Models

Imagine you are a security guard at a very exclusive club. Your job is to check IDs. You have a list of 100 specific names (the "Known Classes") that are allowed in.

The Problem: The "Familiarity Trap"
In the old days, a guard faced with a stranger whose name wasn't on the list would panic. The guard might look at the stranger's face, see a nose and eyes that somewhat resemble a known member's, and say, "Oh, you must be Bob!" and let them in. That is a mistake. In AI, this is called the "Familiarity Trap": the AI is overconfident about things it doesn't actually know because it relies on general similarities (like "has a nose") rather than specific details.

Also, most AI guards are trained to force everyone into one of the 100 slots. If someone new comes, the AI squishes them into the closest slot, even if they don't fit.

The Solution: SpHOR (The "Smart Guard")
The paper introduces a new method called SpHOR. Instead of just training the guard to recognize faces, SpHOR changes how the guard organizes the mental map of the club. It uses three clever tricks to make sure the AI knows exactly who belongs and who doesn't.

Here is how SpHOR works, using simple analogies:

1. The "Orthogonal" Rule (Separating the Rooms)

Imagine the club has a huge ballroom. In the old system, the 100 VIPs were all crowded in the center, bumping into each other.
SpHOR's trick: It forces each VIP group into their own distinct, non-overlapping hallway.

The Analogy: Think of the hallways as being at perfect 90-degree angles to each other (like the corner of a room). If you are in the "Cat" hallway, you are physically far away from the "Dog" hallway.
Why it helps: If a stranger walks in, they won't accidentally end up in the "Cat" hallway just because they have fur. They will end up in the empty space between the hallways, which the guard immediately recognizes as "Unknown."
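
To make the hallway picture concrete, here is a minimal sketch in plain Python. The three class directions, the two example vectors, the `best_match` helper, and the 0.9 threshold are all made-up illustration values, not anything taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical "hallways": one mutually orthogonal direction per known class.
hallways = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "bird": [0.0, 0.0, 1.0],
}

def best_match(feature, threshold=0.9):
    """Return the closest class, or 'unknown' if no hallway is close enough."""
    name, score = max(
        ((n, cosine(feature, d)) for n, d in hallways.items()),
        key=lambda pair: pair[1],
    )
    return name if score >= threshold else "unknown"

member = [0.98, 0.1, 0.05]   # sits well inside the "cat" hallway
stranger = [0.6, 0.6, 0.5]   # shares traits with everyone, belongs nowhere

print(best_match(member))    # cat
print(best_match(stranger))  # unknown
```

The stranger's best similarity is only about 0.61, because it lands between the hallways instead of inside one of them.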

2. The "Spherical" Constraint (The Globe)

Usually, AI maps data on a flat sheet of paper (Euclidean space). On a flat sheet, you can keep drawing circles further and further out and never run out of room. This makes it hard to tell where the "known" area ends and the "unknown" area begins.
SpHOR's trick: It forces all the data onto the surface of a globe (a sphere).

The Analogy: Imagine all the VIPs are stickers stuck on a basketball. They can't go "off" the ball. They have to stay on the surface.
Why it helps: On a sphere, there is a limited amount of space. If you try to put a new sticker (an unknown class) on the ball, it has to squeeze in between the existing ones. If it doesn't fit neatly into a cluster, the guard sees it as an outsider. This creates a natural "boundary" for what is known.
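
Putting data "on the ball" usually means dividing each feature vector by its length. The sketch below shows that step in plain Python; the function name `to_unit_sphere` and the sample points are illustrative assumptions.

```python
import math

def to_unit_sphere(v):
    """Scale a vector to length 1 (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

near = [3.0, 4.0]
far = [300.0, 400.0]  # same direction, 100x further out on the "flat sheet"

p = to_unit_sphere(near)
q = to_unit_sphere(far)
print(p, q)  # both land on the same point of the unit circle: [0.6, 0.8]
```

After normalization only the direction of a feature matters, so no point can drift arbitrarily far from the known clusters.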

3. The "Mixup" and "Smoothing" Training (The Practice Drills)

To teach the guard to be better, SpHOR doesn't just show them clear photos of cats and dogs.

Mixup: The guard is shown a photo that is 50% cat and 50% dog. The guard has to learn that this "half-cat-half-dog" thing doesn't belong to either group perfectly. It teaches the guard to be humble and admit, "This is weird, it's not a pure cat."
Label Smoothing: Instead of telling the guard, "This is 100% a Cat," the guard is told, "This is mostly a Cat, but maybe a tiny bit of something else."
Why it helps: This stops the guard from being overconfident. It teaches the AI that the world is messy. When a truly unknown stranger walks in, the AI is less likely to force them into a "Cat" box and more likely to say, "I don't know what this is."
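
Both drills can be written in a few lines. This is a generic sketch of Mixup and label smoothing as they are commonly defined, not the paper's training code; the helper names, the four-class setup, and the two-value "photos" are assumptions. In practice the mixing weight `lam` is usually sampled from a Beta distribution; it is fixed at 0.5 here to match the half-cat-half-dog example.

```python
def one_hot(index, num_classes):
    """Hard label: all probability on one class."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def smooth(label, eps=0.1):
    """Label smoothing: move eps of the probability mass to a uniform spread."""
    k = len(label)
    return [(1.0 - eps) * p + eps / k for p in label]

def mixup(x_a, y_a, x_b, y_b, lam=0.5):
    """Blend two inputs and their labels with the same weight lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y

cat_label = one_hot(0, 4)
dog_label = one_hot(1, 4)

soft_cat = smooth(cat_label)  # roughly [0.925, 0.025, 0.025, 0.025]
mixed_x, mixed_y = mixup([0.2, 0.8], cat_label, [0.6, 0.4], dog_label)
# mixed_y puts half the mass on "cat" and half on "dog"
```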

The Result: A Better Security System

The paper tested this "SpHOR" guard against many other guards on difficult tests (like distinguishing between very similar bird species or car models).

The Old Guards: Often confused new birds with old ones, or got tricked by strangers who looked slightly familiar.
The SpHOR Guard: Because it organized its mental map into distinct, non-overlapping rooms on a sphere, and because it practiced with "mixed-up" examples, it was much better at saying, "I don't know this person," rather than guessing wrong.

In a nutshell:
SpHOR is a new way of teaching AI to recognize things. Instead of just memorizing faces, it organizes its memory into a structured, spherical map with separate rooms for each group. It also practices with confusing examples so it doesn't get overconfident. This makes it much better at spotting strangers (unknown classes) without making mistakes.

1. Problem Statement

Open-Set Recognition (OSR) aims to enable Deep Neural Networks (DNNs) to identify input data from classes that were not present during training, labeling them as "unknown" rather than misclassifying them into known categories.

The paper identifies three primary challenges in current OSR approaches:

The Familiarity Trap: Unknown classes often share semantic similarities or data distributions with known classes (fine-grained shifts). If a network learns features based on class-shared attributes (e.g., background, texture) rather than class-specific attributes, unknown samples map too close to known classes in the latent space, making them indistinguishable.
Inadequate Representation Learning: Most existing OSR methods train the feature extractor and classifier jointly. This often results in feature representations that are implicitly adapted for unknowns rather than explicitly structured to separate them. Furthermore, many methods rely on generic objectives (like standard supervised contrastive learning) not specifically designed for the unique geometry of open-set problems.
Limitations of Euclidean Space: Traditional OSR methods model features in unbounded Euclidean space, which increases "open-space risk" (the likelihood of labeling unknown samples, which fall in regions far from the training data, as known).

2. Methodology: SpHOR

The authors propose SpHOR, a novel two-stage decoupled training framework that explicitly shapes the feature space via supervised representation learning before training a classifier.

Stage 1: Spherical Representation Learning

Instead of learning generic features, SpHOR constructs a feature space with specific geometric properties:

Spherical Constraint: Features are $L_2$-normalized and projected onto a hypersphere. This allows modeling class distributions as von Mises-Fisher (vMF) distributions, which are the spherical analogues of Gaussians. This constrains the "open space" and reduces open-space risk.
Orthogonal Label Embeddings: To combat the "Familiarity Trap," the method enforces orthogonality among class-specific label embeddings ($\mu_c$). This ensures that each class occupies a distinct linear subspace, preventing feature redundancy and encouraging the model to learn class-specific attributes rather than shared ones.
Loss Function ($L_{vMFAL}$): The core loss is a modified von Mises-Fisher Alignment Loss. It aligns sample embeddings ($z_i$) with their corresponding label embeddings ($\mu_c$) while promoting Uniformity (spreading embeddings evenly) and Alignment (pulling samples toward their class center).
- Handling Ambiguity: The loss incorporates Label Smoothing (LS) and Mixup. When a sample is ambiguous (e.g., a Mixup sample), the loss pulls it toward the mean of all label embeddings rather than a single class center, effectively pushing ambiguous/unknown-like samples away from class centers.
Orthogonality Regularizer ($R_{Ortho}$): An additional term forces the label embeddings themselves to be orthogonal and uniformly distributed, preventing "embedding collapse" where different classes become indistinguishable.
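
The summary above does not give the exact formulas for $L_{vMFAL}$ or $R_{Ortho}$, so the following is only a rough sketch of the general idea under stated assumptions: the alignment term is written as a soft-target cross-entropy over cosine similarities scaled by a concentration `kappa`, and the regularizer as the squared Frobenius norm of the Gram matrix minus the identity. The function names, `kappa=10.0`, and both formulas are stand-ins, not the paper's definitions.

```python
import math

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def vmf_style_loss(z, mus, target, kappa=10.0):
    """Soft-target cross-entropy over kappa-scaled cosine similarities.

    Assumed form for illustration; not the paper's exact L_vMFAL.
    """
    z = normalize(z)
    logits = [kappa * dot(z, normalize(mu)) for mu in mus]
    top = max(logits)  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
    return -sum(t * (l - log_norm) for t, l in zip(target, logits))

def ortho_penalty(mus):
    """Squared Frobenius norm of (M M^T - I) for unit-normalized rows M."""
    m = [normalize(mu) for mu in mus]
    total = 0.0
    for i, u in enumerate(m):
        for j, v in enumerate(m):
            want = 1.0 if i == j else 0.0
            total += (dot(u, v) - want) ** 2
    return total

orthogonal_mus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
collapsed_mus = [[1.0, 0.0], [1.0, 0.0]]  # two classes share one direction

aligned = vmf_style_loss([1.0, 0.0, 0.0], orthogonal_mus, [1.0, 0.0, 0.0])
misaligned = vmf_style_loss([0.0, 1.0, 0.0], orthogonal_mus, [1.0, 0.0, 0.0])
```

In this toy setup the loss is near zero when the sample points along its own label embedding and close to `kappa` when it points along another class's embedding, while `ortho_penalty` is zero for the orthogonal embeddings and positive for the collapsed pair. A smoothed or mixed target passed as `target` spreads the pull across several label embeddings instead of one.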

Stage 2: Classifier Training

The projection network and label embeddings are discarded.
A standard classifier (MLP) is trained on the frozen, learned feature representations using standard cross-entropy loss. This decoupling allows the feature space to be optimized specifically for OSR geometry before the decision boundaries are set.
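
As a rough illustration of the decoupling, the sketch below trains a classifier head with cross-entropy on features that are never updated. To stay short it uses a single linear layer rather than an MLP, plain stochastic gradient descent, and four hand-written feature vectors standing in for Stage 1 outputs; all of these are simplifying assumptions.

```python
import math

def softmax(logits):
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_linear_head(features, labels, num_classes, lr=0.5, epochs=200):
    """Fit weights (num_classes x dim) with cross-entropy; features stay fixed."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
            probs = softmax(logits)
            for c in range(num_classes):
                # Gradient of cross-entropy with respect to logit c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for d in range(dim):
                    weights[c][d] -= lr * grad * x[d]
    return weights

def predict(weights, x):
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(logits)), key=lambda c: logits[c])

# Hypothetical frozen Stage 1 features: near-unit vectors around two directions.
frozen_features = [[0.99, 0.14], [0.96, 0.28], [0.14, 0.99], [0.28, 0.96]]
labels = [0, 0, 1, 1]

head = train_linear_head(frozen_features, labels, num_classes=2)
predictions = [predict(head, x) for x in frozen_features]
```

Only `weights` changes during training; `frozen_features` is read but never modified, which is the point of the second stage.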

Post-Processing (Scoring)

The system uses a scoring rule $S(\cdot)$ to determine if a test sample is known or unknown. The paper evaluates various rules, including Classifier Scores (MaxLogit), Feature Scores (KNN), and Hybrid Scores (NNGuide).
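
Two of these rule families can be sketched generically. The code below shows a MaxLogit score and a k-nearest-neighbor feature score in their common textbook forms; the training features, the test points, `k=2`, and the thresholding helper are illustrative assumptions, and NNGuide is not covered.

```python
import math

def max_logit_score(logits):
    """Classifier score: the largest raw logit (higher = more likely known)."""
    return max(logits)

def knn_score(feature, train_features, k=2):
    """Feature score: negative distance to the k-th nearest training feature."""
    dists = sorted(math.dist(feature, t) for t in train_features)
    return -dists[k - 1]

def is_known(score, threshold):
    """Accept a sample as known when its score clears the threshold."""
    return score >= threshold

# Hypothetical training features clustered around two class directions.
train = [[1.0, 0.0], [0.96, 0.28], [0.0, 1.0], [0.28, 0.96]]

known_like = [0.98, 0.2]         # close to the first cluster
unknown_like = [0.7071, 0.7071]  # halfway between the two clusters

s_known = knn_score(known_like, train)
s_unknown = knn_score(unknown_like, train)
```

Here `s_known` is about -0.20 and `s_unknown` about -0.50, so a threshold such as -0.3 would accept the first sample and reject the second. How to choose that threshold is left open in this sketch.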

3. Key Contributions

Novel Two-Stage Decoupled Framework: SpHOR separates representation learning from classifier training, explicitly designing the feature space for OSR rather than relying on incidental adaptation.
Three Key Innovations:
- Orthogonal Label Embeddings: Enforces discriminative, class-specific features to prevent the familiarity trap.
- Spherical vMF Modeling: Constrains features to a hypersphere to model open-space risk mathematically.
- Integration of Mixup & Label Smoothing: Directly applied during representation learning to model ambiguous/unknown spaces and improve robustness.
New Evaluation Metrics:
- Angular Separability (AS): Measures the geometric proximity of unknown samples to known class centers (lower is better).
- Norm Separability (NS): Measures the ability to distinguish unknowns based on feature magnitude (norm), leveraging the fact that unknown samples often have different norm distributions.
Theoretical Analysis: The paper provides proofs showing how the proposed loss induces Alignment and Uniformity and analytically demonstrates how orthogonality improves dispersion.

4. Experimental Results

The method was evaluated on the Semantic Shift Benchmark (SSB), which uses fine-grained datasets (CUB, Stanford Cars, FGVC-Aircraft), and on the Legacy CNN-32 Benchmarks, which are coarse-grained (CIFAR, SVHN, Tiny-ImageNet).

State-of-the-Art Performance: SpHOR achieved top results across both fine-grained and coarse-grained benchmarks.
- On the Semantic Shift Benchmark, it improved OSCR (Open Set Classification Rate) by up to 5.1% and AUROC by 5.2% compared to strong baselines like MLS and SupCon.
- On Legacy Benchmarks, it outperformed methods like ConOSR and ARPL, achieving average AUROC improvements of ~0.8% to 1%.
Robustness:
- Scoring Rule Independence: SpHOR showed significantly lower sensitivity to the choice of scoring rule compared to baselines.
- No Pretraining Required: Unlike many baselines that suffer massive performance drops when trained from scratch (without ImageNet pretraining), SpHOR maintained competitive performance, which points to the stability of spherical representations.
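- Small-Batch Efficiency: SpHOR's loss compares each of the $B$ samples in a batch with the $C$ label embeddings, giving computational complexity $O(B \cdot C)$, which is linear in batch size, compared to the quadratic $O(B^2)$ of contrastive learning (SupCon). This makes it robust even with small batch sizes.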
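
A quick count shows what the two growth rates mean in practice. The sketch assumes $B$ is the batch size and $C$ the number of known classes, and simply counts similarity computations per training step; the batch sizes and class count are made-up numbers.

```python
def label_embedding_pairs(batch_size, num_classes):
    """Sample-to-label-embedding similarities per step: B * C."""
    return batch_size * num_classes

def in_batch_pairs(batch_size):
    """Sample-to-sample similarities per step in a contrastive loss: B * B."""
    return batch_size * batch_size

num_classes = 100
for batch_size in (256, 512):
    print(batch_size,
          label_embedding_pairs(batch_size, num_classes),
          in_batch_pairs(batch_size))
# 256 25600 65536
# 512 51200 262144
```

Doubling the batch doubles the first count but quadruples the second. The first count also does not depend on which other samples happen to share the batch.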

5. Significance

This paper shifts the paradigm of Open-Set Recognition from "classifier-level adjustments" to "representation-level design." By explicitly structuring the feature space using spherical geometry and orthogonality constraints, SpHOR effectively addresses the Familiarity Trap, a major hurdle in fine-grained OSR.

The introduction of Angular and Norm Separability provides new tools for diagnosing why an OSR model fails, moving beyond simple accuracy metrics. The method's ability to perform well without pretraining and with small batches makes it highly practical for real-world applications where data is limited or computational resources are constrained.
