Permutation-Invariant Representation Learning for Robust and Privacy-Preserving Feature Selection

This paper presents an extended framework for robust and privacy-preserving feature selection that integrates permutation-invariant embedding with policy-guided search, enhanced by a privacy-preserving knowledge fusion strategy and sample-aware weighting to effectively address data heterogeneity and distributional imbalance in federated learning scenarios.

Rui Liu, Tao Zhe, Yanjie Fu, Feng Xia, Ted Senator, Dongjie Wang

Published 2026-03-02

Imagine you are trying to bake the perfect cake, but you have a pantry full of 1,000 ingredients. Some are essential (flour, sugar, eggs), some are useless (a bottle of motor oil), and some are redundant (three different types of vanilla extract). Your goal is to pick the best combination of ingredients to make the cake taste amazing, without wasting time testing every single possibility.

This is exactly what Feature Selection does in Artificial Intelligence. It tries to find the most important "ingredients" (features, the columns or variables in a dataset) to make a model smart, while ignoring the junk.

However, there are two big problems with current methods:

  1. The Order Problem: Imagine a recipe book that treats "Add sugar, then flour, then eggs" as a completely different recipe from "Add eggs, then flour, then sugar." In reality, the order doesn't matter for the cake, but older computer methods treat these as totally different recipes. This confuses the AI.
  2. The Privacy Problem: Imagine you want to bake a cake with 100 different chefs, but none of them want to share their secret family recipes or even show you their ingredients because of privacy laws. They just want to help you find the best mix without revealing their secrets.

This paper introduces a new system called FedCAPS (and its simpler, centralized cousin CAPS) to solve these problems. Here is how it works, using simple analogies:

1. The "Magic Recipe Book" (Permutation-Invariant Embedding)

Old AI methods were like a librarian who gets confused if you swap the order of books on a shelf. If you have a list of ingredients [Flour, Sugar, Eggs], the old AI thinks it's different from [Eggs, Sugar, Flour].

The authors built a Magic Recipe Book (an Encoder-Decoder) that understands that order doesn't matter.

  • The Analogy: Think of a smoothie. Whether you throw the strawberries, bananas, and milk into the blender in that order, or milk, strawberries, and bananas, the result is the same smoothie.
  • How it works: The system uses a special "attention mechanism" (like a smart spotlight) that looks at all the ingredients at once and says, "Ah, this is a strawberry-banana-milk mix," regardless of the order you listed them. This stops the AI from getting confused by silly ordering issues.
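To make the idea concrete, here is a minimal sketch of permutation invariance using plain NumPy. The feature-token embeddings and the mean-pooling step are illustrative stand-ins (the paper's encoder uses an attention mechanism, which is also permutation-invariant); the point is only that pooling over a set ignores input order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-feature token embeddings: 3 "ingredients", 4 dims each.
tokens = {"flour": rng.normal(size=4),
          "sugar": rng.normal(size=4),
          "eggs":  rng.normal(size=4)}

def set_embedding(feature_names):
    """Encode a feature subset with a permutation-invariant pooling step.

    Mean pooling stands in for the paper's attention mechanism; both map
    a set of inputs to the same output regardless of ordering."""
    stacked = np.stack([tokens[name] for name in feature_names])
    return stacked.mean(axis=0)

a = set_embedding(["flour", "sugar", "eggs"])
b = set_embedding(["eggs", "sugar", "flour"])
print(np.allclose(a, b))  # both orderings yield the same embedding
```

Swapping the list order changes nothing downstream, which is exactly the property the "Magic Recipe Book" needs.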

2. The "Smart Explorer" (Policy-Guided Search)

Once the AI understands the ingredients, it needs to find the best mix. Old methods tried to walk up a hill to find the highest peak, assuming the hill was smooth and round (convex). But the real world is full of cliffs, caves, and fake peaks (non-convex).

The authors sent in a Smart Explorer (a Reinforcement Learning Agent).

  • The Analogy: Imagine you are in a dark, foggy maze looking for the exit.
    • Old Method: You just walk straight up, assuming the ground is flat. You might get stuck in a small dip.
    • New Method (FedCAPS): You have a compass and a map. You start with a few "seeds" (good guesses based on past success) and then explore the maze intelligently. You try a path, see if it gets you closer to the exit (better cake), and if not, you backtrack and try a new direction.
  • The Result: This explorer doesn't get stuck in small traps; it finds the global best solution (the perfect cake) even in a messy, complex maze.
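The explore-and-backtrack loop above can be sketched with a toy search. This is not the paper's actual reinforcement-learning agent; it is a simplified epsilon-greedy stand-in over a made-up scoring function (`IDEAL` plays the role of the "perfect cake"), showing how occasional exploratory moves keep the search from stalling in a local dip.

```python
import random

random.seed(0)

# Toy "taste test": in the real system this would be a downstream model's
# validation score; here the best subset is fixed so the example is runnable.
IDEAL = {0, 2, 5}
def score(subset):
    return len(subset & IDEAL) - 0.5 * len(subset - IDEAL)

def policy_guided_search(n_features=8, steps=200, epsilon=0.2):
    """Epsilon-greedy search over feature subsets: start from a seed guess,
    flip one feature at a time, usually keep improvements, but sometimes
    explore a worse move to escape local traps."""
    current = set(random.sample(range(n_features), 3))  # seed guess
    best, best_score = set(current), score(current)
    for _ in range(steps):
        f = random.randrange(n_features)
        candidate = current ^ {f}  # toggle one feature in or out
        if random.random() < epsilon or score(candidate) >= score(current):
            current = candidate
        if score(current) > best_score:
            best, best_score = set(current), score(current)
    return best, best_score

best, best_score = policy_guided_search()
```

The epsilon parameter is the "compass vs. wandering" trade-off: higher values explore more of the maze, lower values exploit the current path.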

3. The "Secret Club" (Federated Learning & Privacy)

Now, imagine you don't have one giant pantry. Instead, you have 100 different kitchens (clients), and each has its own unique set of ingredients. But by law, no one can share their actual ingredients with you.

  • The Problem: If you just ask everyone to send you their "best ingredient list," the kitchens with huge pantries might drown out the small kitchens. Also, you can't see their raw data.
  • The Solution (FedCAPS):
    1. No Raw Data Sharing: Each kitchen sends only a "scorecard" (e.g., "My mix of A, B, and C got a 9/10 taste score"). They never send the actual ingredients.
    2. The Weighted Vote: The system knows that a kitchen with 1,000 samples has a more reliable score than a kitchen with only 5 samples. So, it gives the big kitchens a louder voice (more weight) in the final decision, but still listens to the small ones so they aren't ignored.
    3. The Unified Master Recipe: The central server takes all these scorecards, combines them into one "Global Recipe Book," and uses the Smart Explorer to find the ultimate ingredient mix that works for everyone.
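The three steps above can be sketched as a small aggregation routine. The "scorecard" fields and numbers here are hypothetical; the sketch only shows the sample-aware weighting idea: clients share subset scores (never raw data), and each score is weighted by how many samples backed it.

```python
# Hypothetical client "scorecards": each kitchen reports only a feature
# subset, its local quality score, and its sample count -- never raw data.
scorecards = [
    {"subset": frozenset({"A", "B"}), "score": 0.90, "n_samples": 1000},
    {"subset": frozenset({"A", "B"}), "score": 0.60, "n_samples": 5},
    {"subset": frozenset({"B", "C"}), "score": 0.80, "n_samples": 200},
]

def aggregate(cards):
    """Sample-aware weighting: pool each subset's scores on the server,
    giving clients with more data a proportionally louder voice while
    still counting the small ones."""
    totals = {}
    for c in cards:
        w = c["n_samples"]
        s, n = totals.get(c["subset"], (0.0, 0))
        totals[c["subset"]] = (s + w * c["score"], n + w)
    return {sub: s / n for sub, (s, n) in totals.items()}

global_scores = aggregate(scorecards)
best = max(global_scores, key=global_scores.get)
```

Note that the tiny 5-sample kitchen barely moves the {A, B} average, but it is not discarded; that is the fairness property the weighted vote is after.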

Why is this a big deal?

  • It's Fairer: It respects privacy (kitchens don't share secrets) and handles different data sizes fairly.
  • It's Smarter: It stops the AI from being confused by the order of data and helps it navigate complex, messy real-world problems.
  • It's Efficient: It finds the best ingredients faster and with less computing power than previous methods.

In short: The authors built a system that acts like a super-smart, privacy-respecting master chef. It knows that the order of ingredients doesn't matter, it can explore a messy kitchen without getting lost, and it can collaborate with 100 other secret chefs to find the perfect recipe without ever seeing their actual ingredients.
